Skip to main content
HomeBlogSQL

DuckDB makes SQL a first-class citizen on DataLab

In this blog post, we list out all recent improvements that make it seamless and efficient to query data with SQL, all without leaving the tool.
Apr 11, 2023

A duck codes in DuckDBIn today's data-driven world, extracting valuable insights from raw information is essential for businesses and researchers alike. This is where DataLab steps in: DataCamp's fully managed, collaborative data notebook that simplifies data analysis and makes sharing with colleagues, partners, or clients a breeze, without worrying about configuration and setup.

Since SQL is still the fundamental building block in many data projects, we've been working hard to further enhance the SQL capabilities in DataLab.

Query CSV Files and DataFrames

Many DataCamp users are at the beginning of their data science journey and don’t have access to a dedicated database or data warehouse. Instead, they are looking to play around with a CSV file they found online or uploaded from their local computer. We also came across this limitation ourselves when hosting DataLab-powered live trainings: we had to set up a database every time we wanted to do a SQL training, which was cumbersome.

Those times are history now! With the SQL cell inside a DataLab notebook, you can directly query CSV files. This effectively treats your CSV file as a database that you can query like any other database. Try it out here

image2.png

Users are loving this enhancement, as the CSV support lets them skip database setup entirely. Others love the workflows that the DataFrame SQL support gives, allowing them to switch between SQL and Python, drawing from the best of both languages.

How it works

This new capability is powered by DuckDB. Dubbed “SQLite for analytics”, this embedded analytical database engine is able to query CSV files with excellent performance. When users select “DataFrames and CSVs”, DataLab does not send the query to a data warehouse as usual, but transforms the query into Python code that calls the DuckDB Python API in the user’s session and return the resulting DataFrame.

In addition to querying CSV files, DuckDB also allows us to query in-memory DataFrames that were created higher up in the notebook. This creates some pretty powerful workflows, for example:

  • Query a CSV file with a SQL cell. The query result is available as a pandas DataFrame
  • Use some pandas functions to clean up and restructure the code in a way that would be hard in SQL. Store the result as a new data frame.
  • Query the resulting data frame with SQL to calculate some aggregates, which is straightforward with SQL’s GROUP BY functionality.

Other improvements

Autocomplete for Table and Column Names

We didn’t stop with CSV and DataFrame support. Writing SQL queries can be cumbersome, especially when working with large databases containing lots of tables and columns. To make this easier, we have introduced autocompletion for table and column names in DataLab. As you type your query, our intelligent system will automatically suggest relevant table and column names based on your input, helping you avoid typos and save time. 

Table and column names are automatically suggested when you write a SQL query.

This new feature not only increases productivity but also minimizes errors. Whether you're a seasoned SQL veteran or just starting out, autocompletion will help you write better queries, faster.

Improved Schema Browser

To facilitate navigating and exploring your connected databases, we have overhauled the schema browser in DataLab. Now, you can quickly see all tables in a connected database, search through them, and even preview their contents with just a few clicks. 

Browse schemas, tables and columns with easy shortcuts to kickstart queries.

Our enhanced schema browser is designed to help you work more efficiently with your data, allowing you to spend less time searching for the right table or column and more time uncovering valuable insights.

Parametrize your SQL queries

When working with SQL, there are times you want to dynamically update your SQL queries based on results of previous calculations or other data in your notebook. DataLab now supports this through SQL parametrization, which allows you to insert variables into your SQL queries. 

Insert Python or R variables into your SQL query using interpolation

More Database and Data Warehouse Integrations

You should be able to seamlessly access your data, no matter where it's stored. DataCamp already allows you to query all common databases and data warehouses with a SQL cell, including PostgreSQL, MySQL, Redshift, Oracle, SQL Server, BigQuery, and Athena. We’ll be adding even more integrations to our platform, such as Snowflake and ClickHouse, so we cover all data stacks and use cases.

Making analytics work fast and enjoyable

At DataCamp, we're committed to making your data analysis experience seamless and enjoyable. With these SQL enhancements we believe that DataLab is now even better equipped to help you unlock the full potential of your data. Whether you're a data scientist, a business analyst, or just someone who loves working with data, we're confident that DataLab will help you streamline your workflows and elevate your analyses. Give these new features a try and experience the difference for yourself!

In the meantime, we'll continue to push the boundaries of data analysis and unlock new possibilities for insights discovery, with notebook scheduling, dark mode, native dbt support, advanced dashboarding, and much more! Is there something we should improve? Something you’re especially excited about? Let us know by sending an email to rocketship@datacamp.com! Thanks!

Topics
Related

blog

An Introduction to DuckDB: What is It and Why Should You Use It?

Explore DuckDB, the fast, easy-to-use analytics database for Python and R. Learn its key features, use cases, and how it optimizes data analysis tasks.
Kurtis Pykes 's photo

Kurtis Pykes

7 min

tutorial

SQL Interface within JupyterLab

Learn how to use and modify SQL tables within JupyterLabs.
Parul Pandey's photo

Parul Pandey

7 min

tutorial

How to Practice SQL Using any Dataset with Workspace

Learn how DataCamp Workspace optimizes the experience of working with Jupyter notebooks and SQL. Discover how to effortlessly write SQL queries, connect to databases, analyze CSV files, and leverage the power of AI assistance
Richie Cotton's photo

Richie Cotton

9 min

code-along

Using DataLab in Data Academies

Learn how to use DataCamp Workspace as part of a corporate training program
Filip Schouwenaars's photo

Filip Schouwenaars

code-along

Exploratory Data Analysis in SQL for Absolute Beginners

Learn the basics of exploring new datasets using SQL.
Jasmin Ludolf's photo

Jasmin Ludolf

code-along

SQL for Absolute Beginners

Start from the very basics of what SQL is and why it's essential, move through key components such as retrieving data from databases, manipulation of data, and basic SQL queries.
Adel Nehme's photo

Adel Nehme

See MoreSee More