Skip to main content
HomeBlogSQL

DuckDB makes SQL a first-class citizen on DataCamp Workspace

In this blog post, we list out all recent improvements that make it seamless and efficient to query data with SQL, all without leaving the tool.
Apr 2023

A duck codes in DuckDBIn today's data-driven world, extracting valuable insights from raw information is essential for businesses and researchers alike. This is where DataCamp Workspace steps in: a fully managed, collaborative data notebook that simplifies data analysis and makes sharing with colleagues, partners, or clients a breeze, without worrying about configuration and setup.

Since SQL is still the fundamental building block in many data projects, we've been working hard to further enhance the SQL capabilities in DataCamp Workspace.

Query CSV Files and DataFrames

Many DataCamp users are at the beginning of their data science journey and don’t have access to a dedicated database or data warehouse. Instead, they are looking to play around with a CSV file they found online or uploaded from their local computer. We also came across this limitation ourselves when hosting Workspace-powered live trainings: we had to set up a database every time we wanted to do a SQL training, which was cumbersome.

Those times are history now! With the SQL cell inside a DataCamp Workspace notebook, you can directly query CSV files. This effectively treats your CSV file as a database that you can query like any other database. Try it out here

image2.png

Users are loving this enhancement, as the CSV support lets them skip database setup entirely. Others love the workflows that the DataFrame SQL support gives, allowing them to switch between SQL and Python, drawing from the best of both languages.

How it works

This new capability is powered by DuckDB. Dubbed “SQLite for analytics”, this embedded analytical database engine is able to query CSV files with excellent performance. When users select “DataFrames and CSVs”, Workspace does not send the query to a data warehouse as usual, but transforms the query into Python code that calls the DuckDB Python API in the user’s session and return the resulting DataFrame.

In addition to querying CSV files, DuckDB also allows us to query in-memory DataFrames that were created higher up in the notebook. This creates some pretty powerful workflows, for example:

  • Query a CSV file with a SQL cell. The query result is available as a pandas DataFrame
  • Use some pandas functions to clean up and restructure the code in a way that would be hard in SQL. Store the result as a new data frame.
  • Query the resulting data frame with SQL to calculate some aggregates, which is straightforward with SQL’s GROUP BY functionality.

Other improvements

Autocomplete for Table and Column Names

We didn’t stop with CSV and DataFrame support. Writing SQL queries can be cumbersome, especially when working with large databases containing lots of tables and columns. To make this easier, we have introduced autocompletion for table and column names in DataCamp Workspace. As you type your query, our intelligent system will automatically suggest relevant table and column names based on your input, helping you avoid typos and save time. 

Table and column names are automatically suggested when you write a SQL query.

This new feature not only increases productivity but also minimizes errors. Whether you're a seasoned SQL veteran or just starting out, autocompletion will help you write better queries, faster.

Improved Schema Browser

To facilitate navigating and exploring your connected databases, we have overhauled the schema browser in DataCamp Workspace. Now, you can quickly see all tables in a connected database, search through them, and even preview their contents with just a few clicks. 

Browse schemas, tables and columns with easy shortcuts to kickstart queries.

Our enhanced schema browser is designed to help you work more efficiently with your data, allowing you to spend less time searching for the right table or column and more time uncovering valuable insights.

Parametrize your SQL queries

When working with SQL, there are times you want to dynamically update your SQL queries based on results of previous calculations or other data in your notebook. Workspace now supports this through SQL parametrization, which allows you to insert variables into your SQL queries. 

Insert Python or R variables into your SQL query using interpolation

More Database and Data Warehouse Integrations

You should be able to seamlessly access your data, no matter where it's stored. DataCamp already allows you to query all common databases and data warehouses with a SQL cell, including PostgreSQL, MySQL, Redshift, Oracle, SQL Server, BigQuery, and Athena. We’ll be adding even more integrations to our platform, such as Snowflake and ClickHouse, so we cover all data stacks and use cases.

Making analytics work fast and enjoyable

At DataCamp, we're committed to making your data analysis experience seamless and enjoyable. With these SQL enhancements we believe that DataCamp Workspace is now even better equipped to help you unlock the full potential of your data. Whether you're a data scientist, a business analyst, or just someone who loves working with data, we're confident that DataCamp Workspace will help you streamline your workflows and elevate your analyses. Give these new features a try and experience the difference for yourself!

In the meantime, we'll continue to push the boundaries of data analysis and unlock new possibilities for insights discovery, with notebook scheduling, dark mode, native dbt support, advanced dashboarding, and much more! Is there something we should improve? Something you’re especially excited about? Let us know by sending an email to [email protected]! Thanks!

Related

Building Your Data Science Portfolio With DataCamp Workspace (Part 2): Add an Analytics Workspace

Get tips and tricks for creating an analytics-style project to add to your data science portfolio using DataCamp Workspace.
Justin Saddlemyer's photo

Justin Saddlemyer

10 min

From Beginner to Pro: Dive into the Most Popular Workspace Publications on DataCamp

Explore the most popular and influential workspace publications created by our vibrant DataCamp community.
Luigi D'Introno's photo

Luigi D'Introno

Building Your Data Science Portfolio with DataCamp Workspace (Part 3): Add Machine Learning Workspace

Learn how to leverage DataCamp Workspace to produce a machine-learning project to add to your data science portfolio. We cover how to get started, how to structure your work, and common mistakes to avoid.
Justin Saddlemyer's photo

Justin Saddlemyer

8 min

Seamlessly Teach Data in the Classroom with DataCamp Workspace

This article will show you how to use DataCamp Workspace to teach data in the classroom with Python or R in less than 5 seconds.
Filip Schouwenaars's photo

Filip Schouwenaars

8 min

How to Use SQL in pandas Using pandasql Queries

Unleash the power of SQL within pandas and learn when and how to use SQL queries in pandas using the pandasql library for seamless integration.
Elena Kosourova's photo

Elena Kosourova

8 min

How to Practice SQL Using any Dataset with Workspace

Learn how DataCamp Workspace optimizes the experience of working with Jupyter notebooks and SQL. Discover how to effortlessly write SQL queries, connect to databases, analyze CSV files, and leverage the power of AI assistance
Richie Cotton's photo

Richie Cotton

9 min

See MoreSee More