In today's data-driven world, extracting valuable insights from raw information is essential for businesses and researchers alike. This is where DataLab steps in: DataCamp's fully managed, collaborative data notebook that simplifies data analysis and makes sharing with colleagues, partners, or clients a breeze, without worrying about configuration and setup.
Since SQL is still the fundamental building block in many data projects, we've been working hard to further enhance the SQL capabilities in DataLab.
Query CSV Files and DataFrames
Many DataCamp users are at the beginning of their data science journey and don’t have access to a dedicated database or data warehouse. Instead, they are looking to play around with a CSV file they found online or uploaded from their local computer. We also came across this limitation ourselves when hosting DataLab-powered live trainings: we had to set up a database every time we wanted to do a SQL training, which was cumbersome.
Those times are history now! With the SQL cell inside a DataLab notebook, you can directly query CSV files. This effectively treats your CSV file as a database that you can query like any other database. Try it out here
Users are loving this enhancement, as the CSV support lets them skip database setup entirely. Others love the workflows that the DataFrame SQL support gives, allowing them to switch between SQL and Python, drawing from the best of both languages.
How it works
This new capability is powered by DuckDB. Dubbed “SQLite for analytics”, this embedded analytical database engine is able to query CSV files with excellent performance. When users select “DataFrames and CSVs”, DataLab does not send the query to a data warehouse as usual, but transforms the query into Python code that calls the DuckDB Python API in the user’s session and return the resulting DataFrame.
In addition to querying CSV files, DuckDB also allows us to query in-memory DataFrames that were created higher up in the notebook. This creates some pretty powerful workflows, for example:
- Query a CSV file with a SQL cell. The query result is available as a pandas DataFrame
- Use some pandas functions to clean up and restructure the code in a way that would be hard in SQL. Store the result as a new data frame.
- Query the resulting data frame with SQL to calculate some aggregates, which is straightforward with SQL’s GROUP BY functionality.
Other improvements
Autocomplete for Table and Column Names
We didn’t stop with CSV and DataFrame support. Writing SQL queries can be cumbersome, especially when working with large databases containing lots of tables and columns. To make this easier, we have introduced autocompletion for table and column names in DataLab. As you type your query, our intelligent system will automatically suggest relevant table and column names based on your input, helping you avoid typos and save time.
This new feature not only increases productivity but also minimizes errors. Whether you're a seasoned SQL veteran or just starting out, autocompletion will help you write better queries, faster.
Improved Schema Browser
To facilitate navigating and exploring your connected databases, we have overhauled the schema browser in DataLab. Now, you can quickly see all tables in a connected database, search through them, and even preview their contents with just a few clicks.
Our enhanced schema browser is designed to help you work more efficiently with your data, allowing you to spend less time searching for the right table or column and more time uncovering valuable insights.
Parametrize your SQL queries
When working with SQL, there are times you want to dynamically update your SQL queries based on results of previous calculations or other data in your notebook. DataLab now supports this through SQL parametrization, which allows you to insert variables into your SQL queries.
More Database and Data Warehouse Integrations
You should be able to seamlessly access your data, no matter where it's stored. DataCamp already allows you to query all common databases and data warehouses with a SQL cell, including PostgreSQL, MySQL, Redshift, Oracle, SQL Server, BigQuery, and Athena. We’ll be adding even more integrations to our platform, such as Snowflake and ClickHouse, so we cover all data stacks and use cases.
Making analytics work fast and enjoyable
At DataCamp, we're committed to making your data analysis experience seamless and enjoyable. With these SQL enhancements we believe that DataLab is now even better equipped to help you unlock the full potential of your data. Whether you're a data scientist, a business analyst, or just someone who loves working with data, we're confident that DataLab will help you streamline your workflows and elevate your analyses. Give these new features a try and experience the difference for yourself!
In the meantime, we'll continue to push the boundaries of data analysis and unlock new possibilities for insights discovery, with notebook scheduling, dark mode, native dbt support, advanced dashboarding, and much more! Is there something we should improve? Something you’re especially excited about? Let us know by sending an email to rocketship@datacamp.com! Thanks!