Skip to main content
HomeTutorialsSQL

How to Practice SQL Using any Dataset with Workspace

Learn how DataCamp Workspace optimizes the experience of working with Jupyter notebooks and SQL. Discover how to effortlessly write SQL queries, connect to databases, analyze CSV files, and leverage the power of AI assistance
May 2023  · 9 min read

DataCamp Workspace builds on Jupyter notebooks, optimizing the experience to make you more productive.

One common pain point data analysts and data scientists have is that it can be tricky to get a Jupyter notebook hooked up to a database to start working. The problem can be even more acute for beginners: how can you learn to write SQL if you can't even connect to a database?

Workspace makes it effortless to write SQL - whether you are practicing your skills or working. This tutorial shows you how to get writing SQL in Workspace in seconds.

This tutorial assumes that you've used SQL before. If you haven't, read our SQL Tutorial for Beginners first, then try the examples from that tutorial in Workspace using the techniques shown here!

Getting Started

To use Workspace, you'll need to sign up to DataCamp. (A free account is fine to get started.)

Once you have your account, use the top navigation bar to go to Workspace.

DataCamp Workspace menu

This tutorial will use the built-in NBA Shooting Data dataset. If you prefer to use your own dataset, skip to the section below.

Use the left-hand navigation to go to the Datasets page.

DataCamp Workspace menu

Type "nba" into the search box and click on the dataset.

NBA dataset search

You'll see a preview of the dataset. Read through it, then click "Use Dataset". 

NBA Datase overview

Uploading your own dataset

You can use SQL to analyze any table-like data in Workspace. Currently, CSV files are best supported, so for learning purposes, if you supply your own dataset, it is recommended to use a CSV file.

From the Overview page, click "Empty" to create an empty workspace.

Create a new workspace

Give your workspace a name, then click "Create Workspace."

Naming your workspace

In the left-hand toolbar, open the Files tool.

Files tool

In the top-left of the tool, drag a data file into the file pane to upload it. (You can also click "browse files" or click the ⊕ button then Upload).

Using SQL With Data In CSV Files

The NBA shooting data is in the nba_players_shooting.csv file. In the left-hand toolbar, open the Files tool to see the file.

Files tool NBA dataset

The notebook in the workspace contains some Python code to import the dataset. However, in this case, we want to use SQL.

At the end of the notebook, click 'Add SQL.'

Add SQL

In the "Select source" dropdown, select "DataFrames and CSVs."

Select data source

To write a SQL query against a CSV file, only one change to standard SQL is required. In the FROM clause, instead of naming the table, write the path to the CSV file in single quotes.

Let's calculate the proportion of shots made for each shooter, ordered from best to worst. In the SQL cell, type the following code.

SELECT 
	shooter,
	SUM((SCORE = 'MADE')::DOUBLE) / COUNT(*) AS prop_score_made
	FROM 'nba_players_shooting.csv'
	GROUP BY shooter
	ORDER BY prop_score_made DESC

Shots made query output

Note that the dialect of SQL used when you are working with a CSV file is DuckDB. Read more about this in DuckDB makes SQL a first-class citizen on DataCamp Workspace.

Continue working in Python, R, or no-code charts

In the last example, we kept the default settings for the query return format. Look closely at the top of the SQL cell, and you'll see that it shows the return value as a DataFrame named df. That means you can continue to work on the results in Python (or in R, for R workspaces) or draw no-code plots.

Add chart

Click 'Add Chart' to add a no-code chart.

Add chart

Set the following options.

  • Type: Bar
  • X-axis: prop_score_made
  • Y-axis: SHOOTER

Chart customization

Using SQL With Data In Data Frames

The SQL query that was run against the CSV file created a data frame named df. You can also write SQL queries against these pandas or R data frames. The only difference from standard SQL is that in the FROM clause, you provide the data frame name instead of the table name.

Here, we'll reuse the previous results in df and split the shooter column into separate first and last names. Add a SQL cell, select "DataFrames and CSVs" as the source, and use the following code.

SELECT 
str_split(shooter, ' ')[1] AS first_name,
str_split(shooter, ' ')[2] AS last_name,
prop_score_made
FROM df

Using Sample Databases

Workspace also provides several sample databases that you can use to practice your SQL skills.

Add a SQL cell, and in the source dropdown, scroll down to the "Sample Integrations" section. Here, we'll use "Bicycle Sales."

Sample integrations

A default query will appear. For the tutorial, let's replace it with a simpler query.

SELECT  
product_name, 
MIN(list_price) AS cheapest_list_price
FROM products
GROUP BY product_name

When you use a connection to a database (rather than writing SQL against a CSV file or dataframe), you get the option to return the results as a query. In the dropdown, select "Query." You can also name the resulting value. Here, change "query" to "best_price."

Rename query

Running the query gives the following output.

Query output

You can now use this result to write further SQL against. Essentially, you've written a common table expression (covered in the SQL Commands for Data Scientists tutorial) without having to use the common table expression syntax.

Add another SQL cell, and write this further query.

SELECT *
	FROM best_price
	WHERE cheapest_list_price > 10000

Running this query returns the following results.

Query output 2

Connecting to Other Databases

Workspace also allows you to connect to many types of database, including PostgreSQL, MySQL, Redshift, bigQuery, Athena, SQL Server, MariaDB, and Oracle Database. Full details of how to connect are provided in the Workspace Docs.

In the left-hand toolbar, click on "Integrations," then click the ⊕ button to create a new integration.

Choose the type of database that you are connecting to.

Connecting to external database

You'll be asked to provide connection details (these vary by the type of database).

Database connection console

Using the AI Assistant

While writing SQL can be fun and useful, getting an AI to write it for you can help you learn and make you more productive. You can read about the AI Assistant in From Data to Insights: Get There Faster with the DataCamp Workspace AI Assistant.

Add a SQL cell. In the right-hand context menu, click "Generate."

Generate function

A textbox appears at the top of the SQL cell. In this, write a task for the AI. Choose your own task, or try the following.

Return the product names and list prices for all mountain bikes made by Trek that are in stock.

After typing your query, press Enter to get the AI to generate SQL code.

AI suggested query

If you think that the code is reasonable, click "Accept & run" (or click "Accept," then "Run" later on) to see the results of the query.

Query output

Get the AI to Explain Your Errors

You can also use the AI built-in to Workspace to explain why errors occur in your SQL. Suppose you got confused while writing the query for the expensive bikes and used a Fortran-style .GT. instead of `>`.

SELECT *
    FROM best_price
    WHERE cheapest_list_price .GT. 10000

This results in an error. In the bottom-left of the SQL cell, click "Fix Error."

Error in query

The AI provides the correct code, followed by an explanation of what you did wrong.

Suggested query

Summary

You've seen how to avoid setting up a database by writing SQL queries against CSV files and data frames. You used sample databases to practice writing code against. And you saw how to use AI to generate SQL with natural language and fix your errors.

These features make it much easier to get started writing SQL and to be more productive while learning and working.

Take it to the Next Level

Get started with SQL in Workspace by logging in or signing up for a DataCamp account. You can also master some of the SQL fundamentals in our skill track, which will give you all the skills you need to interact with and query your data.


Photo of Richie Cotton
Author
Richie Cotton

Richie helps individuals and organizations get better at using data and AI. He's been a data scientist since before it was called data science, and has written two books and created many DataCamp courses on the subject. He is a host of the DataFramed podcast, and runs DataCamp's webinar program.

Topics

Expand Your Knowledge

Course

Joining Data in SQL

4 hr
138.4K
Level up your SQL knowledge and learn to join tables together, apply relational set theory, and work with subqueries.
See DetailsRight Arrow
Start Course
See MoreRight Arrow
Related

blog

How To Use DataLab AI-Powered Notebooks for Every Data Skill Level

Find out how DataLab and its AI Assistant can boost your data science workflow - regardless of your skill level.
Alena Guzharina's photo

Alena Guzharina

6 min

tutorial

SQL Interface within JupyterLab

Learn how to use and modify SQL tables within JupyterLabs.
Parul Pandey's photo

Parul Pandey

7 min

tutorial

SQL Commands for Data Scientists

Learn the basic and essential commands for SQL with examples and a workspace notebook to refer to.

Emiko Sano

12 min

code-along

Using DataLab in Data Academies

Learn how to use DataCamp Workspace as part of a corporate training program
Filip Schouwenaars's photo

Filip Schouwenaars

code-along

Exploratory Data Analysis in SQL for Absolute Beginners

Learn the basics of exploring new datasets using SQL.
Jasmin Ludolf's photo

Jasmin Ludolf

code-along

Exploring World Cup Data in Python

Learn to use a combination of SQL & Python to import, process, and analyze data
Filip Schouwenaars's photo

Filip Schouwenaars

See MoreSee More