Tutorials
tutorial
+3

SQL Tutorial for Beginners

Are you ready to begin your SQL journey? Here’s a step-by-step guide on how to get started.

If I told you that there was treasure buried in your backyard, you wouldn’t hesitate to use a shovel to dig it up.

If you think of data as the treasure, then SQL is the shovel that digs up the data from where it is stored.

That’s a useful metaphor, but what is SQL exactly? SQL stands for Structured Query Language, and is a ubiquitous language used to interact with structured data in databases. It is a language most data practitioners—data engineers, data analysts, business analysts, and data scientists—use in their everyday work to converse with databases in a scalable and efficient manner.

Why You Should Use SQL

SQL has proven to be indispensable to many companies, including household names like Facebook, Google, Amazon, to store their data in database systems. In fact, SQL remains the third most-used technology in StackOverflow’s 2020 developer survey. It should come as no surprise to aspiring data scientists that SQL is one of the top requirements in the job description. Conveniently, it’s one of the more simple programming languages to pick up when working with data, and greatly complements working with spreadsheets.

The Various Dialects of SQL

If you’re familiar with SQL, you may have heard of the terms PostgreSQL, MySQL, Oracle and SQL Server. These are different dialects of SQL that are largely similar, with minor differences in syntax. The differences in SQL dialects are comparable to the various dialects or accents in the English language. English speakers are able to understand one another and communicate with a slight variation in accents most of the time.

In this SQL tutorial, I will be using MySQL exclusively. Worry not, learning a new SQL dialect after you’ve mastered MySQL will be a walk in the park. I will show you how you can install SQL on your local system and provide some examples of how SQL can be used to answer business problems using a Netflix dataset.

Let’s start by installing MySQL.

SQL Installation in Mac

  1. Head over to the MySQL downloads page.
  2. For Mac users, select the operating system ‘macOS’ and then the download option with ‘DMG Archive’. On the next screen, you can simply click on ‘No thanks, just start my download’ to start the download.


  1. Once the download is completed, open the downloaded file and proceed with the installation. You should see the screen below.


  1. At the configuration page, select Use Strong Password Encryption. You will be prompted to create a password for your MySQL Server, which will be needed to access the MySQL server.
  2. Proceed with the installation.
  3. Once that is done, proceed to download MySQL Workbench 8.0.20. MySQL Workbench is a graphical user interface for working with MySQL. On this download page, select ‘8.0.20’ as the product version and ‘macOS’ as the operating system. Click on the download button and install the product.
  4. Once you’re done with the installation, open MySQLWorkbench, which should be found in your applications.

Installing MySQL in Windows

  1. Head over to the MySQL downloads page.


  1. Select the operating system Microsoft Windows and then the download option MySQL Installer MSI. On the next screen, you can simply click No thanks, just start my download to start the download.
  2. Once the download is completed, open the downloaded file and proceed with the installation.
  3. On Select Product and Features, move the following from Available Products to Products/Features to be Installed. a. MySQLServer 8.0.22 b. MySQL Workbench 8.0.22, which is a graphical user interface for working with MySQL. c. MySQL Shell 8.0.22


  1. Then proceed with the installation.
  2. On the page Authentication Method, select use Strong Password Encryption for Authentication. Then, on the next page Accounts and Roles, enter the password in MySQL Root Password. This password will be used to access your database later.


  1. Proceed with the installation with the other default settings. On the final page, make sure that you check Start MySQL Workbench after setup since we will be using MySQL Workbench for the remaining parts of the tutorial.

Congratulations! MySQL Workbench is installed.

On the home screen of MySQL Workbench, click on Local Instance 3306. You will be prompted for the password that you have created earlier. Once you have entered that password, you should be greeted with the MySQL Workbench.


We’re done installing MySQL on our workstations! For a guide to install MySQL on Python, feel free to follow this tutorial.

Creating a schema

Now that we’re at the home screen of the MySQL Workbench, we’re ready for our very first query in SQL. The very first SQL query is to create a schema called new_schema. Put simply, a schema in SQL is a collection of tables—think of schemas as folders, and tables as spreadsheet files inside those folders.

To do this, we can run the following query: Type CREATE SCHEMA new_schema ; In SQL, the ; is optionally used to represent the end of an SQL script. Then, run the query by going to Query > Execute (All or Selection). For Mac users, the shortcut is Command+Enter. For the schema to show up, right-click on the schema panel and hit ‘Refresh’.



Uploading data to MySQL

Now that we’ve created a schema, we are ready to create a table inside the schema. In this example, let’s upload a comma-separated values (CSV) file that contains data on Netflix titles. After downloading the file from this link, you can use the MySQL Workbench Table Import Wizard using the instructions below.

  1. Right-click on new_schema and select Table Data Import Wizard.


  1. On the Select File to Import, locate the netflix_titles.csv file that you have downloaded.


  1. In the next step, select Create new table. Select new_schema and name the table netflix_titles. Check the drop table if it exists.


  1. Note that in the configure import settings page, the Field Type is automatically configured for you, based on the data.
  2. From here on, click next until you see a confirmation that the data has been imported completely.


Answering Questions Using MySQL

Now that we are done with the preparation work, we are ready to say Hello World using MySQL. For this example, we’ll be using SQL to find out some interesting facts about movies and TV shows on Netflix using this dataset from Kaggle.

This dataset contains the following columns.

Column Description
Show_ID Unique ID for every Movie / TV Show
Type Whether the title is a movie or a TV Show
Title Title of the movie / TV show
Director Director of the title
Cast A list of cast members of the title
Country Country where the title was produced
Date Added Date it was added on Netflix
Release Year Actual release year of the move / show
Rating TV Rating of the movie / show (e.g. TV-PG)
Duration Total duration - in minutes for movies or number of seasons for TV shows

We can now query the entire table to check if the table has been imported correctly.

To do some, simply run the query

SELECT * FROM new_schema.netflix_titles;

This query means select all columns from the table called netflix_titles that exists in the schema called new_schema.

Here, notice that

  • SELECT is a keyword used to specify which columns are to be selected.
  • * is used to indicate that all columns are selected.
  • FROM is a keyword used to indicate the source of data.
  • . is used to indicate the relationship between the new_schema and netflix_titles. Here, new_schema.netflix_titles means that netflix_title exists in new_schema.
  • ; is optionally used to end a query script.


We can now answer some interesting questions such as:

  1. Which Netflix titles are released in the year 2018?
  2. What is the oldest title on Netflix?
  3. What is the average number of seasons of a TV show (not movies) on Netflix?
  4. How long is each director’s total screen time on Netflix?
  5. What is the distribution of the year of release of Netflix titles?

Let’s get started!

1. Which Netflix titles are released in the year 2018? To answer this question, we can run the following query.

SELECT * FROM new_schema.netflix_titles
WHERE release_year = 2018;

This query means select all columns from the table called netflix_titles that exists in the schema called new_schema. However, we only select the rows of data where the release_year is 2018.

Here, notice that WHERE is a keyword used to include only certain rows that fulfill the criteria and exclude those that do not.

Instead of the equal operator (=), we can also use other operators, such as greater than (>), less than (<), at least (>=), at most (<=), and not equals to (<>). For instance, if we want to find netflix titles that are released after 2018, we can use the

SELECT * FROM new_schema.netflix_titles
WHERE release_year > 2018;

On the other hand, to select only Netflix titles released before or in 2018, the query is...

SELECT * FROM new_schema.netflix_titles
WHERE release_year <= 2018;

To select Netflix titles released in any year but 2018, the query is...

SELECT * FROM new_schema.netflix_titles
WHERE release_year <> 2018;

What if you’d like to find Netflix titles that are released between 2018 and 2019? That’s easy—we can use the BETWEEN and AND operator as such.

SELECT * FROM new_schema.netflix_titles
WHERE release_year BETWEEN 2018 AND 2019;

What if we’d like to find out titles that are released in 2016, 2017 and 2019 only? To do that, we need a new operator called IN. Note that we use curly brackets () to enclose the items and comma (,) to separate the items after the IN keyword.

SELECT * FROM new_schema.netflix_titles
WHERE release_year IN (2016, 2017, 2019);

Now, what if we want to find the oldest title on Netflix? To do so, we’ll need to arrange the rows in descending order of release year. The following question will illustrate this.

2. What is the oldest title on Netflix? We can answer this using the following query.

SELECT title, release_year
FROM new_schema.netflix_titles
ORDER BY release_year **ASC**;

If you run this query, you should see that the film ‘Pioneers: First Women Filmmakers’ was released in 1925.

The query above translates to find all titles and the corresponding release_year and arrange the results in ascending order of the release_year.

Note that in this query, the ORDER BY keywords sorts the results. To sort the results in ascending order, we use the keyword ASC following the name of the column to be sorted by. For descending order, the keyword DES is used.

For instance, to quickly find the latest title on Netflix that is available in this dataset, we can use the following syntax.

SELECT title, release_year
FROM new_schema.netflix_titles
ORDER BY release_year **DESC**;

Now that we’ve learned how to select only certain rows and sort them in sequence, let’s move on to find the summary data from only the selected rows.

3. What is the average number of seasons of a TV show (not movies) on Netflix? Here’s the query to answer this.

SELECT AVG(duration) AS ‘average_show_duration’
FROM new_schema.netflix_titles
WHERE type = 'TV Show';

This query translates to find the average duration of all netflix_titles with the type TV Show and rename the result’s column as average_show_duration. The answer is 1.78 seasons.

Here, note that:

  • AVG() is a keyword to find the average duration of TV shows.
  • Single quotation marks (‘) are used to enclose the string TV Show
  • AS is a keyword used to rename the resulting column.

AVG() here is but one of the many aggregate functions that can be used to find summary values of the selected rows. Other examples of useful aggregate functions include:

  • SUM(col_name) to find the summation of all values in col_name
  • MIN(col_name) to find the minimum value of all values in col_name
  • MAX(col_name) to find the maximum value of all values in col_name
  • COUNT(col_name) to find the number of rows in col_name

Now that we’ve learned about aggregate functions, let’s apply them to answer the next question.

4. How long is each director’s total screen time on Netflix? Here is the query, which roughly translates to find the director and the summation of the duration of each director’s films.

SELECT
director,
SUM(duration) AS total_screen_time_length
FROM new_schema.netflix_titles
WHERE type = 'Movie'
GROUP BY director;

Here, GROUP BY are the keywords used to group rows that have the same values together. It groups all the rows with the same director and computes the summation of the duration.

Quiz time! How do we modify the query to find the director with the longest screen time? (Hint: Look at question 2 for some inspiration.) The director with the longest screen time in this dataset is Martin Scorsese, who famously directed the Wolf of Wall Street.

Aggregate functions are essential in SQL. Learn more about them in this blog post.

We’re almost done. Let’s recap our post today with a question that uses all all concepts that we have learned today.

5. What is the distribution of the year of release of Netflix titles? This question aims to understand the age of the titles on Netflix, and requires us to aggregate and sort our data. To answer this, you might use the following query.

SELECT
release_year,
COUNT(show_id) number_of_titles AS ‘number_of_titles’
FROM new_schema.netflix_titles
GROUP BY release_year
ORDER BY release_year ASC;

The above query translates to count the number of show_id (each corresponding to one title) that are released each year and return the results in ascending order of release year. Exploring this result, you’ll find that most Netflix titles are released in the years 2017 and 2018.

Summary

To recap, in this tutorial, we learned to:

  1. Install SQL on a Mac or a Windows
  2. Select rows using the syntax SELECT * FROM column
  3. Select rows that fulfill a criterion using the WHERE keyword
  4. Sort the results using the ORDER BY, ASC, and DES keywords
  5. Find summary data using the aggregation functions MIN, MAX, COUNT
  6. Find summary data for different groups using aggregation functions and the GROUP BY keyword.

We covered a lot of SQL content and examples in this tutorial. Kudos to you for completing it! If you’re a beginner who would like to learn more SQL, an excellent place to start is DataCamp’s course, Introduction to SQL. Head over to our Learn SQL page to browse other courses, tracks, projects, and tutorials to help you master SQL.