-
Welcome to Intermediate SQL! 00:00 - 00:31 Hello, and welcome to Intermediate SQL. My name is Mona Khalil. I am a Curriculum Lead with DataCamp, and I will be your instructor for this course. SQL is a powerful tool for working with relational databases. With an intermediate knowledge of SQL, you will gain the ability to access and create robust data sets from multiple tables in a relational database to answer your data science questions.
-
Topics covered: 00:31 - 00:47 In this course, you will specifically learn how to shape, transform, and manipulate data using the CASE statement, simple subqueries, correlated subqueries, and window functions.
-
Prerequisites 00:47 - 01:28 Before taking this course, you should be comfortable working with introductory SQL topics, such as selecting data from a database using arithmetic functions, GROUP BY statements, and WHERE clauses to filter data. In short, the query on top should look pretty familiar to you. You should also be familiar with joining data with a LEFT JOIN, RIGHT JOIN, INNER JOIN and OUTER join. In this course, we will use and build upon these topics to interact with our database. Alright, let's get started!
-
Selecting from the European Soccer Database 01:28 - 02:10 For this course, we will be using the European Soccer Database -- a relational database that contains data about over 25,000 matches, 300 teams, and 10,000 players in Europe between 2008 and 2016. The data is contained within 4 tables -- country, league, team, and match. Selecting from tables in this database is pretty simple. The query you see here gives you the number of matches played in each of the 11 leagues listed in the "League" table.
-
Selecting from the European Soccer Database 02:10 - 02:29 Let's say we want to compare the number of home team wins, away team wins, and ties in the 2013/2014 season. The "Match" table has two relevant columns -- home_goal, and away_goal.
-
Selecting from the European Soccer Database 02:29 - 02:47 We can potentially add filters to the WHERE clause selecting wins, loses, and ties as separate queries, but that's not very efficient if you want to compare these separate outcomes in a single data set.
-
CASE statements 02:47 - 03:57 This is where the CASE statement comes in. Case statements are SQL's version of an "IF this THEN that" statement. Case statements have three parts -- a WHEN clause, a THEN clause, and an ELSE clause. The first part -- the WHEN clause -- tests a given condition, say, x = 1. If this condition is TRUE, it returns the item you specify after your THEN clause. You can create multiple conditions by listing WHEN and THEN statements within the same CASE statement. The CASE statement is then ended with an ELSE clause that returns a specified value if all of your when statements are not true. When you have completed your statement, be sure to include the term END and give it an alias. The completed CASE statement will evaluate to one column in your SQL query.
-
CASE WHEN 03:57 - 04:17 In this example, we use a CASE statement to create a new variable that identifies matches as home team wins, away team wins, or ties. A new column is created with the appropriate text for each match given the outcome.
-
Let's practice! 04:17 - 04:38 In the next lesson, we will practice more ways of structuring CASE statements using arithmetic functions such as COUNT, SUM, and AVERAGE. For now, you will practice creating CASE statements to build categories for your data.
-
In CASE things get more complex 00:00 - 00:09 Now that you understand the basics of CASE statements, let's set up some more complex logical tests.
-
Reviewing CASE WHEN 00:09 - 00:44 Previously, we covered CASE statements with one logical test in a WHEN statement, returning outcomes based on whether that test is TRUE or FALSE. This example tests whether home or away goals were higher, and identifies them as wins for the team that had a higher score. Everything ELSE is categorized as a tie. The resulting table has one column identifying matches as one of 3 possible outcomes.
-
CASE WHEN ... AND then some 00:44 - 01:53 If you want to test multiple logical conditions in a CASE statement, you can use AND inside your WHEN clause. For example, let's see if each match was played, and won, by the team Chelsea. Let's see the CASE statement in this query. Each WHEN clause contains two logical tests -- the first tests if a hometead_id identifies Chelsea, AND then it tests if the home team scored higher than the away team. If both conditions are TRUE, the new column output returns the phrase "Chelsea home win!". The opposite set of conditions are included in a second when statement -- if the awayteam_id belongs to Chelsea, AND scored higher, then the output returns "Chelsea away win!". All other matches are categorized as a loss or tie. Here's the resulting table.
-
What ELSE is being excluded? 01:53 - 02:40 When testing logical conditions, it's important to carefully consider which rows of your data are part of your ELSE clause, and if they're categorized correctly. Here's the same CASE statement from the previous slide, but the WHERE filter has been removed. Without this filter, your ELSE clause will categorize ALL matches played by anyone, who don't meet these first two conditions, as "Loss or tie :(". Here are the results of this query. A quick look at it shows that the first few matches are all categorized as "Loss or tie", but neither the hometeam_id or awayteam_id belong to Chelsea.
-
Correctly categorize your data with CASE 02:40 - 03:20 The easiest way to correct for this is to ensure you add specific filters in the WHERE clause that exclude all teams where Chelsea did not play. Here, we specify this by using an OR statement in WHERE, which retrieves only results where the id 8455 is present in the hometeam_id or awayteam_id columns. The resulting table from earlier, with the team IDs in bold here, clearly specifies whether Chelsea was home or away team.
-
What's NULL? 03:20 - 03:44 It's also important to consider what your ELSE clause is doing. These two queries here are identical, except for the ELSE NULL statement specified in the second. They both return identical results -- a table with quite a few null results. But what if you want to exclude them?
-
What are your NULL values doing? 03:44 - 04:04 Let's say we're only interested in viewing the results of games where Chelsea won, and we don't care if they lose or tie. Just like in the previous example, simply removing the ELSE clause will still retrieve those results -- and a lot of NULL values.
-
Where to place your CASE? 04:04 - 04:20 To correct for this, you can treat the entire CASE statement as a column to filter by in your WHERE clause, just like any other column. In order to filter a query by a CASE statement,
-
Where to place your CASE? 04:20 - 04:50 you include the entire CASE statement, except its alias, in WHERE. You then specify what you want to include, or exclude. For this query, I want to keep all rows where this CASE statement IS NOT NULL. My resulting table now only includes Chelsea's home and away wins -- and I don't need to filter by their team ID anymore!
-
Let's practice! 04:50 - 04:59 Okay! Let's practice some more complex CASE statements.
In CASE of rivalry Barcelona and Real Madrid have been rival teams for more than 80 years. Matches between these two teams are given the name El Clásico (The Classic). In this exercise, you will query a list of matches played between these two rivals.
You will notice in Step 2 that when you have multiple logical conditions in a CASE statement, you may quickly end up with a large number of WHEN clauses to logically test every outcome you are interested in. It's important to make sure you don't accidentally exclude key information in your ELSE clause.
In this exercise, you will retrieve information about matches played between Barcelona (id = 8634) and Real Madrid (id = 8633). Note that the query you are provided with already identifies the Clásico matches using a filter in the WHERE clause.
Instructions 1/2 50 XP 1 2 Complete the first CASE statement, identifying Barcelona or Real Madrid as the home team using the hometeam_id column. Complete the second CASE statement in the same way, using awayteam_id.
SELECT
date,
-- Identify the home team as Barcelona or Real Madrid
CASE
WHEN hometeam_id = 8634 THEN 'FC Barcelona'
WHEN hometeam_id = 8633 THEN 'Real Madrid CF'
ELSE 'Other'
END AS home,
-- Identify the away team as Barcelona or Real Madrid
CASE
WHEN awayteam_id = 8634 THEN 'FC Barcelona'
WHEN awayteam_id = 8633 THEN 'Real Madrid CF'
ELSE 'Other'
END AS away
FROM matches_spain
WHERE (awayteam_id = 8634 OR hometeam_id = 8634)
AND (awayteam_id = 8633 OR hometeam_id = 8633);
SELECT
date,
-- Identify the home team as Barcelona or Real Madrid
CASE
WHEN hometeam_id = 8634 THEN 'FC Barcelona'
WHEN hometeam_id = 8633 THEN 'Real Madrid CF'
ELSE 'Real Madrid CF'
END AS home,
-- Identify the away team as Barcelona or Real Madrid
CASE
WHEN awayteam_id = 8634 THEN 'FC Barcelona'
WHEN awayteam_id = 8633 THEN 'Real Madrid CF'
ELSE 'Barcelona'
END AS away
FROM matches_spain
WHERE (awayteam_id = 8634 OR hometeam_id = 8634)
AND (awayteam_id = 8633 OR hometeam_id = 8633);
In CASE of rivalry Barcelona and Real Madrid have been rival teams for more than 80 years. Matches between these two teams are given the name El Clásico (The Classic). In this exercise, you will query a list of matches played between these two rivals.
You will notice in Step 2 that when you have multiple logical conditions in a CASE statement, you may quickly end up with a large number of WHEN clauses to logically test every outcome you are interested in. It's important to make sure you don't accidentally exclude key information in your ELSE clause.
In this exercise, you will retrieve information about matches played between Barcelona (id = 8634) and Real Madrid (id = 8633). Note that the query you are provided with already identifies the Clásico matches using a filter in the WHERE clause.
Instructions 2/2 50 XP 2 Construct the final CASE statement identifying who won each match. Note there are 3 possible outcomes, but 5 conditions that you need to identify. Fill in the logical operators to identify Barcelona or Real Madrid as the winner.
SELECT
date,
CASE WHEN hometeam_id = 8634 THEN 'FC Barcelona'
ELSE 'Real Madrid CF' END as home,
CASE WHEN awayteam_id = 8634 THEN 'FC Barcelona'
ELSE 'Real Madrid CF' END as away,
-- Identify all possible match outcomes
CASE
WHEN home_goal > away_goal AND hometeam_id = 8634 THEN 'Barcelona win!'
WHEN home_goal > away_goal AND hometeam_id = 8633 THEN 'Real Madrid win!'
WHEN home_goal < away_goal AND awayteam_id = 8634 THEN 'Barcelona win!'
WHEN home_goal < away_goal AND awayteam_id = 8633 THEN 'Real Madrid win!'
ELSE 'Tie!'
END as outcome
FROM matches_spain
WHERE (awayteam_id = 8634 OR hometeam_id = 8634)
AND (awayteam_id = 8633 OR hometeam_id = 8633);
SELECT
date,
CASE WHEN hometeam_id = 8634 THEN 'FC Barcelona'
ELSE 'Real Madrid CF' END as home,
CASE WHEN awayteam_id = 8634 THEN 'FC Barcelona'
ELSE 'Real Madrid CF' END as away,
-- Identify all possible match outcomes
CASE WHEN home_goal > away_goal AND hometeam_id = 8634 THEN 'Barcelona win!'
WHEN home_goal > away_goal AND hometeam_id = 8633 THEN 'Real Madrid win!'
WHEN home_goal < away_goal AND awayteam_id = 8634 THEN 'Barcelona win!'
WHEN home_goal < away_goal AND awayteam_id = 8633 THEN 'Real Madrid win!'
ELSE 'Tie!'
END as outcome
FROM matches_spain
WHERE (awayteam_id = 8634 OR hometeam_id = 8634)
AND (awayteam_id = 8633 OR hometeam_id = 8633);
Exercise Filtering your CASE statement Let's generate a list of matches won by Italy's Bologna team! There are quite a few additional teams in the two tables, so a key part of generating a usable query will be using your CASE statement as a filter in the WHERE clause.
CASE statements allow you to categorize data that you're interested in -- and exclude data you're not interested in. In order to do this, you can use a CASE statement as a filter in the WHERE statement to remove output you don't want to see.
Here is how you might set that up:
SELECT * FROM table WHERE CASE WHEN a > 5 THEN 'Keep' WHEN a <= 5 THEN 'Exclude' END = 'Keep'; In essence, you can use the CASE statement as a filtering column like any other column in your database. The only difference is that you don't alias the statement in WHERE. Identify Bologna's team ID listed in the teams_italy table by selecting the team_long_name and team_api_id.
-- Select team_long_name and team_api_id from team
SELECT
team_long_name,
team_api_id
FROM team
-- Filter for team long name
WHERE team_long_name = 'Bologna';
Select the season and date that a match was played. Complete the CASE statement so that only Bologna's home and away wins are identified.
-- Select the season and date columns
SELECT
season,
date,
-- Identify when Bologna won a match
CASE WHEN hometeam_id = 9857
AND home_goal > away_goal
THEN 'Bologna Win'
WHEN awayteam_id = 9857
AND away_goal > home_goal
THEN 'Bologna Win'
END AS outcome
FROM matches_italy;
Select the home_goal and away_goal for each match. Use the CASE statement in the WHERE clause to filter all NULL values generated by the statement in the previous step.
-- Select the season, date, home_goal, and away_goal columns
SELECT
season,
date,
home_goal,
away_goal
FROM matches_italy
WHERE
-- Exclude games not won by Bologna
(hometeam_id = 9857 AND home_goal > away_goal) OR
(awayteam_id = 9857 AND away_goal > home_goal);
Transcript
-
CASE WHEN with aggregate functions 00:00 - 00:08 Great job so far! Let's take a look at CASE statements with aggregate functions.
-
In CASE you need to aggregate 00:08 - 00:24 CASE statements can be used to create columns for categorizing data, and to filter your data in the WHERE clause. You can also use CASE statements to aggregate data based on the result of a logical test.
-
COUNTing CASES 00:24 - 00:48 Let's say you wanted to prepare a summary table counting the number of home and away games that Liverpool won in each season. If you've created summary tables in Spreadsheets, you can probably visualize the final table, here -- but how do you get a count of Liverpool's wins in each season?
-
CASE WHEN with COUNT 00:48 - 01:38 You guessed it -- a CASE statement. CASE statements are like any other column in your query, so you can include them inside an aggregate function. Take a look at the CASE statement. The WHEN clause includes a similar logical test to the previous lesson -- did Liverpool play as the home team, AND did the home team score higher than the away team? The difference begins in your THEN clause. Instead of returning a string of text, you return the column identifying the unique match id. When this CASE statement is inside the COUNT function, it COUNTS every id returned by this CASE statement.
-
CASE WHEN with COUNT 01:38 - 01:53 You then add a second CASE statement for the away team, and group the query by the season. When counting information in a CASE statement, you can return anything you'd like --
-
CASE WHEN with COUNT 01:53 - 02:05 a number, a string of text, or any column in the table, SQL is COUNTing the number of rows returned by the CASE statement.
-
CASE WHEN with SUM 02:05 - 02:41 Similarly, you can use the SUM function to calculate a total of any value. Let's say we're interested in the number of home and away goals that Liverpool scored in each season. This is fairly simple to set up -- if the hometeam_id is Liverpool's, return the home_goal value. The ELSE condition is assumed to be NULL, so the query returns the total home_goals scored by Liverpool in each season.
-
The CASE is fairly AVG... 02:41 - 03:11 You can also use the AVG function with CASE in two key ways. First, you can calculate an average of data. You can do this using CASE in the EXACT same way you used the SUM function. Just change out SUM for AVG in this query, and you instead get the AVG goals Liverpool scored in each season.
-
A ROUNDed AVG 03:11 - 03:23 You can make the results easier to read using ROUND. ROUND takes 2 arguments -- a numerical value, and the number of decimal points to round the value to.
-
A ROUNDed AVG 03:23 - 03:35 Place it outside your aggregate CASE statement, and include the number of decimal points at the end. There, that's much easier to read!
-
Percentages with CASE and AVG 03:35 - 04:36 The second key application of CASE with AVG is in the calculation of percentages. This requires a specific structure in order for your calculation to be accurate. The question we're answering here is, "What percentage of Liverpool's games did they win in each season?" The first component of this CASE statement is a WHEN clause identifying what you're calculating a percentage of -- in this case, how many games did they win? This is tested in the same way as previous slides, and your THEN clause returns a 1. The second component identifies Liverpool's games that they LOST, and returns the value 0. All other matches -- ties, games not involving Liverpool -- are excluded as NULLs. Here are the results of this query ...
-
Percentages with CASE and AVG 04:36 - 04:41 ... and here's the ROUNDed, more readable version of the results.
-
Let's practice! 04:41 - 04:51 Now it's your turn to practice creating CASE statements with aggregate functions.