Skip to content

Intermediate SQL

SQL is the most popular language for turning raw dat in a datbase into actionable insights. Using a database of films made around the world, this course covers:

  • How to filter and compare data
  • How to use aggregate functions to summarize data
  • How to sort and group your data
  • How to present your data cleanly using tools such as rounding and aliasing.

Accompanied at every step with hands-on practice queries, this course teaches you everything you need to know how to analyse data using your own SQL code today!

Roadmap

  • Querying databases
  • Count and view specified records
  • Understand query execution and style
  • Filtering
  • Aggregate functions
  • How to sort and filter group

Selecting Data

In this first chapter, you'll learn how to queery a films database and select the data needed to answer questions about the movies and actors, You'll also understand how SQL code is executed and formatted.

Querying a database

A query is a request for data from a database

Database Schema

New Keywords

COUNT(field_name) counts values in a field

  • Counts the number of records with a value in a field

COUNT(*) counts records in a table

  • * represents all fields

DISTINCT removes duplicates to return unique values Combine COUNT() with DISTINCT to count unique values

LIMIT limits how many results we return

Spinner
DataFrameas
df
variable
#count()
SELECT COUNT(birthdate) AS count_birthdates
FROM people;

#count()_multiple_fields
SELECT COUNT(name) AS count_names, COUNT(birthdate) AS count_birthdates
FROM people;

#count(*)_all_records
SELECT COUNT (*) AS total_records
FROM people;

#combine count() with distinct
SELECT COUNT(DISTINCT birthdate) AS count_distinct_birthdates FROM people;
Spinner
DataFrameas
df1
variable
-- Count the number of records in the people table
SELECT COUNT(*) AS count_records
FROM people;

-- Count the number of birthdates in the people table
SELECT COUNT(birthdate) AS count_birthdate
FROM people;

-- Count the languages and countries represented in the films table
SELECT COUNT(language) AS count_languages, COUNT(country) AS count_countries
FROM films;
Spinner
DataFrameas
df2
variable
-- Count the distinct countries from the films table
SELECT COUNT(DISTINCT country) AS count_distinct_countries
FROM films;

Query Execution

Order of Execution

  • SQL is not processed in its written order
  • Good to know processing order for debugging and aliasing
  • Aliases are declared in the SELECT statement.

Order Example

  1. FROM
  2. SELECT
  3. LIMIT

Debugging SQL Commons Errors

  • Misspelling
  • Incorrect capitalisation
  • Incorrect or missing punctuation, especially commas

Debugging is a major skill. Learn by making mistakes.

SQL Style

SQL Formatting

  • Formatting is not required
  • But lack of formatting can cause issues.
  • Irritating other people & making it harder to read code

Could be helpful to follow style guides: Holywell's style guide: https://www.sqlstyle.guide/

Guiding principle Clear and Readable Code

Semicolon

  • Best practice
  • Easier to translate between SQL flavors as some require it
  • Indicates the end of a query

Dealing with non-standard field names If someone makes a mistake while writing field names, for example release year instead of release_year. You must put these non-standard field names in double-quotes.

Why do we format?

  • Easier collaboration
  • Clean and readable
  • Looks professional
  • Easier to understand

Filtering nunbers

WHERE filtering clause Allows us to look at the data only relevant to our business questions. WHERE with comparison operators SELECT title FROM films WHERE release_year > 1960;

< Less than or before > Greater than or after <= Less than or equal to >= Greater than or equal to = Equal to <> All the release except those from 1960

WHERE using the comparison operator, equals, can also be used with strings. You will have to use sinle quotation marks around the strings we want to filter. WHERE comes after SELECT FROM before LIMIT

Order of execution would be FROM, WHERE, SELECT, LIMIT

Spinner
DataFrameas
df3
variable
-- Select film_ids and imdb_score with an imdb_score over 7.0
SELECT film_id,
        imdb_score
FROM reviews
WHERE imdb_score > 7.0;

-- Select film_ids and facebook_likes for ten records with less than 1000 likes 
SELECT film_id,
        facebook_likes
FROM reviews
WHERE facebook_likes < 1000
LIMIT 10;

-- Count the records with at least 100,000 votes
SELECT COUNT(num_votes) AS films_over_100k_votes
FROM reviews
WHERE num_votes >= 100000

-- Count the Spanish-language films
SELECT COUNT(language) AS count_spanish
FROM films
WHERE language = 'Spanish'

Filtering with multiple criteria

three additional keywords OR, AND, BETWEEN Use OR when you need to satisfy at least one condition Must specify the field for every OR, AND condition Use AND if we need to satisfy all criteria.

If a query has multiple filtering conditions, we will need to enclose the individual clauses in parentheses to ensure the correcr execution order; otherwise, we may not get the expected results.

BETWEEN keyword is inclusive

Spinner
DataFrameas
df4
variable
-- Select the title and release_year for all German-language films released before 2000
SELECT title,
        release_year
FROM films
WHERE release_year < 2000 
    AND language = 'German';
	
-- Select all records for German-language films released after 2000 and before 2010 using only WHERE and AND.
SELECT *
FROM films
WHERE (release_year > 2000 AND release_year < 2010)
    AND language = 'German';
Spinner
DataFrameas
df5
variable
-- Find the title and year of films from the 1990 or 1999
SELECT title,
    release_year
FROM films
WHERE release_year = 1990
    OR release_year = 1999;
	
SELECT title, release_year
FROM films
WHERE (release_year = 1990 OR release_year = 1999)
-- Add a filter to see only English or Spanish-language films
AND (language = 'English' OR language = 'Spanish');

SELECT title, release_year
FROM films
WHERE (release_year = 1990 OR release_year = 1999)
	AND (language = 'English' OR language = 'Spanish')
-- Filter films with more than $2,000,000 gross
	AND (gross > 2000000);
Spinner
DataFrameas
df6
variable
Lets use BETWEEN with AND on the films database to get the title and release_year of all Spanish-language films released between 1990 and 2000 (inclusive) with budgets over $100 million.

We have broken the problem into smaller steps so that you can build the query as you go along!

-- Select the title and release_year for films released between 1990 and 2000
SELECT title,
    release_year
FROM films
WHERE (release_year BETWEEN 1990 AND 2000)

SELECT title, release_year
FROM films
WHERE release_year BETWEEN 1990 AND 2000
	AND budget > 100000000
-- Amend the query to include Spanish or French-language films
	AND (language = 'Spanish' OR language = 'French');