Skip to content

Analyzing Streaming Service Content in SQL

Which family friendly

Current Type: Bar
Current X-axis: service
Current Y-axis: pct_family
Current Color: None

Percentage of content that is family-oriented by streaming service

Exploring our data

Let's start by checking out the data we will be working with. We can start with the amazon, hulu, netflix, and disney tables.

Spinner
DataFrameas
df
variable
Run cancelled
SELECT *
FROM amazon
Spinner
DataFrameas
df
variable
Run cancelled
SELECT *
FROM hulu
Spinner
DataFrameas
df
variable
Run cancelled
SELECT * 
FROM netflix
Spinner
DataFrameas
df
variable
Run cancelled
SELECT *
FROM disney

We can also inspect the genres table, which is different from the other tables.

Spinner
DataFrameas
df
variable
Run cancelled
SELECT *
FROM genres

Preparing our data

Joining the different tables

Our data appears to mostly have the same column names. So we can join the data with a series of UNIONs, which will append each table to the previous one.

We use UNION ALL to preserve any possible duplicate rows, as we will want to count entries if they appear in multiple services.

Spinner
DataFrameas
df
variable
Run cancelled
SELECT *
FROM amazon
UNION ALL
SELECT *
FROM hulu
UNION ALL
SELECT * 
FROM netflix
UNION ALL
SELECT *
FROM disney

One problem with the above approach is that we lose out on the streaming service information. So let's repeat our query, but add in the required info!

Spinner
DataFrameas
df
variable
Run cancelled
SELECT *, 'amazon' AS service
FROM amazon
UNION
SELECT *, 'hulu' AS service
FROM hulu
UNION
SELECT *, 'netflix' AS service
FROM netflix
UNION
SELECT *, 'disney' AS service
FROM disney

Great! But we have one more table that might prove useful. Let's add in the genre information with a join.

To do this, we will need to use a Common Table Expression, or CTE.

Spinner
DataFrameas
df
variable
Run cancelled
WITH service_data AS (
	SELECT *, 'amazon' AS service
    FROM amazon
    UNION
    SELECT *, 'hulu' AS service
    FROM hulu
    UNION
    SELECT *, 'netflix' AS service
    FROM netflix
    UNION
    SELECT *, 'disney' AS service
    FROM disney
)

SELECT *
FROM service_data AS sd
LEFT JOIN genres AS g
	ON sd.title = g.film