Analyzing Streaming Service Content in SQL

Welcome to your webinar workspace! You can follow along as we analyze the data in a SQL database and visualize the results.

To set up your integration, create a PostgreSQL integration with the following credentials:

Integration Name: Streaming Codealong
Hostname: workspacedemodb.datacamp.com
Database: streaming
Username: streaming_codealong
Password: streaming_codealong

Exploring our data

Let's start by checking out the data we will be working with. We can start with the amazon, hulu, netflix, and disney tables.

DataFrameas

df

variable

SELECT *
FROM amazon

DataFrameas

df

variable

SELECT *
FROM hulu;

DataFrameas

df

variable

SELECT *
FROM netflix;

DataFrameas

df

variable

SELECT *
FROM disney;

We can also inspect the genres table, which is different from the other tables.

DataFrameas

df

variable

SELECT *
FROM genres;

Preparing our data

Joining the different tables

Our data appears to mostly have the same column names. So we can join the data with a series of UNIONs, which will append each table to the previous one.

We use UNION ALL to preserve any possible duplicate rows, as we will want to count entries if they appear in multiple services.

DataFrameas

df

variable

SELECT *
FROM amazon
UNION ALL
SELECT *
FROM hulu
UNION ALL
SELECT *
FROM disney
UNION ALL
SELECT *
FROM netflix

One problem with the above approach is that we lose out on the streaming service information. So let's repeat our query, but add in the required info!

DataFrameas

df

variable

SELECT *, 'amazon' AS service
FROM amazon
UNION ALL
SELECT *, 'hulu' AS service
FROM hulu
UNION ALL
SELECT *, 'disney' AS service
FROM disney
UNION ALL
SELECT *, 'netflix' AS netflix
FROM netflix

Great! But we have one more table that might prove useful. Let's add in the genre information with a join.

To do this, we will need to use a Common Table Expression, or CTE.

UNION is used below and not UNION ALL because there will no longer be duplicates with the addition of the service column

DataFrameas

df

variable

WITH service_data AS (
    SELECT *, 'amazon' AS service
    FROM amazon
    UNION
    SELECT *, 'hulu' AS service
    FROM hulu
    UNION
    SELECT *, 'disney' AS service
    FROM disney
    UNION
    SELECT *, 'netflix' AS netflix
    FROM netflix
)

SELECT *
FROM service_data AS sd
LEFT JOIN genres AS g
    ON sd.title = g.film;

‌
‌
‌

Analyzing Streaming Service Content in SQL

.mfe-app-workspace-kj242g{position:absolute;top:-8px;}.mfe-app-workspace-11ezf91{display:inline-block;}.mfe-app-workspace-11ezf91:hover .Anchor__copyLink{visibility:visible;}Analyzing Streaming Service Content in SQL

Exploring our data

Preparing our data

Joining the different tables

Analyzing Streaming Service Content in SQL