Analyzing Streaming Service Content in SQL

Welcome to your webinar workspace! You can follow along as we analyze the data in a SQL database and visualize the results.

To set up a connection to the database:

Select 'Databases' in the left-hand sidebar.
Click 'Connect Database'.
Select 'PostgreSQL' and fill in the connection details below.

Connection details:

Database connection name: Streaming Codealong
Hostname: workspacedemodb.datacamp.com
Database: streaming
Username: streaming_codealong
Password: streaming_codealong

To consult the solution, head over to the file browser and select notebook-solution.ipynb.

Exploring our data

Let's start by checking out the data we will be working with, beginning with the amazon, hulu, netflix, and disney tables.
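
A first look at each table can be as simple as selecting a handful of rows. The sketch below is not the webinar's own cell: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the PostgreSQL connection, and it fills the tables with made-up placeholder rows. Only the age, imdb, and rotten_tomatoes columns are named in this workspace; the title and year columns are assumptions.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the PostgreSQL one.
conn = sqlite3.connect(":memory:")

services = ["amazon", "hulu", "netflix", "disney"]
for i, name in enumerate(services):
    # Hypothetical schema: title and year are assumed column names.
    conn.execute(
        f"CREATE TABLE {name} "
        "(title TEXT, year INTEGER, age TEXT, imdb REAL, rotten_tomatoes INTEGER)"
    )
    # Placeholder rows, invented purely so the query has something to return.
    conn.executemany(
        f"INSERT INTO {name} VALUES (?, ?, ?, ?, ?)",
        [(f"Film {i}-{n}", 2000 + n, "13+", 7.0, 80) for n in range(8)],
    )

# Preview the first five rows of every service table.
previews = {}
for name in services:
    cur = conn.execute(f"SELECT * FROM {name} LIMIT 5")
    columns = [d[0] for d in cur.description]
    previews[name] = (columns, cur.fetchall())
    print(name, columns, len(previews[name][1]))
```

In the workspace itself, the same `SELECT * FROM amazon LIMIT 5` statement can be run directly in a SQL cell, once per table.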

We can also inspect the genres table, which is different from the other tables.

Preparing our data

Joining the different tables

The four service tables appear to share mostly the same column names, so we can combine them with a series of UNIONs, each of which appends one table's rows to the result of the previous one.

We use UNION ALL to preserve any possible duplicate rows, as we will want to count entries if they appear in multiple services.
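
To see why UNION ALL matters here, the sketch below compares it with plain UNION on toy data. It assumes an in-memory SQLite database as a stand-in for the PostgreSQL one, a simplified two-column schema (title, year), and invented placeholder titles. "Film A" is deliberately listed on two services.

```python
import sqlite3

# Assumption: in-memory SQLite stand-in with a simplified, hypothetical schema.
conn = sqlite3.connect(":memory:")
toy_data = {
    "amazon": [("Film A", 2001), ("Film B", 2002)],
    "hulu": [("Film A", 2001)],  # same film also on a second service
    "netflix": [("Film C", 2003)],
    "disney": [("Film D", 2004)],
}
for name, films in toy_data.items():
    conn.execute(f"CREATE TABLE {name} (title TEXT, year INTEGER)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", films)

union_all_sql = """
SELECT * FROM amazon
UNION ALL
SELECT * FROM hulu
UNION ALL
SELECT * FROM netflix
UNION ALL
SELECT * FROM disney
"""
# Plain UNION removes duplicate rows; UNION ALL keeps them.
union_sql = union_all_sql.replace("UNION ALL", "UNION")

all_rows = conn.execute(union_all_sql).fetchall()
distinct_rows = conn.execute(union_sql).fetchall()
print(len(all_rows), len(distinct_rows))
```

With UNION ALL, "Film A" appears twice, once per service, which is what we need when counting entries across services. Plain UNION would collapse the two rows into one.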

One problem with the above approach is that we lose out on the streaming service information. So let's repeat our query, but add in the required info!
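
One way to keep the service information is to add a string literal as an extra column in each SELECT. The sketch below illustrates this on toy data; the in-memory SQLite database, the two-column schema, the placeholder titles, and the column name service are all assumptions made for the example.

```python
import sqlite3

# Assumption: in-memory SQLite stand-in with a simplified, hypothetical schema.
conn = sqlite3.connect(":memory:")
toy_data = {
    "amazon": [("Film A", 2001), ("Film B", 2002)],
    "hulu": [("Film A", 2001)],
    "netflix": [("Film C", 2003)],
    "disney": [("Film D", 2004)],
}
for name, films in toy_data.items():
    conn.execute(f"CREATE TABLE {name} (title TEXT, year INTEGER)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", films)

# Each SELECT tags its rows with the name of the table they came from.
labeled_sql = """
SELECT *, 'amazon' AS service FROM amazon
UNION ALL
SELECT *, 'hulu' AS service FROM hulu
UNION ALL
SELECT *, 'netflix' AS service FROM netflix
UNION ALL
SELECT *, 'disney' AS service FROM disney
"""
labeled_rows = conn.execute(labeled_sql).fetchall()

# The duplicated film can now be traced back to both services.
film_a_services = sorted(row[2] for row in labeled_rows if row[0] == "Film A")
print(film_a_services)
```

Because every row now carries its source, the two "Film A" rows are no longer indistinguishable.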

Great! But we have one more table that might prove useful. Let's add in the genre information with a join.

To do this, we will need to use a Common Table Expression, or CTE.
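
A CTE lets us name the combined result and then join against it as if it were a table. The sketch below shows the pattern on toy data. It assumes an in-memory SQLite stand-in for the database, and the genres schema used here (columns film and genre) is hypothetical, as are the CTE name service_data and the placeholder rows. A LEFT JOIN is used so that titles without a genre entry are kept; an inner join would drop them.

```python
import sqlite3

# Assumption: in-memory SQLite stand-in; all schemas below are hypothetical.
conn = sqlite3.connect(":memory:")
toy_data = {
    "amazon": [("Film A", 2001), ("Film B", 2002)],
    "hulu": [("Film A", 2001)],
    "netflix": [("Film C", 2003)],
    "disney": [("Film D", 2004)],
}
for name, films in toy_data.items():
    conn.execute(f"CREATE TABLE {name} (title TEXT, year INTEGER)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", films)

conn.execute("CREATE TABLE genres (film TEXT, genre TEXT)")
conn.executemany(
    "INSERT INTO genres VALUES (?, ?)",
    [("Film A", "Drama"), ("Film B", "Comedy"), ("Film C", "Drama")],
)

cte_sql = """
WITH service_data AS (
    SELECT *, 'amazon' AS service FROM amazon
    UNION ALL
    SELECT *, 'hulu' AS service FROM hulu
    UNION ALL
    SELECT *, 'netflix' AS service FROM netflix
    UNION ALL
    SELECT *, 'disney' AS service FROM disney
)
SELECT sd.title, sd.service, g.genre
FROM service_data AS sd
LEFT JOIN genres AS g ON sd.title = g.film
ORDER BY sd.title, sd.service
"""
joined_rows = conn.execute(cte_sql).fetchall()
for row in joined_rows:
    print(row)
```

Note that if the genres table holds several rows per film, the join returns one output row per film and genre combination, which affects any later counts.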

Inspecting missing data

It looks like we are missing some values in the age and imdb columns. We will also check the rotten_tomatoes column because we may use it later. Let's see how extensive this problem is.

To count the null values per column, we will combine SUM() with CASE WHEN: the CASE expression returns 1 for a null value and 0 otherwise, and SUM() adds these up.
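
The null-counting pattern can be sketched as follows. This is a minimal illustration, not the webinar's solution: it assumes an in-memory SQLite stand-in for the database, a single table named service_data standing in for the combined data, and invented placeholder rows. The column names age, imdb, and rotten_tomatoes come from the text above.

```python
import sqlite3

# Assumption: in-memory SQLite stand-in; service_data is a hypothetical table
# representing the combined service data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE service_data "
    "(title TEXT, age TEXT, imdb REAL, rotten_tomatoes INTEGER)"
)
# Placeholder rows with some values deliberately missing (None becomes NULL).
conn.executemany(
    "INSERT INTO service_data VALUES (?, ?, ?, ?)",
    [
        ("Film A", "13+", 7.1, 80),
        ("Film B", None, 6.0, None),
        ("Film C", None, None, 55),
        ("Film D", "18+", 8.2, None),
    ],
)

# CASE WHEN yields 1 for each NULL and 0 otherwise; SUM() totals them per column.
null_count_sql = """
SELECT
    SUM(CASE WHEN age IS NULL THEN 1 ELSE 0 END) AS missing_age,
    SUM(CASE WHEN imdb IS NULL THEN 1 ELSE 0 END) AS missing_imdb,
    SUM(CASE WHEN rotten_tomatoes IS NULL THEN 1 ELSE 0 END) AS missing_rotten_tomatoes,
    COUNT(*) AS total_rows
FROM service_data
"""
missing_age, missing_imdb, missing_rt, total_rows = conn.execute(null_count_sql).fetchone()
print(missing_age, missing_imdb, missing_rt, total_rows)
```

Including COUNT(*) alongside the sums makes it easy to judge each count relative to the size of the table.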
