Skip to content
Analyzing Streaming Service Content in SQL - Codealong
  • AI Chat
  • Code
  • Report
  • Analyzing Streaming Service Content in SQL

    Welcome to your webinar workspace! You can follow along as we analyze the data in a SQL database and visualize the results.

    To set up your integration, create a PostgreSQL integration with the following credentials:

    • Integration Name: Streaming Codealong
    • Hostname: workspacedemodb.datacamp.com
    • Database: streaming
    • Username: streaming_codealong
    • Password: streaming_codealong

    Exploring our data

    Let's start by checking out the data we will be working with. We can start with the amazon, hulu, netflix, and disney tables.

    Spinner
    DataFrameavailable as
    df
    variable
    SELECT *
    FROM amazon
    Current Type: Bar
    Current X-axis: None
    Current Y-axis: None
    Current Color: None
    Spinner
    DataFrameavailable as
    df
    variable
    SELECT *
    FROM hulu
    Spinner
    DataFrameavailable as
    df
    variable
    SELECT *
    FROM netflix
    Spinner
    DataFrameavailable as
    df
    variable
    SELECT *
    FROM disney

    We can also inspect the genres table, which is different from the other tables.

    Spinner
    DataFrameavailable as
    df
    variable

    Preparing our data

    Joining the different tables

    Our data appears to mostly have the same column names. So we can join the data with a series of UNIONs, which will append each table to the previous one.

    We use UNION ALL to preserve any possible duplicate rows, as we will want to count entries if they appear in multiple services.

    Spinner
    DataFrameavailable as
    df
    variable

    One problem with the above approach is that we lose out on the streaming service information. So let's repeat our query, but add in the required info!

    Spinner
    DataFrameavailable as
    df
    variable

    Great! But we have one more table that might prove useful. Let's add in the genre information with a join.

    To do this, we will need to use a Common Table Expression, or CTE.

    Spinner
    DataFrameavailable as
    df
    variable