Netflix Top 10
Netflix Top 10: Analyzing Weekly Chart-Toppers

    This dataset comprises Netflix's weekly top 10 lists for the most-watched TV shows and films worldwide. The data spans from June 28, 2021, to August 27, 2023.

    This workspace is pre-loaded with two CSV files.

    • netflix_top10.csv contains columns such as show_title, category, weekly_rank, and several view metrics.
    • netflix_top10_country.csv has information about a show or film's performance by country, contained in the columns cumulative_weeks_in_top_10 and weekly_rank.

    Source: Netflix

    About The Dataset

    My analysis considers two datasets: netflix_top10.csv and netflix_top10_country.csv.

    The ranks in netflix_top10.csv are determined by dividing the weekly hours viewed by the runtime. This system has been used since June 18, 2023; previously, the rank was determined by the weekly hours viewed alone. The netflix_top10.csv dataset has 4520 entries and 11 columns, some of which have missing data, as detailed below. The columns of the dataset are:

    1. week: represents the date the ranking was released. It has no missing entries.
    2. category: represents the category of the show, which is one of four: Films (English), Films (Non-English), TV (English) and TV (Non-English). It has no missing entries.
    3. weekly rank: represents the rank/position of each show. It lies between the first and tenth position. It has no missing entries.
    4. show title: represents the show title or TV title. It has no missing entries.
    5. season title: represents the season title. It has 2333 missing entries, but most shows in the Films (English) and Films (Non-English) categories do not need this column because it is mainly for seasonal shows, which belong to the TV (English) and TV (Non-English) categories. There are 75 entries missing for the TV categories.
    6. weekly hours viewed: represents the total hours the show has been watched in a week. It has no missing entries.
    7. runtime: represents how long a show lasts. It has only 440 entries because this data has only been collected since June 18, 2023.
    8. weekly views: is derived by dividing weekly hours viewed by runtime. It has only 440 entries because runtime has only been collected since June 18, 2023.
    9. cumulative weeks in top 10: represents the number of weeks a show has been in the global top 10. In the dataset, for shows in the TV (English) and TV (Non-English) categories, each season is treated as its own show and the cumulative weeks are counted per season. It has no missing entries.
    10. is staggered launch: indicates whether a show was released at different times. This is only True for shows in the TV (English and Non-English) categories. It has no missing entries.
    11. episode launch details: contains details of the episodes launched for a show, namely how many episodes were launched and the countries they were launched in. It has only 17 entries because it is only recorded when is staggered launch is True.
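The missing-value counts and the weekly views derivation described in this list can be checked with a few lines of pandas. The following is a minimal sketch, not the hidden notebook code: the frame is made up for illustration, and the underscore column names (season_title, weekly_hours_viewed, runtime, weekly_views) are assumed to match the CSV headers.

```python
import pandas as pd

# Tiny illustrative frame; the values are invented and the column names are assumed.
sample = pd.DataFrame(
    {
        "category": ["Films (English)", "TV (English)", "TV (English)"],
        "season_title": [None, "Show A: Season 1", None],
        "weekly_hours_viewed": [20_000_000, 30_000_000, 12_000_000],
        "runtime": [2.0, None, 6.0],  # hours; missing where it was not collected
    }
)

# Count missing entries in every column.
missing_per_column = sample.isna().sum()

# Count missing season titles within each category.
missing_season_by_category = (
    sample["season_title"].isna().groupby(sample["category"]).sum()
)

# Derive weekly views; rows without a runtime stay missing (NaN).
sample["weekly_views"] = sample["weekly_hours_viewed"] / sample["runtime"]
```

On the real data, the same calls would be applied to the frame returned by pd.read_csv("netflix_top10.csv").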

    The netflix_top10_country.csv dataset has 210880 entries and 8 columns. Apart from the season title column, no column in the dataset has missing data. The columns of this dataset are:

    1. country name: The name of the country.
    2. country iso2: These are two letters used to represent the country.
    3. week: represents the date the ranking was released.
    4. category: represents two categories of shows which are: Films and TV.
    5. weekly rank: represents the rank/position of each show. It lies between the first and tenth position.
    6. show title: represents the show title or TV title.
    7. season title: represents the season title. It has 108945 missing entries. As in the netflix_top10.csv dataset, most shows in the Films category do not require this column, which accounts for the vast majority of the missing entries. There are 3536 entries missing for the TV category.
    8. cumulative weeks in top 10: represents the number of weeks a show has been in that country's top 10. In the dataset, for shows in the TV category, each season is treated as its own show and the cumulative weeks are counted per season.

    Data Cleaning

    For the netflix_top10.csv dataset, all columns with missing data have been removed; some of them will be reintroduced when further analysis needs them. The week column was converted to a datetime object.

    For the netflix_top10_country.csv, the season title column has been removed.
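The cleaning steps above could look roughly like this in pandas. This is a sketch under assumptions rather than the notebook's actual code: the helper names clean_global and clean_country are hypothetical, the sample rows are invented, and the week values are assumed to be ISO-formatted date strings.

```python
import pandas as pd


def clean_global(df: pd.DataFrame) -> pd.DataFrame:
    """Drop every column that has missing data and parse the week column."""
    cleaned = df.dropna(axis="columns", how="any").copy()
    cleaned["week"] = pd.to_datetime(cleaned["week"])
    return cleaned


def clean_country(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the season_title column from the country-level data."""
    return df.drop(columns=["season_title"])


# Invented rows standing in for the two CSV files.
global_sample = pd.DataFrame(
    {
        "week": ["2023-08-20", "2023-08-27"],
        "show_title": ["Film A", "Show B"],
        "season_title": [None, "Show B: Season 1"],
        "weekly_rank": [1, 2],
    }
)
country_sample = pd.DataFrame(
    {
        "country_name": ["Argentina"],
        "week": ["2023-08-27"],
        "show_title": ["Film A"],
        "season_title": [None],
    }
)

global_clean = clean_global(global_sample)
country_clean = clean_country(country_sample)
```

Both helpers return new frames and leave their input untouched, so the removed columns remain available in the original frames if a later question needs them.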

    Data Analysis

    The analysis focuses first on the netflix_top10.csv dataset.

    These are the questions that I seek to answer with my analysis.

    1. Which 10 show titles in the Films categories have the highest values in the cumulative weeks in top 10 column?
    2. Does a staggered launch affect the weekly rank?
    3. Which category has the highest weekly hours viewed?
    4. How do the Films (English) and Films (Non-English) categories change over time?
    1. Which 10 show titles in the Films categories have the highest values in the cumulative weeks in top 10 column?

    Shows in the TV (English and Non-English) categories likely have a higher chance of accumulating weeks in the top 10 because they have multiple episodes (this assumption still needs to be verified). For that reason, this question focuses on the Films (English and Non-English) categories.
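One way to answer this question is sketched below. It assumes the column names show_title, category and cumulative_weeks_in_top_10, and that the film category labels start with "Films"; the helper top_films_by_weeks and the sample rows are hypothetical. Because the cumulative count grows each week a title stays on the chart, the sketch takes the maximum per title before ranking.

```python
import pandas as pd


def top_films_by_weeks(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """Return the n film titles with the most cumulative weeks in the top 10."""
    # Keep only the film categories (English and Non-English).
    films = df[df["category"].str.startswith("Films")]
    # The cumulative count repeats every week, so keep the highest value per title.
    weeks = films.groupby("show_title")["cumulative_weeks_in_top_10"].max()
    return weeks.nlargest(n)


# Invented rows for illustration only.
sample = pd.DataFrame(
    {
        "category": [
            "Films (English)", "Films (English)", "Films (English)",
            "Films (Non-English)", "Films (Non-English)", "TV (English)",
        ],
        "show_title": ["Film A", "Film A", "Film A", "Film B", "Film B", "Show C"],
        "cumulative_weeks_in_top_10": [1, 2, 3, 1, 2, 5],
    }
)

top_two = top_films_by_weeks(sample, n=2)
```

With the default n=10 and the cleaned netflix_top10.csv frame, the result would be the ten film titles asked for in the question.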

    2. Does a staggered launch affect the weekly rank?
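A simple first look at this question is to compare weekly rank statistics for staggered and non-staggered launches. The sketch below is descriptive only and is not the notebook's hidden code: it assumes the column names is_staggered_launch and weekly_rank, restricts the comparison to the TV categories (the only ones where the flag is True), and uses a hypothetical helper rank_by_staggered with invented rows.

```python
import pandas as pd


def rank_by_staggered(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise weekly_rank for staggered and non-staggered TV launches."""
    # Staggered launches only occur for TV, so compare within TV.
    tv = df[df["category"].str.startswith("TV")]
    summary = tv.groupby("is_staggered_launch")["weekly_rank"].agg(
        ["count", "mean", "median"]
    )
    return summary.reset_index()


# Invented rows for illustration only.
sample = pd.DataFrame(
    {
        "category": [
            "TV (English)", "TV (English)", "TV (Non-English)",
            "TV (English)", "Films (English)",
        ],
        "is_staggered_launch": [True, True, False, False, False],
        "weekly_rank": [1, 3, 5, 7, 1],
    }
)

summary = rank_by_staggered(sample)
```

Since only a handful of entries are staggered launches, any difference in the mean or median rank should be read with caution.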