About The Dataset
Two datasets are considered in my analysis: netflix_top10.csv and netflix_top10_country.csv.
netflix_top10.csv contains data on the global top 10 shows for each week from 4 July, 2021 to 27 August, 2023.
The rank of each show in netflix_top10.csv is decided by dividing the weekly hours viewed by the runtime. This system has been in use since June 18, 2023. Previously, the rank was decided by the weekly hours viewed alone.
netflix_top10.csv has 4520 entries and 11 columns. Some of the columns have missing data; the details are given below. The columns of the dataset are:
- week: represents the date the ranking was released. It has no missing entries.
- category: represents the four categories of shows which are: Films(English), Films(Non-English), TV(English) and TV(Non-English). It has no missing entries.
- weekly rank: represents the rank/position of each show. It lies between the first and tenth position. It has no missing entries.
- show title: represents the show title or TV title. It has no missing entries.
- season title: represents the season title. It has 2333 missing entries, but most shows in the Films(English) and Films(Non-English) categories do not appear to need this column. It mainly applies to seasonal shows, that is, the TV(English) and TV(Non-English) categories, where 75 entries are missing.
- weekly hours viewed: represents the total hours the show has been watched in a week. It has no missing entries.
- runtime: represents how long a show lasts. Only 440 entries are present because this data has been collected only since June 18, 2023.
- weekly views: derived by dividing weekly hours viewed by runtime. Only 440 entries are present because runtime has been collected only since June 18, 2023.
- cumulative weeks in top 10: represents the number of times a show has been in the global top 10. In the dataset, for shows in the TV(English) and TV(Non-English) category, each season represents its own show and the cumulative weeks are counted for each season. It has no missing entries.
- is staggered launch: indicates whether a show's episodes were released at different times. This is only True for shows in the TV(English) and TV(Non-English) categories. It has no missing entries.
- episode launch details: contains details of the episodes launched for a show. It shows how many episodes were launched and the countries they were launched in. Only 17 entries are present because it is only documented when is staggered launch is True.
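As a rough illustration of how weekly views relates to the other columns, and how the missing-entry counts above can be checked, here is a minimal sketch. The rows are made up, and the column names weekly_hours_viewed, runtime and weekly_views are assumed to follow the same underscore naming as show_title.

```python
import pandas as pd

# Made-up rows for illustration only; these are not values from the dataset.
sample = pd.DataFrame({
    "show_title": ["Film A", "Film B", "Series C"],
    "weekly_hours_viewed": [30_000_000, 12_000_000, 45_000_000],
    "runtime": [2.0, 1.5, None],  # in hours; missing for weeks before June 18, 2023
})

# weekly views = weekly hours viewed / runtime (NaN wherever runtime is missing)
sample["weekly_views"] = sample["weekly_hours_viewed"] / sample["runtime"]

# Number of missing entries in each column
missing_per_column = sample.isna().sum()

print(sample)
print(missing_per_column)
```

Running the same isna().sum() call on the full DataFrame should give the per-column missing counts described in the list.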
netflix_top10_country.csv contains data on the top 10 shows for each week in each of 94 countries from 4 July, 2021 to 27 August, 2023.
netflix_top10_country.csv has 210880 entries and 8 columns. Aside from the season title column, no column in the dataset has missing data. The columns of this dataset are:
- country name: The name of the country.
- country iso2: These are two letters used to represent the country.
- week: represents the date the ranking was released.
- category: represents two categories of shows which are: Films and TV.
- weekly rank: represents the rank/position of each show. It lies between the first and tenth position.
- show title: represents the show title or TV title.
- season title: represents the season title. It has 108945 missing entries. As in the global_top_10 dataset, most shows in the Films category do not require this column, which accounts for the vast majority of the missing entries. There are 3536 missing entries for the TV category.
- cumulative weeks in top 10: represents the number of times a show has been in the global top 10. In the dataset, for shows in the TV category, each season represents its own show and the cumulative weeks are counted for each season.
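The split of missing season titles between Films and TV can be computed along these lines. This is a sketch on made-up rows; the column names category and season_title are assumed to match the loaded DataFrame.

```python
import pandas as pd

# Made-up rows standing in for netflix_top10_country.csv
sample = pd.DataFrame({
    "category": ["Films", "Films", "TV", "TV", "TV"],
    "show_title": ["Film A", "Film B", "Series C", "Series D", "Series E"],
    "season_title": [None, None, "Series C: Season 1", None, "Series E: Season 2"],
})

# Flag rows with a missing season title, then count the flags within each category
missing_by_category = sample["season_title"].isna().groupby(sample["category"]).sum()

print(missing_by_category)
```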
Netflix Top 10: Analyzing Weekly Chart-Toppers
This dataset comprises Netflix's weekly top 10 lists for the most-watched TV shows and films worldwide. The data spans from June 28, 2021, to August 27, 2023.
This workspace is pre-loaded with two CSV files.
netflix_top10.csv contains columns such as show_title, category, weekly_rank, and several view metrics.
netflix_top10_country.csv has information about a show or film's performance by country, contained in the columns cumulative_weeks_in_top_10 and weekly_rank.
Data Cleaning
For the netflix_top10.csv dataset, all columns with missing data have been removed. Some of them will be reintroduced when further analysis requires them. The week column was converted to a datetime object.
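The cells that clean the global dataset are not shown here, so the following is only a sketch of what those two steps could look like, run on made-up rows. The name global_sample is hypothetical, and using dropna is an assumption about how the columns were removed.

```python
import pandas as pd

# Made-up rows standing in for netflix_top10.csv
global_sample = pd.DataFrame({
    "week": ["2023-08-27", "2023-08-20"],
    "show_title": ["Film A", "Series C"],
    "season_title": [None, "Series C: Season 1"],
    "weekly_hours_viewed": [30_000_000, 45_000_000],
})

# Drop every column that contains at least one missing value
cleaned = global_sample.dropna(axis="columns")

# Convert the week column from strings to datetime
cleaned = cleaned.assign(week=pd.to_datetime(cleaned["week"]))

print(cleaned.dtypes)
```

Keeping the original DataFrame untouched makes it easy to reintroduce a dropped column later if an analysis needs it.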
For the netflix_top10_country.csv dataset, the season title column has been removed.
# Check the info of the data to see if it is accurate.
countries_top_10.info()
countries_top_10 = countries_top_10.drop('season_title', axis=1)
countries_top_10.info()
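# Print the unique values of each column in the global dataset.
for col in global_top_10.columns:
    print(col.replace('_',' '), end=' - ')
    print(global_top_10[col].unique(), end='\n\n')
# Print the unique values of each column in the country dataset.
for col in countries_top_10.columns:
    print(col.replace('_',' '), end=' - ')
    print(countries_top_10[col].unique(), end='\n\n')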
global_top_10
Data Analysis
1. What are the top 10 show titles in the Films category with the highest values in the cumulative weeks in top 10 column?
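One possible way to approach this question is sketched below on made-up rows. It assumes the global dataset is used, that film categories can be recognised by the prefix "Films", and that a show's highest cumulative_weeks_in_top_10 value is the figure of interest.

```python
import pandas as pd

# Made-up rows; a show appears once for every week it charted
sample = pd.DataFrame({
    "category": ["Films(English)", "Films(English)", "Films(Non-English)",
                 "TV(English)", "Films(English)"],
    "show_title": ["Film A", "Film A", "Film B", "Series C", "Film D"],
    "cumulative_weeks_in_top_10": [1, 2, 5, 9, 3],
})

# Keep only the film categories (English and Non-English)
films = sample[sample["category"].str.startswith("Films")]

# The cumulative count grows week by week, so take each title's maximum,
# then keep the 10 largest values
top_films = (
    films.groupby("show_title")["cumulative_weeks_in_top_10"]
    .max()
    .nlargest(10)
)

print(top_films)
```

Taking the maximum per title avoids counting the same film once for every week it appears in the chart.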