Netflix
This dataset comprises Netflix's weekly top 10 lists for the most-watched TV shows and films worldwide. The data spans from June 28, 2021, to August 27, 2023.
Objective: Determine if there's a correlation between content duration and its likelihood of making it to the top 10 lists.
# Import your libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
global_top_10 = pd.read_csv("netflix_top10.csv", index_col=0)
global_top_10.head()
countries_top_10 = pd.read_csv("netflix_top10_country.csv", index_col=0)
countries_top_10.head()
After reading the CSV files, we take a quick look at the structure of the data.
global_top_10.info()
# Statistical summary
global_top_10.describe()
global_top_10.columns
global_top_10.shape
Before analyzing the data, we look at its distribution.
# plot histogram using pandas dataframe plot
global_top_10.plot.hist(bins=10)
plt.title("Netflix Global Top Ten")
plt.legend()
The histogram will not reflect the true distribution while the data contains missing values, so we need to handle them first.
# Count the number of missing values in each column
global_top_10.isna().sum()
# Find the five percent threshold
threshold = len(global_top_10) * 0.05
print(threshold)
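The source stops at computing the threshold, so the following is only a sketch of one common way such a threshold is used, not necessarily the author's next step. The assumption is that columns with at most five percent missing values are handled by dropping the affected rows, while columns above the threshold need a different strategy. The DataFrame `df` and its column names are hypothetical stand-ins for `global_top_10`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for global_top_10; column names are illustrative only.
df = pd.DataFrame({
    "show_title": [f"title_{i}" for i in range(40)],
    "weekly_hours_viewed": [float(i) for i in range(40)],
    "season_title": [np.nan] * 30 + ["Season 1"] * 10,
})
df.loc[0, "weekly_hours_viewed"] = np.nan  # introduce a single missing value

# Five percent of the row count, as in the cell above.
threshold = len(df) * 0.05
missing_counts = df.isna().sum()

# Assumption: for columns with some, but few, missing values,
# dropping the affected rows loses little data.
cols_to_drop_rows = missing_counts[
    (missing_counts > 0) & (missing_counts <= threshold)
].index.tolist()
cleaned = df.dropna(subset=cols_to_drop_rows)

# Columns above the threshold are left for another strategy
# (for example imputation, or excluding the column).
cols_above = missing_counts[missing_counts > threshold].index.tolist()

print(cols_to_drop_rows, cols_above, len(cleaned))
```

With the real data, `df` would be replaced by `global_top_10`, and the columns listed in `cols_above` would need to be inspected before deciding how to treat them.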