Netflix Movie Data

This dataset (source: Kaggle) contains information on almost 8,000 Netflix movies and TV shows. You can practice many data science concepts on it; some example questions are given at the end of this template.

# Load packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Load your data

# Load data from the csv file
df = pd.read_csv('netflix_dataset.csv', index_col=0)
df.head()
| show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|
| s1 | TV Show | 3% | NaN | João Miguel, Bianca Comparato, Michel Gomes, R... | Brazil | August 14, 2020 | 2020 | TV-MA | 4 Seasons | International TV Shows, TV Dramas, TV Sci-Fi &... | In a future where the elite inhabit an island ... |
| s2 | Movie | 7:19 | Jorge Michel Grau | Demián Bichir, Héctor Bonilla, Oscar Serrano, ... | Mexico | December 23, 2016 | 2016 | TV-MA | 93 min | Dramas, International Movies | After a devastating earthquake hits Mexico Cit... |
| s3 | Movie | 23:59 | Gilbert Chan | Tedd Chan, Stella Chung, Henley Hii, Lawrence ... | Singapore | December 20, 2018 | 2011 | R | 78 min | Horror Movies, International Movies | When an army recruit is found dead, his fellow... |
| s4 | Movie | 9 | Shane Acker | Elijah Wood, John C. Reilly, Jennifer Connelly... | United States | November 16, 2017 | 2009 | PG-13 | 80 min | Action & Adventure, Independent Movies, Sci-Fi... | In a postapocalyptic world, rag-doll robots hi... |
| s5 | Movie | 21 | Robert Luketic | Jim Sturgess, Kevin Spacey, Kate Bosworth, Aar... | United States | January 1, 2020 | 2008 | PG-13 | 123 min | Dramas | A brilliant group of students become card-coun... |

Understand your variables

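# Summarize each column: its name, number of unique values, and the values themselves
variables = pd.DataFrame(columns=['Variable','Number of unique values','Values'])

for i, var in enumerate(df.columns):
    # nunique() ignores missing values by default, while unique() includes NaN if present
    variables.loc[i] = [var, df[var].nunique(), df[var].unique().tolist()]

variables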
| | Variable | Number of unique values | Values |
|---|---|---|---|
| 0 | type | 2 | [TV Show, Movie] |
| 1 | title | 7787 | [3%, 7:19, 23:59, 9, 21, 46, 122, 187, 706, 19... |
| 2 | director | 4049 | [nan, Jorge Michel Grau, Gilbert Chan, Shane A... |
| 3 | cast | 6831 | [João Miguel, Bianca Comparato, Michel Gomes, ... |
| 4 | country | 681 | [Brazil, Mexico, Singapore, United States, Tur... |
| 5 | date_added | 1565 | [August 14, 2020, December 23, 2016, December ... |
| 6 | release_year | 73 | [2020, 2016, 2011, 2009, 2008, 2019, 1997, 201... |
| 7 | rating | 14 | [TV-MA, R, PG-13, TV-14, TV-PG, NR, TV-G, TV-Y... |
| 8 | duration | 216 | [4 Seasons, 93 min, 78 min, 80 min, 123 min, 1... |
| 9 | listed_in | 492 | [International TV Shows, TV Dramas, TV Sci-Fi ... |
| 10 | description | 7769 | [In a future where the elite inhabit an island... |
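
The summary shows that date_added and duration are stored as text ("August 14, 2020", "93 min", "4 Seasons"), so they may need parsing before any numeric analysis. Below is a minimal sketch of one way to do that, not part of the original template. It runs on a small stand-in frame copied from the preview rows; the helper name add_numeric_columns and the new columns year_added and duration_value are hypothetical, and the date format is assumed from the values shown.

```python
import pandas as pd

# Stand-in for the real data: a few columns of the five preview rows
sample = pd.DataFrame(
    {
        "type": ["TV Show", "Movie", "Movie", "Movie", "Movie"],
        "date_added": ["August 14, 2020", "December 23, 2016",
                       "December 20, 2018", "November 16, 2017", "January 1, 2020"],
        "release_year": [2020, 2016, 2011, 2009, 2008],
        "duration": ["4 Seasons", "93 min", "78 min", "80 min", "123 min"],
    },
    index=pd.Index(["s1", "s2", "s3", "s4", "s5"], name="show_id"),
)


def add_numeric_columns(frame):
    """Return a copy with parsed dates and a numeric duration (hypothetical helper)."""
    out = frame.copy()
    # Assumed format "Month day, year"; anything that does not parse becomes NaT
    out["date_added"] = pd.to_datetime(
        out["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
    )
    out["year_added"] = out["date_added"].dt.year
    # Leading integer of "93 min" or "4 Seasons"; its unit depends on `type`
    out["duration_value"] = (
        out["duration"].str.extract(r"(\d+)", expand=False).astype(float)
    )
    return out


cleaned = add_numeric_columns(sample)

# Minutes and seasons are not comparable, so restrict to movies before correlating.
# With only four rows the coefficients are illustrative, not meaningful.
movies = cleaned[cleaned["type"] == "Movie"]
print(movies[["release_year", "year_added", "duration_value"]].corr())
```

On the full dataset the same helper could be applied to df, after checking that the assumed date format matches every row.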

Answer interesting questions:

Now you get to explore this exciting dataset! Can't think of where to start? Try your hand at these questions:

  • How do the variables correlate?
  • Which countries have contributed the most movies in recent years?
  • Which actors are most likely to work together?
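
As a possible starting point for the question about countries, here is a small sketch that counts movies per country from a given release year onward. It is an illustration rather than the template's own solution: the helper top_movie_countries and its since_year cutoff are hypothetical, and it assumes that a country value may list several countries separated by commas. The sample frame reuses the preview rows.

```python
import pandas as pd

# Stand-in for the real data: three columns of the five preview rows
sample = pd.DataFrame(
    {
        "type": ["TV Show", "Movie", "Movie", "Movie", "Movie"],
        "country": ["Brazil", "Mexico", "Singapore", "United States", "United States"],
        "release_year": [2020, 2016, 2011, 2009, 2008],
    }
)


def top_movie_countries(frame, since_year, n=10):
    """Count movies per country for release_year >= since_year (hypothetical helper)."""
    movies = frame[(frame["type"] == "Movie") & (frame["release_year"] >= since_year)]
    # Assumption: several countries may share one cell, separated by commas
    countries = movies["country"].dropna().str.split(",").explode().str.strip()
    countries = countries[countries != ""]
    return countries.value_counts().head(n)


recent = top_movie_countries(sample, since_year=2008)
print(recent)
```

What counts as "recent" is a judgment call, so since_year is left as a parameter rather than fixed.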
# Start coding
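
For the question about actors who work together, one simple approach is to count how often each pair of names appears in the same cast list. The sketch below is only an illustration under the assumption that cast is a comma-separated string of names. The helper count_costar_pairs is hypothetical, and the sample casts are invented placeholders, not rows from the dataset.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Invented placeholder casts, used only to show the mechanics
sample_cast = pd.Series(
    ["Actor A, Actor B, Actor C", "Actor B, Actor A", None, "Actor C, Actor D"],
    name="cast",
)


def count_costar_pairs(cast_column):
    """Count how many titles each pair of actors shares (hypothetical helper)."""
    pair_counts = Counter()
    for cast in cast_column.dropna():
        # Sort the de-duplicated names so each pair has one canonical order
        names = sorted({name.strip() for name in cast.split(",") if name.strip()})
        pair_counts.update(combinations(names, 2))
    return pair_counts


pairs = count_costar_pairs(sample_cast)
print(pairs.most_common(3))
```

On the full dataset this could be called as count_costar_pairs(df['cast']). Raw pair counts favor prolific actors, so a normalized measure may be more informative.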

Acknowledgements

Dataset source: Kaggle