Netflix Movie Data
This dataset (source: Kaggle) contains information on almost 8,000 Netflix movies and TV shows. You can try out many data science concepts on it; some example questions are given at the end of this template.
# Load packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Load your data
# Load data from the csv file
df = pd.read_csv('netflix_dataset.csv', index_col=0)
df.head()
show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
---|---|---|---|---|---|---|---|---|---|---|---|
s1 | TV Show | 3% | NaN | João Miguel, Bianca Comparato, Michel Gomes, R... | Brazil | August 14, 2020 | 2020 | TV-MA | 4 Seasons | International TV Shows, TV Dramas, TV Sci-Fi &... | In a future where the elite inhabit an island ... |
s2 | Movie | 7:19 | Jorge Michel Grau | Demián Bichir, Héctor Bonilla, Oscar Serrano, ... | Mexico | December 23, 2016 | 2016 | TV-MA | 93 min | Dramas, International Movies | After a devastating earthquake hits Mexico Cit... |
s3 | Movie | 23:59 | Gilbert Chan | Tedd Chan, Stella Chung, Henley Hii, Lawrence ... | Singapore | December 20, 2018 | 2011 | R | 78 min | Horror Movies, International Movies | When an army recruit is found dead, his fellow... |
s4 | Movie | 9 | Shane Acker | Elijah Wood, John C. Reilly, Jennifer Connelly... | United States | November 16, 2017 | 2009 | PG-13 | 80 min | Action & Adventure, Independent Movies, Sci-Fi... | In a postapocalyptic world, rag-doll robots hi... |
s5 | Movie | 21 | Robert Luketic | Jim Sturgess, Kevin Spacey, Kate Bosworth, Aar... | United States | January 1, 2020 | 2008 | PG-13 | 123 min | Dramas | A brilliant group of students become card-coun... |
Understand your variables
# Understand your variables
variables = pd.DataFrame(columns=['Variable','Number of unique values','Values'])
for i, var in enumerate(df.columns):
    variables.loc[i] = [var, df[var].nunique(), df[var].unique().tolist()]
variables
 | Variable | Number of unique values | Values |
---|---|---|---|
0 | type | 2 | [TV Show, Movie] |
1 | title | 7787 | [3%, 7:19, 23:59, 9, 21, 46, 122, 187, 706, 19... |
2 | director | 4049 | [nan, Jorge Michel Grau, Gilbert Chan, Shane A... |
3 | cast | 6831 | [João Miguel, Bianca Comparato, Michel Gomes, ... |
4 | country | 681 | [Brazil, Mexico, Singapore, United States, Tur... |
5 | date_added | 1565 | [August 14, 2020, December 23, 2016, December ... |
6 | release_year | 73 | [2020, 2016, 2011, 2009, 2008, 2019, 1997, 201... |
7 | rating | 14 | [TV-MA, R, PG-13, TV-14, TV-PG, NR, TV-G, TV-Y... |
8 | duration | 216 | [4 Seasons, 93 min, 78 min, 80 min, 123 min, 1... |
9 | listed_in | 492 | [International TV Shows, TV Dramas, TV Sci-Fi ... |
10 | description | 7769 | [In a future where the elite inhabit an island... |
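The summary above shows that `date_added` and `duration` are stored as text ("August 14, 2020", "93 min", "4 Seasons"), so they usually need converting before any numeric or time-based analysis. The following is a minimal sketch of one way to do that, not part of the original template. The helper name `add_typed_columns` and the new column names are hypothetical, the small `sample` frame only reuses values visible in the preview, and the date format string is an assumption based on those previewed values.

```python
import pandas as pd

# Tiny stand-in for the real DataFrame, built from values shown in the preview.
sample = pd.DataFrame(
    {
        "type": ["TV Show", "Movie", "Movie"],
        "date_added": ["August 14, 2020", "December 23, 2016", "December 20, 2018"],
        "duration": ["4 Seasons", "93 min", "78 min"],
    },
    index=pd.Index(["s1", "s2", "s3"], name="show_id"),
)


def add_typed_columns(frame):
    """Return a copy with parsed date and numeric duration columns (hypothetical helper)."""
    out = frame.copy()
    # Assumed format "Month day, year"; anything that does not match becomes NaT.
    out["date_added_parsed"] = pd.to_datetime(
        out["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
    )
    # Pull the leading integer out of strings such as "93 min" or "4 Seasons".
    out["duration_value"] = (
        out["duration"].str.extract(r"(\d+)", expand=False).astype(float)
    )
    # The unit depends on the title type: minutes for movies, seasons for shows.
    out["duration_unit"] = out["type"].map({"Movie": "min", "TV Show": "seasons"})
    return out


typed = add_typed_columns(sample)
print(typed[["date_added_parsed", "duration_value", "duration_unit"]])
```

On the full dataset you would call `add_typed_columns(df)` instead; keeping the unit in its own column avoids mixing minutes and seasons in one numeric column.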
Answer interesting questions:
Now you get to explore this exciting dataset! Can't think of where to start? Try your hand at these questions:
- How do the variables correlate?
- Which countries have contributed the most movies in recent years?
- Which actors are most likely to work together?
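As a starting point for the question about countries, here is a minimal sketch rather than a definitive answer. It rests on several assumptions: that `country` can hold several comma-separated countries per title (the 681 unique values suggest this, but the preview does not confirm it), that "recent" means a `release_year` of 2015 or later, and that a co-production should count once for each country involved. The helper `top_movie_countries` and the toy `sample` rows are made up for illustration.

```python
import pandas as pd

# Toy rows for illustration only; these are NOT taken from the real dataset.
sample = pd.DataFrame(
    {
        "type": ["Movie", "Movie", "Movie", "TV Show", "Movie"],
        "country": ["United States", "United States, Mexico", "Mexico", "Brazil", None],
        "release_year": [2019, 2018, 2009, 2020, 2020],
    }
)


def top_movie_countries(frame, since_year=2015, n=10):
    """Count movies per country released in or after since_year (hypothetical helper)."""
    is_recent_movie = (frame["type"] == "Movie") & (frame["release_year"] >= since_year)
    countries = (
        frame.loc[is_recent_movie, "country"]
        .dropna()          # titles without a country cannot be attributed
        .str.split(",")    # assumed separator for multi-country titles
        .explode()         # one row per (title, country) pair
        .str.strip()
    )
    countries = countries[countries != ""]  # drop empties left by stray commas
    return countries.value_counts().head(n)


top = top_movie_countries(sample)
print(top)
```

On the full data, `top_movie_countries(df)` would apply the same logic; changing `since_year` shows how sensitive the ranking is to the chosen cutoff.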
# Start coding
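For the question about actors who work together, one simple approach is to count how often each pair of names appears in the same `cast` entry. The sketch below assumes cast names are separated by commas, as the preview suggests; the helper `count_cast_pairs` and the placeholder names are hypothetical. Raw co-occurrence counts are only a rough proxy for "most likely to work together", since they favour actors with many titles.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Placeholder cast lists for illustration; not real rows from the dataset.
sample = pd.DataFrame(
    {"cast": ["Actor A, Actor B, Actor C", "Actor A, Actor B", None, "Actor C"]}
)


def count_cast_pairs(cast_series):
    """Count how many titles each unordered pair of actors shares (hypothetical helper)."""
    pair_counts = Counter()
    for cast in cast_series.dropna():
        # Deduplicate and sort so each pair is counted once per title,
        # always in the same (alphabetical) order.
        names = sorted({name.strip() for name in cast.split(",") if name.strip()})
        pair_counts.update(combinations(names, 2))
    return pair_counts


pairs = count_cast_pairs(sample["cast"])
print(pairs.most_common(3))
```

With the real data you would pass `df["cast"]`; `pairs.most_common(10)` then lists the ten most frequent pairings.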
Acknowledgements
Dataset source: Kaggle