Investigating Netflix Movies and Guest Stars in The Office
Exploring a sample of the Netflix data
# Create the years and durations lists
years = [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]
durations = [103, 101, 99, 100, 100, 95, 95, 96, 93, 90]
# Create a dictionary with the two lists
movie_dict = {"years" : years, "durations" : durations}
# Print the dictionary
movie_dict
Creating a DataFrame from a dictionary
# Import pandas under its usual alias
import pandas as pd
# Create a DataFrame from the dictionary
durations_df = pd.DataFrame(movie_dict)
# Print the DataFrame
durations_df
A visual inspection of our data
# Import matplotlib.pyplot under its usual alias and create a figure
import matplotlib.pyplot as plt
fig = plt.figure()
# Draw a line plot of years and durations
plt.plot(durations_df["years"], durations_df["durations"])
# Create a title
plt.title("Netflix Movie Durations 2011-2020")
# Show the plot
plt.show()
Loading the data from the CSV
The CSV file is available at the path "datasets/netflix_data.csv".
# Read in the CSV as a DataFrame
netflix_df = pd.read_csv("datasets/netflix_data.csv")
# Print the first five rows of the DataFrame
netflix_df.head()
Filtering for movies
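The filtering step that follows uses DataFrame.query with a string expression. As a minimal sketch, assuming a small made-up frame (sample_df and its rows are hypothetical and not taken from netflix_data.csv), the same subset can also be written with a boolean mask:

```python
import pandas as pd

# Hypothetical rows for illustration only; not taken from netflix_data.csv
sample_df = pd.DataFrame({
    "title": ["Title A", "Title B", "Title C"],
    "type": ["Movie", "TV Show", "Movie"],
    "duration": [95, 1, 48],
})

# Filter with a query string, as the notebook does
movies_query = sample_df.query("type == 'Movie'")

# Equivalent filter with a boolean mask
movies_mask = sample_df[sample_df["type"] == "Movie"]

# Check that both approaches select the same rows
print(movies_query.equals(movies_mask))
```

Either form should work here; query tends to read more compactly, while a boolean mask is easier to build up programmatically.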
# Subsetting the DataFrame for type "Movie"
netflix_df_movies_only = netflix_df.query("type == 'Movie'")
# Selecting columns of interest
netflix_movies_col_subset = netflix_df_movies_only[
    ["title", "country", "genre", "release_year", "duration"]
]
# Print the first five rows of the new DataFrame
netflix_movies_col_subset.head()
Creating a scatter plot
# Create a figure and set its size
fig = plt.figure(figsize=(12,8))
# Scatter plot of duration versus year
plt.scatter(x = netflix_movies_col_subset["release_year"], y = netflix_movies_col_subset["duration"])
# figure title
plt.title("Movie Duration by Year of Release")
# Showing the plot
plt.show()
Further exploration
# Filter for durations shorter than 60 minutes
short_movies = netflix_movies_col_subset.query("duration < 60")
# Print the first 20 rows of short_movies
short_movies.head(20)
Assigning colors to genre