Exploring a sample of the Netflix data

# Create the years and durations lists
years = [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]
durations = [103, 101, 99, 100, 100, 95, 95, 96, 93, 90]

# Create a dictionary with the two lists
movie_dict = {"years" : years, "durations" : durations}

# Print the dictionary
movie_dict

Creating a DataFrame from a dictionary

# Import pandas under its usual alias
import pandas as pd

# Create a DataFrame from the dictionary
durations_df = pd.DataFrame(movie_dict)

# Print the DataFrame
durations_df

A visual inspection of our data

# Import matplotlib.pyplot under its usual alias and create a figure
import matplotlib.pyplot as plt
fig = plt.figure()

# Draw a line plot of years and durations
plt.plot(durations_df["years"], durations_df["durations"])

# Create a title
plt.title("Netflix Movie Durations 2011-2020")

# Show the plot
plt.show()

Loading the data from the CSV

The CSV file is available at the path "datasets/netflix_data.csv".

# Read in the CSV as a DataFrame
netflix_df = pd.read_csv("datasets/netflix_data.csv")

# Print the first five rows of the DataFrame
netflix_df.head()
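The file at "datasets/netflix_data.csv" may not exist outside the original workspace. As a minimal sketch, the same `pd.read_csv` call can be tried on an in-memory CSV. The rows below are made up for illustration and only reuse column names that appear in this walkthrough; they are not taken from the real dataset.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for datasets/netflix_data.csv (invented rows)
csv_text = """title,type,release_year,duration
Example Movie,Movie,2016,95
Example Show,TV Show,2019,1
"""

# read_csv accepts a file-like object as well as a path
sample_df = pd.read_csv(io.StringIO(csv_text))

# Check the size and the inferred column types before going further
print(sample_df.shape)
print(sample_df.dtypes)
```

Checking `shape` and `dtypes` right after loading is a quick way to confirm that numeric columns such as `duration` were parsed as numbers rather than strings.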

Filtering for movies

# Subsetting the DataFrame for type "Movie"
netflix_df_movies_only = netflix_df.query("type == 'Movie'")

# Selecting columns of interest
netflix_movies_col_subset = netflix_df_movies_only[
    ["title", "country", "genre", "release_year", "duration"]
]

# Print the first five rows of the new DataFrame
netflix_movies_col_subset.head()
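The row filter and column selection above can also be written as a single `.loc` call with a boolean mask. The sketch below compares the two approaches on a small made-up DataFrame; `sample_df` and its rows are hypothetical stand-ins for `netflix_df`.

```python
import pandas as pd

# Hypothetical stand-in for netflix_df (invented rows, same column names)
sample_df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "type": ["Movie", "TV Show", "Movie"],
    "country": ["US", "UK", "India"],
    "genre": ["Dramas", "Comedies", "Documentaries"],
    "release_year": [2015, 2018, 2020],
    "duration": [100, 2, 45],
})

cols = ["title", "country", "genre", "release_year", "duration"]

# Boolean mask for rows and a column list, combined in one .loc call
movies_via_loc = sample_df.loc[sample_df["type"] == "Movie", cols]

# The query-based approach, followed by column selection, for comparison
movies_via_query = sample_df.query("type == 'Movie'")[cols]

# Both approaches should select the same rows and columns
print(movies_via_loc.equals(movies_via_query))
```

Either form works; `.loc` keeps the selection in one step, while `query` can be easier to read when the condition is long.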

Creating a scatter plot

# Create a figure and set its size
fig = plt.figure(figsize=(12,8))

# Scatter plot of duration versus year
plt.scatter(x = netflix_movies_col_subset["release_year"], y = netflix_movies_col_subset["duration"])

# Create a title
plt.title("Movie Duration by Year of Release")

# Show the plot
plt.show()

Further exploration

# Filter for durations shorter than 60 minutes
short_movies = netflix_movies_col_subset.query("duration < 60")

# Print the first 20 rows of short_movies
short_movies.head(20)
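Scanning the first 20 rows by eye gives a rough impression of which genres the short titles belong to. Counting genres makes that impression concrete. The sketch below uses a small made-up DataFrame; the genre labels and durations are assumptions for illustration, not values from the real data.

```python
import pandas as pd

# Hypothetical stand-in for netflix_movies_col_subset (invented rows)
sample_movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "genre": ["Children", "Documentaries", "Children", "Dramas"],
    "duration": [45, 50, 30, 110],
})

# Same filter as above: keep titles shorter than 60 minutes
short_sample = sample_movies.query("duration < 60")

# Count how many short titles fall into each genre
genre_counts = short_sample["genre"].value_counts()
print(genre_counts)
```

On the real data, the same `value_counts()` call on `short_movies["genre"]` would show which genres dominate the short titles.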

Assigning colors to genre
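One possible way to assign a color to each genre is to map a dictionary over the genre column and fall back to a default for everything else. This is only a sketch under stated assumptions: the genre labels, the color choices, and the names `genre_colors` and `colors` are hypothetical and do not come from the original analysis.

```python
import pandas as pd

# Hypothetical stand-in for netflix_movies_col_subset (invented rows)
sample_movies = pd.DataFrame({
    "genre": ["Children", "Documentaries", "Stand-Up", "Dramas"],
    "release_year": [2012, 2016, 2018, 2020],
    "duration": [45, 50, 55, 110],
})

# Assumed genre-to-color mapping; any genre not listed gets the default
genre_colors = {"Children": "red", "Documentaries": "blue", "Stand-Up": "green"}

# map() returns NaN for unlisted genres, which fillna() replaces with "black"
colors = sample_movies["genre"].map(genre_colors).fillna("black").tolist()
print(colors)
```

A list like `colors` can be passed to the `c` argument of `plt.scatter` so that each point is drawn in the color of its genre.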