
Netflix! What started in 1997 as a DVD rental service has since exploded into one of the largest entertainment and media companies.

Given the large number of movies and series available on the platform, it is a perfect opportunity to flex your exploratory data analysis skills and dive into the entertainment industry.

You work for a production company that specializes in nostalgic styles. You want to do some research on movies released in the 1990s. You'll delve into Netflix data and perform exploratory data analysis to better understand this awesome movie decade!

You have been supplied with the dataset netflix_data.csv, along with the following table detailing the column names and descriptions. Feel free to experiment further after submitting!

The data

netflix_data.csv

Column          Description
show_id         The ID of the show
type            Type of show
title           Title of the show
director        Director of the show
cast            Cast of the show
country         Country of origin
date_added      Date added to Netflix
release_year    Year of Netflix release
duration        Duration of the show in minutes
description     Description of the show
genre           Show genre
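
As a quick sanity check on the schema above, the sketch below confirms that a loaded DataFrame contains every listed column. It is only an illustration: the one-row CSV is made up and stands in for netflix_data.csv, and EXPECTED_COLUMNS, sample_csv, sample_df, and missing are hypothetical names that do not appear in the project code.

```python
import io

import pandas as pd

# Column names copied from the table above
EXPECTED_COLUMNS = [
    "show_id", "type", "title", "director", "cast", "country",
    "date_added", "release_year", "duration", "description", "genre",
]

# Hypothetical one-row CSV standing in for netflix_data.csv (made-up values)
sample_csv = io.StringIO(
    "show_id,type,title,director,cast,country,date_added,"
    "release_year,duration,description,genre\n"
    "s0,Movie,Example Title,Jane Doe,John Roe,United States,2020-01-01,"
    "1995,94,A made-up row.,Action\n"
)
sample_df = pd.read_csv(sample_csv)

# List any expected column that the loaded data does not provide
missing = [col for col in EXPECTED_COLUMNS if col not in sample_df.columns]
print("Missing columns:", missing)

# Numeric columns should parse as integers when no values are missing
print(sample_df[["release_year", "duration"]].dtypes)
```

With the real file, the same check would replace sample_csv with the path 'netflix_data.csv'.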

Project Summary

This Python code uses the pandas library to perform exploratory data analysis on Netflix data, focusing on movies released in the 1990s. It reads the data from netflix_data.csv and filters it to movies released between 1990 and 1999. It then finds the most frequent movie duration in that decade and saves it as an integer named duration. It also plots a histogram of 1990s Action movie durations with matplotlib, then counts the Action movies shorter than 90 minutes and saves that count as short_movie_count. Together, these steps show how filtering and frequency counts can reveal patterns in movie durations on Netflix during the 1990s.
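
The steps in this summary can be sketched on a small, made-up table. This is a minimal illustration rather than the project code: toy_df and its six rows are hypothetical and are not taken from netflix_data.csv, and the sketch assumes the type column uses the labels 'Movie' and 'TV Show'.

```python
import pandas as pd

# Hypothetical toy data, NOT taken from netflix_data.csv
toy_df = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E", "F"],
    "type": ["Movie", "Movie", "Movie", "Movie", "TV Show", "Movie"],
    "genre": ["Action", "Dramas", "Action", "Action", "Action", "Comedies"],
    "release_year": [1991, 1995, 1999, 2003, 1994, 1995],
    "duration": [85, 94, 94, 80, 2, 94],
})

# Step 1: keep only movies released from 1990 through 1999 (inclusive)
is_movie = toy_df["type"] == "Movie"
in_1990s = toy_df["release_year"].between(1990, 1999)
movies_1990s = toy_df[is_movie & in_1990s]

# Step 2: most frequent duration among those movies
duration = int(movies_1990s["duration"].value_counts().idxmax())

# Step 3: count Action movies shorter than 90 minutes
is_short_action = (movies_1990s["genre"] == "Action") & (movies_1990s["duration"] < 90)
short_movie_count = int(is_short_action.sum())

print(duration)           # 94 for this toy data
print(short_movie_count)  # 1 for this toy data
```

Summing a boolean mask counts its True values, so is_short_action.sum() is equivalent to taking len() of the filtered subset.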

# Importing pandas and matplotlib
import pandas as pd
import matplotlib.pyplot as plt

# Read the Netflix data into a DataFrame
netflix_df = pd.read_csv('netflix_data.csv')

# Display the first 10 rows of the DataFrame
print(netflix_df.head(10))

# Convert DataFrame columns to a list
netflix_df_cols = netflix_df.columns.tolist()

# Display the list of columns
print(netflix_df_cols)

# Get the unique values in the 'type' column and convert them to a list
unique_type = netflix_df['type'].unique().tolist()

# Print the unique show types
print(unique_type)

# Get the unique values in the 'genre' column and convert them to a list
unique_genres = netflix_df['genre'].unique().tolist()

# Print the unique genres
print(unique_genres)

# Filter to movies released in the 1990s (1990 through 1999, inclusive)
netflix_df_1990s = netflix_df[(netflix_df['type'] == 'Movie') & netflix_df['release_year'].between(1990, 1999)]

# Get the frequency table for movie durations (astype(int) truncates any fractional part; it does not round)
duration_counts = netflix_df_1990s['duration'].astype(int).value_counts()

# Find the duration with the highest count (most frequent)
most_frequent_duration = duration_counts.idxmax()  # Get the index (duration) with the highest count

# Save the approximate answer as an integer
duration = int(most_frequent_duration)

# Create a subset of the 1990s data containing only Action movies
action_movies = netflix_df_1990s[(netflix_df_1990s['type'] == 'Movie') & (netflix_df_1990s['genre'] == 'Action')]

# Visualize the duration column of your filtered data to see the distribution of 1990s Action movie durations
plt.hist(action_movies["duration"])
plt.title('Distribution of Action Movie Durations in the 1990s')
plt.xlabel('Duration (minutes)')
plt.ylabel('Number of Movies')
plt.show()

# Define the columns to include in the analysis
subset_columns = ['title', 'country', 'genre', 'release_year', 'duration']

# Extract the relevant columns for further analysis
netflix_movies_subset = action_movies[subset_columns]

# Create a subset of short movies with a duration less than 90 minutes
short_action_movies = netflix_movies_subset[netflix_movies_subset['duration'] < 90]

# Count the number of short action movies
short_movie_count = len(short_action_movies)

# Print the most frequent movie duration
print('Most frequent Movie Duration (Minutes): ', duration)

# Print the number of short action movies found
print('\nNumber of Short Action Movies: ', short_movie_count)
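
One caveat about the "most frequent duration" step: value_counts().idxmax() returns a single value even when several durations share the top count. The sketch below, on made-up numbers, compares it with Series.mode(), which returns every tied value. The names durations, counts, top, and modes are hypothetical and are not part of the analysis above.

```python
import pandas as pd

# Hypothetical durations in which 90 and 100 minutes tie for most frequent
durations = pd.Series([100, 90, 100, 90, 120])

# idxmax() picks a single label, even though two durations share the top count
counts = durations.value_counts()
top = counts.idxmax()

# mode() returns all tied values, sorted in ascending order
modes = durations.mode().tolist()

print("idxmax picks:", top)
print("all modes:", modes)
```

If ties matter for the analysis, checking the length of modes before reporting a single duration is one way to catch them.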