Netflix! What started in 1997 as a DVD rental service has since exploded into one of the largest entertainment and media companies.
With so many movies and series available on the platform, Netflix is a perfect opportunity to flex my exploratory data analysis skills and dive into the entertainment industry. My friend has also been brushing up on his Python skills and has taken a first crack at a CSV file containing Netflix data. He believes that the average duration of movies has been declining. Building on my friend's initial research, I'll dive into the Netflix data to see whether movie lengths are actually getting shorter and, if so, explain some of the contributing factors.
I have been supplied with the dataset netflix_data.csv, along with the following table of column names and descriptions:
The data
netflix_data.csv
| Column | Description |
|---|---|
| show_id | The ID of the show |
| type | Type of show |
| title | Title of the show |
| director | Director of the show |
| cast | Cast of the show |
| country | Country of origin |
| date_added | Date added to Netflix |
| release_year | Year of Netflix release |
| duration | Duration of the show in minutes |
| description | Description of the show |
| genre | Show genre |
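Before any filtering, it can help to confirm that the CSV actually has the columns listed in the table. It is also worth checking that duration is numeric, because the later `< 60` comparison depends on it. The sketch below is one way to do that. The helper names `missing_columns` and `duration_is_numeric` are hypothetical. The toy frame stands in for `pd.read_csv('netflix_data.csv')`, and its values are made up.

```python
import pandas as pd

# Column names taken from the data table above
EXPECTED_COLUMNS = [
    "show_id", "type", "title", "director", "cast", "country",
    "date_added", "release_year", "duration", "description", "genre",
]

def missing_columns(df: pd.DataFrame) -> list:
    """Return the expected columns that are absent from df, in table order."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]

def duration_is_numeric(df: pd.DataFrame) -> bool:
    """True if 'duration' holds numbers, so comparisons like < 60 make sense."""
    return pd.api.types.is_numeric_dtype(df["duration"])

# Toy stand-in for the real CSV; values are invented for illustration
toy = pd.DataFrame({
    "show_id": ["s1"], "type": ["Movie"], "title": ["Example"],
    "release_year": [2019], "duration": [90], "genre": ["Dramas"],
})
print(missing_columns(toy))
print(duration_is_numeric(toy))
```

On the real file, an empty list from `missing_columns` and `True` from `duration_is_numeric` would suggest the data matches the table.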
```python
# Importing necessary libraries
import pandas as pd
import matplotlib.pyplot as plt

# Loading the dataset
netflix_df = pd.read_csv('netflix_data.csv')

# Filtering the dataset for movies only
netflix_movies = netflix_df[netflix_df['type'] == 'Movie']

# Selecting relevant columns
netflix_movies = netflix_movies[['title', 'country', 'genre', 'release_year', 'duration']]

# Filtering for short movies (duration < 60 minutes)
short_movies = netflix_movies[netflix_movies['duration'] < 60]

# Displaying the first 20 short movies
print(short_movies.head(20))

# Assigning colors based on genre (keys must match the genre values in the data)
genre_colors = {
    'Children': 'blue',
    'Documentaries': 'green',
    'Stand-Up': 'orange',
}

# One color per movie; genres missing from genre_colors are drawn in gray
colors = netflix_movies['genre'].map(genre_colors).fillna('gray').tolist()

# Plotting the data
plt.figure(figsize=(12, 8))
plt.scatter(netflix_movies['release_year'], netflix_movies['duration'], c=colors)
plt.xlabel("Release Year")
plt.ylabel("Duration (min)")
plt.title("Movie Duration by Year of Release")
plt.show()

# Are we certain that movies are getting shorter?
answer = 'no'
```

Netflix Movie Duration Analysis Report
Introduction
This report analyzes trends in Netflix movie durations over time, with a particular focus on identifying whether movies are getting shorter. The analysis was conducted using Python with pandas for data manipulation and matplotlib for visualization.
Methodology
- Data Preparation:
  - Loaded Netflix data from 'netflix_data.csv'
  - Filtered for movie content only (excluding TV shows)
  - Selected relevant columns: title, country, genre, release_year, and duration
- Special Focus:
  - Identified short movies (duration < 60 minutes) for closer examination
  - Created a visualization of movie durations by release year
- Visual Encoding:
  - Applied color coding by genre:
    - Children: Blue
    - Documentaries: Green
    - Stand-Up: Orange
    - All others: Gray
Key Findings
Short Movies Analysis
The first 20 short movies (duration < 60 minutes) include:
- Holiday specials (e.g., "A Go! Go! Cory Carson Christmas")
- Documentaries (e.g., "3 Seconds Divorce")
- Children's content (e.g., "100 Things to do Before High School")
- Various Christmas specials and short films
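Listing the first 20 titles shows which kinds of content are short. Counting them by genre shows how the short movies are distributed. The sketch below does both, under a few assumptions. It assumes a frame shaped like `netflix_movies`, and the rows are invented. `short_share` is a hypothetical extra: the fraction of each genre that falls under the 60-minute cutoff.

```python
import pandas as pd

# Invented rows standing in for netflix_movies
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E", "F"],
    "genre": ["Children", "Documentaries", "Children", "Dramas", "Stand-Up", "Children"],
    "duration": [22, 53, 21, 120, 58, 95],
})

SHORT_CUTOFF = 60  # minutes, same threshold as the short_movies filter
is_short = movies["duration"] < SHORT_CUTOFF

# How many short movies each genre contributes
short_counts = movies.loc[is_short, "genre"].value_counts()

# Fraction of each genre's movies that are short (mean of a boolean Series)
short_share = is_short.groupby(movies["genre"]).mean()

print(short_counts)
print(short_share)
```

The share is often more telling than the raw count. A large genre can contribute many short titles while still being mostly feature-length.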
Duration Trends Visualization
The scatter plot "Movie Duration by Year of Release" shows:
- A wide distribution of movie durations across all years
- No clear downward trend in movie durations over time
- Short movies (under 60 minutes) appear consistently throughout the timeline
- Most movies cluster between 60-150 minutes regardless of release year
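A scatter plot alone makes "no clear downward trend" a judgment call. One way to put a number on it is to average duration per release year and fit a straight line through the individual movies. The sketch below assumes a frame with `release_year` and `duration` columns, like `netflix_movies`. The rows are made up. `np.polyfit` is swapped in as a simple least-squares trend; it is not part of the analysis above.

```python
import numpy as np
import pandas as pd

# Made-up durations for illustration only, not the Netflix data
movies = pd.DataFrame({
    "release_year": [2000, 2000, 2005, 2005, 2010, 2010],
    "duration":     [100, 110, 95, 105, 90, 100],
})

# Average duration for each release year
yearly_mean = movies.groupby("release_year")["duration"].mean()

# Degree-1 least-squares fit; polyfit returns [slope, intercept]
slope, intercept = np.polyfit(
    movies["release_year"].to_numpy(), movies["duration"].to_numpy(), deg=1
)

print(yearly_mean)
print(f"trend: {slope:.2f} minutes per year")
```

A negative slope by itself would not make us certain that movies are getting shorter. Years with only a handful of releases can pull the line around, so the spread around the line matters as much as its sign.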
Genre Patterns
The color-coded visualization reveals:
- Children's content (blue) tends to have shorter durations
- Documentaries (green) show a mix of short and medium lengths
- Stand-Up specials (orange) typically fall in the medium duration range
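The color-coded observations above can be checked with a per-genre summary table. The sketch below groups by genre and reports the count, median, and range of durations, sorted by median. The rows are illustrative only. On the real data, `movies` would be `netflix_movies`.

```python
import pandas as pd

# Illustrative values only
movies = pd.DataFrame({
    "genre": ["Children", "Children", "Documentaries",
              "Documentaries", "Stand-Up", "Stand-Up"],
    "duration": [22, 90, 53, 95, 60, 70],
})

# Count, median and range of duration per genre, shortest median first
genre_summary = (
    movies.groupby("genre")["duration"]
    .agg(["count", "median", "min", "max"])
    .sort_values("median")
)
print(genre_summary)
```

The median is used here rather than the mean, so that a few very short specials or very long features do not dominate a genre's typical length.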
Conclusion
Based on the analysis:
- Movies are not demonstrably getting shorter over time
- Short movies (under 60 minutes) have consistently existed alongside standard-length features
- Genre appears to be a stronger predictor of duration than release year
The answer to "Are we certain that movies are getting shorter?" is: No
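The claim that genre predicts duration better than release year could be made more concrete. One option is to compare how much of the variance in duration each one explains. The sketch below uses eta squared for genre (the variance captured by genre means) and R squared for a straight-line fit on release year. Both helper names are hypothetical, and the toy rows are invented. Note one caveat: a column with many genres can absorb variance more easily than a two-parameter line, so this is a rough comparison rather than a formal test.

```python
import numpy as np
import pandas as pd

def variance_explained_by_groups(values: pd.Series, groups: pd.Series) -> float:
    """Eta squared: share of total variance captured by each group's mean."""
    group_means = values.groupby(groups).transform("mean")
    ss_total = ((values - values.mean()) ** 2).sum()
    ss_between = ((group_means - values.mean()) ** 2).sum()
    return ss_between / ss_total

def variance_explained_by_line(x: pd.Series, y: pd.Series) -> float:
    """R squared of a least-squares straight-line fit of y on x."""
    slope, intercept = np.polyfit(x.to_numpy(), y.to_numpy(), deg=1)
    residuals = y - (slope * x + intercept)
    ss_total = ((y - y.mean()) ** 2).sum()
    return 1 - (residuals ** 2).sum() / ss_total

# Invented rows: duration depends on genre, barely on year
movies = pd.DataFrame({
    "genre": ["Children", "Children", "Dramas", "Dramas"],
    "release_year": [2000, 2010, 2000, 2010],
    "duration": [20, 24, 100, 104],
})

eta_sq = variance_explained_by_groups(movies["duration"], movies["genre"])
r_sq = variance_explained_by_line(movies["release_year"], movies["duration"])
print(f"genre: {eta_sq:.3f}, release year: {r_sq:.3f}")
```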
Recommendations
- Further analysis could examine duration trends within specific genres
- Investigating the relationship between country of origin and duration might reveal additional patterns
- A time-series analysis with moving averages could provide more nuanced insights about duration trends
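The moving-average idea from the last recommendation could start from the yearly mean duration, smoothed with a centred rolling window. The sketch below assumes a frame with `release_year` and `duration` columns. The rows are made up, and the 3-year window is an arbitrary choice.

```python
import pandas as pd

# Made-up rows standing in for netflix_movies
movies = pd.DataFrame({
    "release_year": [2015, 2015, 2016, 2017, 2017, 2018, 2019],
    "duration":     [100, 90, 98, 80, 100, 92, 85],
})

# Mean duration per release year (index is sorted by year)
yearly_mean = movies.groupby("release_year")["duration"].mean()

# Centred 3-year moving average; the first and last years lack a full
# window and come out as NaN
smoothed = yearly_mean.rolling(window=3, center=True).mean()
print(pd.DataFrame({"yearly_mean": yearly_mean, "smoothed": smoothed}))
```

`rolling` counts rows, not calendar years. If some release years are missing, the yearly series should be reindexed to a full range of years before smoothing. Grouping by `["genre", "release_year"]` instead would give the per-genre trends suggested in the first recommendation.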