
Investigating Netflix Movies with the help of visualizations generated with AI

Introduction

Netflix, one of the world's leading streaming services, offers a vast and diverse library of movies and TV shows. Understanding the composition and trends within this library can provide valuable insights into the content preferences of audiences, regional content production trends, and the evolving landscape of the entertainment industry. In this exploratory data analysis (EDA), we will delve into a dataset of Netflix movies and TV shows, utilizing the power of artificial intelligence (AI) and Python programming to uncover hidden patterns and insights.

This notebook aims to guide you through a detailed exploration of the Netflix dataset, which comprises various features such as the type of show, title, director, cast, country of origin, date added to Netflix, release year, duration, description, and genre. By examining these features, we can gain a deeper understanding of the following:

  • Distribution of Show Types: Analyzing the prevalence of different types of content, such as movies and TV shows.
  • Yearly Trends in Show Additions: Investigating how the number of shows added to Netflix has evolved over the years.
  • Top Genres on Netflix: Identifying the most popular genres among Netflix offerings.
  • Most Common Directors and Cast Members: Highlighting the directors and actors/actresses who frequently contribute to Netflix content.
  • Country-wise Distribution of Shows: Exploring the geographic diversity of Netflix content.
  • Duration Analysis: Understanding the typical length of shows and identifying any common durations.
  • Release Year Trends: Examining the release years of shows to uncover historical content trends.
  • Popular Show Descriptions: Using text analysis to identify common themes in show descriptions.
  • Correlation Analysis: Investigating potential relationships between different features in the dataset.
  • Seasonal Trends: Determining if there are any patterns in the timing of show additions.
  • Genre vs. Country Analysis: Exploring the relationship between show genres and their countries of origin.
  • Changes in Content Length Over Time: Investigating how the duration of shows has changed over time.

By leveraging AI and data visualization techniques, we will extract meaningful insights from the dataset, providing a comprehensive overview of the content available on Netflix. Whether you are a data enthusiast, a Netflix subscriber, or an industry professional, this analysis will offer valuable perspectives on the streaming giant's content library.

Let's embark on this journey to uncover the fascinating world of Netflix movies and TV shows with the help of AI.

The data: netflix_data.csv

We have been supplied with the dataset netflix_data.csv, along with the following table detailing the column names and descriptions.

Column        Description
show_id       The ID of the show
type          Type of show
title         Title of the show
director      Director of the show
cast          Cast of the show
country       Country of origin
date_added    Date added to Netflix
release_year  Year of Netflix release
duration      Duration of the show in minutes
description   Description of the show
genre         Show genre

Data Preparation

# Importing used modules
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import geopandas as gpd

# Set the aesthetic style of the plots
sns.set_style("whitegrid")

# Reading in the Netflix CSV as a DataFrame
netflix_df = pd.read_csv("netflix_data.csv")

# Display head of the DataFrame
netflix_df.head()
# What is the shape of the DataFrame?
netflix_df.shape
# How many columns contain at least one missing value?
netflix_df.isna().any().sum()
# Summary statistics for the numerical features
netflix_df.describe()

From this last table, we observe that the release years of the Netflix shows range from 1942 to 2021, but that most (75%) of the shows were released after 2011. The majority of the Netflix offering is quite recent!
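The quartile reading above can also be checked directly instead of being read off the describe() table. The following is a minimal sketch on a small made-up DataFrame (sample_df and its values are hypothetical, not taken from netflix_data.csv); with the real data, netflix_df would take its place.

```python
import pandas as pd

# Hypothetical stand-in for netflix_df; the values are illustrative only
sample_df = pd.DataFrame(
    {"release_year": [1942, 1998, 2011, 2014, 2016, 2017, 2018, 2019, 2020, 2021]}
)

# Range of release years (the min and max rows of describe())
year_min = sample_df["release_year"].min()
year_max = sample_df["release_year"].max()

# First quartile (the 25% row of describe()):
# about 75% of the titles were released in or after this year
first_quartile = sample_df["release_year"].quantile(0.25)

# Share of titles released strictly after a chosen cutoff year
cutoff = 2011
share_after_cutoff = (sample_df["release_year"] > cutoff).mean()

print(year_min, year_max, first_quartile, share_after_cutoff)
```

Computing the share for an explicit cutoff avoids ambiguity about whether the quartile year itself is included.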

1. Distribution of Show Types

Objective: Understand the distribution of different types of shows (e.g., movies, TV shows).
Analysis: Create a bar chart showing the count of each type.
# Create a bar chart showing the count of each type
plt.figure(figsize=(10, 6))
sns.countplot(data=netflix_df, x='type', palette='viridis')

# Add title and labels
plt.title('Distribution of Show Types on Netflix')
plt.xlabel('Type of Show')
plt.ylabel('Count')

# Show the plot
plt.show()
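
The bar chart shows the split visually; the exact counts and shares may be easier to quote from a small table. A minimal sketch, assuming a 'type' column like the one described above and using a made-up sample (sample_df is hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for netflix_df; illustrative values only
sample_df = pd.DataFrame(
    {"type": ["Movie", "Movie", "TV Show", "Movie", "TV Show"]}
)

# Absolute number of titles per type
type_counts = sample_df["type"].value_counts()

# Relative share of each type (fractions that sum to 1)
type_shares = sample_df["type"].value_counts(normalize=True).round(3)

# Combine both into one summary table, aligned on the type label
summary = pd.DataFrame({"count": type_counts, "share": type_shares})
print(summary)
```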

2. Yearly Trends in Show Additions

Objective: Analyze how the number of shows added to Netflix has changed over the years.
Analysis: Plot the number of shows added each year using date_added.
# Convert the 'date_added' column to datetime format
netflix_df['date_added'] = pd.to_datetime(netflix_df['date_added'])

# Extract the year from the 'date_added' column
netflix_df['year_added'] = netflix_df['date_added'].dt.year

# Group by the year and count the number of shows added each year
shows_per_year = netflix_df.groupby('year_added').size().reset_index(name='count')

# Plot the number of shows added each year
plt.figure(figsize=(12, 6))
# No 'palette' here: seaborn only applies a palette when a 'hue' variable is set
sns.lineplot(data=shows_per_year, x='year_added', y='count', marker='o')

# Add title and labels
plt.title('Number of Shows Added to Netflix Each Year')
plt.xlabel('Year Added')
plt.ylabel('Number of Shows')

# Show the plot
plt.show()
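
If date_added contains stray whitespace, missing entries, or unparseable strings, the conversion above can fail or produce fractional-looking years. The following is a hedged sketch of a more defensive variant on a made-up sample; the date strings and sample_df are assumptions for illustration, not values from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for netflix_df['date_added']; illustrative values only
sample_df = pd.DataFrame(
    {"date_added": ["August 14, 2020", " September 1, 2019", None, "not a date"]}
)

# Strip surrounding whitespace, then turn anything unparseable into NaT
parsed = pd.to_datetime(sample_df["date_added"].str.strip(), errors="coerce")

# Nullable integer dtype keeps years as integers even when some dates are missing
year_added = parsed.dt.year.astype("Int64")

# Count titles per year; rows without a valid date are dropped by value_counts
shows_per_year = year_added.value_counts().sort_index()
print(shows_per_year)
```

Checking parsed.isna().sum() afterwards shows how many rows were excluded from the yearly counts.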

3. Top Genres on Netflix

Objective: Identify the most popular genres available on Netflix.
Analysis: Create a bar chart and word cloud to show the frequency of each genre.