📖 Background
The Netflix Top 10 charts represent the most popular movies and TV series, with millions of viewers around the globe. Understanding what makes the biggest hits is crucial to making more hits.
💪 Challenge
Explore the dataset to understand the most common attributes of popular Netflix content. Your published notebook should contain a short report on the popular content, including summary statistics, visualizations, statistical models, and text describing any insights you found.
💾 The data
There are three datasets taken from Netflix Top 10.
Each dataset is stored as a table in a PostgreSQL database.
- all_weeks_global: This contains the weekly top 10 list for movies (films) and TV series at a global level.
- all_weeks_countries: This contains the weekly top 10 list for movies (films) and TV series by country.
- most_popular: All-time most popular content by number of hours viewed in the first 28 days from launch.
The data source page describes the methodology for data collection in detail. In particular:
- Content is categorized as Film (English), TV (English), Film (Non-English), and TV (Non-English).
- Each season of a TV series is considered separately.
- Popularity is measured as the total number of hours that Netflix members around the world watched each title from Monday to Sunday of the previous week.
- Weekly reporting is rounded to the nearest 10 000 viewers.
Database integration
To access the data, use the sample integration named "Competition Netflix Top 10".
Top Weekly Global Movies on Netflix
SELECT *
FROM all_weeks_global
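The cells that turn this query into the `world` DataFrame used below are hidden, so the following is only a minimal sketch of one way to do it. It assumes pandas is available and uses an in-memory SQLite database as a stand-in for the PostgreSQL integration. The rows are made up for illustration, and the `show_title` column is an assumption (only `week`, `category`, and `weekly_hours_viewed` appear in this notebook).

```python
import sqlite3

import pandas as pd

# Stand-in for the PostgreSQL integration: an in-memory SQLite database
# holding a tiny, made-up all_weeks_global table (illustrative values only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE all_weeks_global ("
    "week TEXT, category TEXT, show_title TEXT, weekly_hours_viewed INTEGER)"
)
conn.executemany(
    "INSERT INTO all_weeks_global VALUES (?, ?, ?, ?)",
    [
        ("2021-07-04", "Film (English)", "Title A", 12_000_000),
        ("2021-07-04", "TV (English)", "Title B", 30_000_000),
        ("2021-07-11", "Film (English)", "Title A", 9_000_000),
        ("2021-07-11", "TV (English)", "Title C", 45_000_000),
    ],
)
conn.commit()

# Run the same query and parse `week` as a datetime so that date
# arithmetic such as max_week - min_week works later on.
world = pd.read_sql_query(
    "SELECT * FROM all_weeks_global", conn, parse_dates=["week"]
)
conn.close()

print(world.dtypes)
```

Parsing `week` at load time matters because the subtraction in the next cell only yields a time span when the column holds datetimes rather than strings.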
# First and last reporting weeks, and the time span between them
min_week = world['week'].min()
max_week = world['week'].max()
min_week, max_week, max_week - min_week
# Number of distinct reporting weeks
world['week'].nunique()
The data covers 75 weekly reports spanning 518 days, from 2021-07-04 to 2022-12-04.
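The two figures are consistent with each other: 75 weekly dates are separated by 74 seven-day gaps, and 74 × 7 = 518. The sketch below checks this on a hypothetical series of weekly dates that stands in for `world['week']`. It assumes one report every seven days with no gaps.

```python
import pandas as pd

# Hypothetical stand-in for world['week']: one date every 7 days
# between the first and last reporting weeks quoted above.
weeks = pd.Series(pd.date_range("2021-07-04", "2022-12-04", freq="7D"))

span = weeks.max() - weeks.min()   # Timedelta between first and last week
n_weeks = weeks.nunique()          # number of distinct weekly reports

print(span.days, n_weeks)  # 518 75
```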
Most watched category worldwide in descending order
world.groupby('category')['weekly_hours_viewed'].mean().round().sort_values(ascending=False).reset_index()
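To make clear what this ranking measures, here is a small sketch of the same chain on a made-up DataFrame (`world_demo` and its values are hypothetical). Each row is one title in one week, so the result is the mean weekly hours per top-10 entry in each category, not the total hours for the category.

```python
import pandas as pd

# Made-up rows: two title-weeks per category (illustrative values only).
world_demo = pd.DataFrame(
    {
        "category": [
            "Film (English)",
            "Film (English)",
            "TV (English)",
            "TV (English)",
        ],
        "weekly_hours_viewed": [10_000_000, 20_000_000, 30_000_000, 50_000_000],
    }
)

# Same steps as the cell above: mean per category, rounded,
# sorted from most to least watched, returned as a regular DataFrame.
ranking = (
    world_demo.groupby("category")["weekly_hours_viewed"]
    .mean()
    .round()
    .sort_values(ascending=False)
    .reset_index()
)

print(ranking)
```

Here TV (English) averages 40 million hours per entry and Film (English) 15 million, so TV (English) is listed first.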
import matplotlib.pyplot as plt
import seaborn as sns

# Weekly hours viewed over time, one line per category
plt.figure(figsize=(10, 8))
sns.lineplot(x='week', y='weekly_hours_viewed', hue='category', data=world)
plt.ylabel('Weekly hours viewed (per 100 million)')
plt.xlabel('Week')
plt.xticks(rotation=60)
sns.despine()
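Because each week has several titles per category, `sns.lineplot` aggregates them: by default it draws the mean of `weekly_hours_viewed` at each week, with a shaded confidence band around it. The sketch below computes those per-week means explicitly on a made-up DataFrame (`demo` and its values are hypothetical), which may help when reading exact values off the chart.

```python
import pandas as pd

# Made-up data: two titles per category in each of two weeks.
demo = pd.DataFrame(
    {
        "week": pd.to_datetime(["2021-07-04"] * 4 + ["2021-07-11"] * 4),
        "category": ["Film (English)", "Film (English)",
                     "TV (English)", "TV (English)"] * 2,
        "weekly_hours_viewed": [10e6, 20e6, 30e6, 50e6,
                                12e6, 18e6, 40e6, 60e6],
    }
)

# Mean hours per (week, category): the central line that lineplot draws
# for each hue level. Unstacking gives one column per category.
weekly_mean = (
    demo.groupby(["week", "category"])["weekly_hours_viewed"]
    .mean()
    .unstack("category")
)

print(weekly_mean)
```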