Investigating Movies (basic analysis)
Netflix started in 1997 as a DVD rental service and has since grown into one of the largest entertainment and media companies. You work for a production company that specializes in nostalgic styles, and you want to research movies released in the 1990s. You'll delve into Netflix data and perform exploratory data analysis to better understand this awesome movie decade. You have been supplied with the dataset netflix_data.csv.
import pandas as pd
import matplotlib.pyplot as plt
# read in data as a dataframe
netflix_df = pd.read_csv("netflix_data.csv")
netflix_df.head()
# filter the dataframe for movies released between 1990 and 1999
movies_90s = netflix_df[(netflix_df['type'] == 'Movie') &
                        (netflix_df['release_year'] >= 1990) &
                        (netflix_df['release_year'] <= 1999)]
movies_90s.head(10)
# prepped lists
short = []
standard = []
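# loop through rows using .iterrows()
# a row counts as "short" only if it is an Action movie under 90 minutes;
# every other row (including short non-Action movies) counts as "standard"
for index, row in movies_90s.iterrows():
    if row['duration'] < 90 and row['genre'] == 'Action':
        short.append(1)
    else:
        standard.append(1)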
short_movie_count = sum(short)
standard_movie_count = sum(standard)
# use formatted strings for the output
print("Filtering complete.")
print(f"Short action movies: {short_movie_count}")
print(f"All other movies: {standard_movie_count}")
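The same count can usually be obtained without an explicit loop by building a boolean mask and summing it. The sketch below is one possible alternative, not part of the original analysis. It assumes the same `duration` and `genre` columns and runs on a small made-up frame (`sample_90s` and its rows are hypothetical stand-ins for `movies_90s`).

```python
import pandas as pd

# Hypothetical stand-in rows for movies_90s; the real data comes from netflix_data.csv.
sample_90s = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "duration": [85, 120, 80, 95],
    "genre": ["Action", "Action", "Dramas", "Comedies"],
})

# Boolean mask: True where a movie is both under 90 minutes and tagged Action.
is_short_action = (sample_90s["duration"] < 90) & (sample_90s["genre"] == "Action")

# Summing a boolean Series counts the True values.
short_action_count = int(is_short_action.sum())
other_count = int((~is_short_action).sum())

print(f"Short action movies: {short_action_count}")
print(f"All other movies: {other_count}")
```

On a large frame the mask is typically much faster than `.iterrows()`, since the comparison runs over whole columns at once rather than row by row.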
# find the most frequent duration
# .mode() returns a Series (there can be ties), so take its first value
duration = int(movies_90s['duration'].mode()[0])
print(f"The most frequent movie duration in the 90s was {duration} minutes.")
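Because `.mode()` returns every value that ties for the highest frequency, it can help to look at the ties before picking one. The following is a minimal sketch on a made-up Series (`durations` is hypothetical, not taken from netflix_data.csv) showing how `mode()` and `value_counts()` relate.

```python
import pandas as pd

# Hypothetical durations with a tie: 90 and 100 each appear twice.
durations = pd.Series([90, 100, 90, 100, 120], name="duration")

modes = durations.mode()           # all tied modes, sorted ascending
counts = durations.value_counts()  # frequency of each distinct duration

most_common = int(modes.iloc[0])   # the smallest of the tied values
times_seen = int(counts.loc[most_common])

print(f"Tied modes: {modes.tolist()}")
print(f"Picked {most_common} minutes, seen {times_seen} times")
```

Taking the first mode is a convention rather than a rule; if ties matter for the analysis, reporting the full list is the safer choice.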