Netflix Movie Data

    This dataset covers more than 8,500 Netflix movies and TV shows, with details such as cast members, duration, and genre for each title. It includes titles added as recently as late September 2021.

    Not sure where to begin? Scroll to the bottom to find challenges!

    import pandas as pd

    # Load the dataset, using the first CSV column as the index.
    df = pd.read_csv("netflix_dataset.csv", index_col=0)

    Source of dataset.

    Don't know where to start?

    Challenges are brief tasks designed to help you practice specific skills:

    • 🗺️ Explore: How much variety exists in Netflix's offering? Base this on three variables: type, country, and listed_in.
    • 📊 Visualize: Build a word cloud from the movie and TV show descriptions. Make sure to remove stop words!
    • 🔎 Analyze: Has Netflix invested more in certain genres (see listed_in) in recent years? What about certain age groups (see ratings)?
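
One possible starting point for the Explore challenge is to count the distinct values in type, country, and listed_in. The sketch below is only an illustration: the rows in `toy` are made up, the helper `count_values` is a hypothetical name, and it assumes that multi-valued cells are separated by a comma and a space, which should be checked against the real file.

```python
import pandas as pd

# Toy rows for illustration only; they are NOT taken from the real dataset.
toy = pd.DataFrame(
    {
        "type": ["Movie", "TV Show", "Movie"],
        "country": ["United States, India", "India", None],
        "listed_in": ["Dramas, Comedies", "TV Dramas", "Comedies"],
    }
)


def count_values(frame, column, sep=", "):
    """Count how often each individual value appears in a column.

    Assumes cells may hold several values joined by `sep` (an assumption
    about the file format). Missing cells are skipped.
    """
    values = frame[column].dropna().str.split(sep).explode().str.strip()
    return values.value_counts()


# Per-column frequency tables, plus the number of distinct values in each.
variety = {col: count_values(toy, col) for col in ["type", "country", "listed_in"]}
distinct = {col: len(counts) for col, counts in variety.items()}
print(distinct)
```

To try this on the real data, pass `df` instead of `toy`. The frequency tables in `variety` can then be inspected or plotted to see how evenly titles are spread across values.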

    Scenarios are broader questions to help you develop an end-to-end project for your portfolio:

    A talent agency has hired you to analyze patterns in the professional relationships of cast members and directors. The key deliverable is a network graph where each node represents a cast member or director. An edge represents a movie or TV show worked on by both nodes in this undirected graph. You can limit the actors to the first four names listed in cast. The client is interested in any insights you can derive from your Netflix network analysis, such as actor/actor and actor/director pairs that work most closely together, most popular actors and directors to work with, and graph differences over time.

    You will need to prepare a report that is accessible to a broad audience. It will need to outline your motivation, analysis steps, findings, and conclusions.
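
The edge list for the network graph described above could be sketched as follows. This is a minimal illustration using only the standard library, not a full solution: the records in `toy_titles` are made up, the helper names are hypothetical, and using the number of shared titles as the edge weight is an assumption rather than something the scenario specifies.

```python
from collections import Counter
from itertools import combinations

# Toy records for illustration only; the names are made up.
toy_titles = [
    {"director": "Dir A", "cast": "Actor 1, Actor 2, Actor 3, Actor 4, Actor 5"},
    {"director": "Dir A", "cast": "Actor 1, Actor 2"},
    {"director": None, "cast": "Actor 3"},
]


def split_names(field, limit=None):
    """Split a comma-separated name field; non-string values (None, NaN) give []."""
    if not isinstance(field, str):
        return []
    names = [name.strip() for name in field.split(",") if name.strip()]
    return names[:limit] if limit else names


def build_edge_weights(titles, max_cast=4):
    """Count, for every pair of people, how many titles they share.

    Only the first `max_cast` cast members of each title are kept,
    as the scenario allows.
    """
    edges = Counter()
    for title in titles:
        people = split_names(title.get("director"))
        people += split_names(title.get("cast"), limit=max_cast)
        # Sorting the unique names gives each undirected pair one canonical key.
        for a, b in combinations(sorted(set(people)), 2):
            edges[(a, b)] += 1
    return edges


edges = build_edge_weights(toy_titles)
top_pair, top_weight = edges.most_common(1)[0]
print(top_pair, top_weight)
```

With the real data, the same function could be fed rows converted to dictionaries, and the resulting weighted pairs handed to a graph library for drawing and centrality measures.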

    import numpy as np
    from matplotlib import pyplot as plt
    import seaborn as sns

    # Display the DataFrame (notebook cell output).
    df

    # Count the missing values in each column.
    df_null_count = df.isna().sum()
    df_null_count