CARPRICE PREDICTION
  • Clean the data by:

    removing rows that have missing values (DONE), changing the data type of some columns, and removing columns that are not relevant to this task. Think about how each column might be relevant to the business question you’re investigating. If you can’t think of why a column may be useful, it may not be worth including.

    Your end result should be three cleaned datasets.
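The two remaining cleaning steps, changing a column's data type and dropping columns that do not serve the business question, can be sketched with pandas as below. This is a minimal sketch on a small, made-up DataFrame. The column names `Unnamed: 0`, `Content ID` and `Datetime` and the helper `clean` are assumptions for illustration, not confirmed columns of `Reactions.csv`, so check them against the `info()` output before reusing the pattern.

```python
import pandas as pd

# Made-up stand-in for a raw table. 'Unnamed: 0', 'Content ID' and
# 'Datetime' are assumed column names, not taken from the real CSV files.
raw = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3],  # hypothetical leftover index column
    "Content ID": ["a1", "a2", "a1", "a3"],
    "User ID": ["u1", None, "u3", "u4"],
    "Type": ["like", "love", None, "like"],
    "Datetime": [
        "2021-04-22 15:17:15",
        "2020-11-07 09:43:50",
        "2021-06-17 12:22:51",
        "2021-01-05 08:00:00",
    ],
})


def clean(df: pd.DataFrame, required: list[str], drop_cols: list[str]) -> pd.DataFrame:
    """Hypothetical helper: drop rows missing any required value, drop
    irrelevant columns, and parse the assumed 'Datetime' text column."""
    out = df.dropna(subset=required).drop(columns=drop_cols)
    # assign() returns a new DataFrame, so no chained-assignment warning
    return out.assign(Datetime=pd.to_datetime(out["Datetime"]))


cleaned = clean(raw, required=["User ID", "Type"], drop_cols=["Unnamed: 0"])
print(cleaned.dtypes)
```

Returning a new DataFrame instead of using `inplace=True` keeps the raw table available for comparison while you decide which columns to keep.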

    Business Question:

    • An analysis of their content categories that highlights the top 5 categories with the largest aggregate popularity
    import pandas as pd
    import numpy as np

    # load the three raw datasets
    reactions = pd.read_csv('Reactions.csv')
    reaction_types = pd.read_csv('ReactionTypes.csv')
    content = pd.read_csv('Content.csv')

    # DataFrame.info() prints its own summary and returns None,
    # so call it directly instead of wrapping it in print()
    reactions.info()
    reaction_types.info()
    content.info()

    # non-null counts per column
    print(content.count())
    print(reactions.count())
    print(reaction_types.count())
    print(reaction_types)
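    # count the missing values in each column; print() makes every
    # result visible, not just the last expression in the cell
    print(content.isna().sum())

    print(reaction_types.isna().sum())

    print(reactions.isna().sum())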
    # drop rows where 'User ID' or 'Type' is missing
    reactions.dropna(subset=['User ID', 'Type'], inplace=True)
    
    #checking if null values have been dropped
    reactions.isna().sum()
    # drop rows with a missing 'URL'
    content.dropna(subset=['URL'], inplace=True)

    #checking if null values have been dropped
    content.isna().sum()
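Once the three tables are cleaned, the business question comes down to joining them and ranking categories. The sketch below assumes that "aggregate popularity" means the sum of a per-reaction `Score` from the reaction types table, and that the tables join on `Type` and `Content ID`, with a `Category` column in the content table. None of these columns except `Type` appear in the code above, so treat them as assumptions to verify. The data is made up and the `demo_` names are hypothetical, chosen so the sketch does not overwrite the real `content`, `reactions` and `reaction_types` DataFrames.

```python
import pandas as pd

# Made-up stand-ins for the three cleaned tables. 'Content ID',
# 'Category' and 'Score' are assumed column names.
demo_content = pd.DataFrame({
    "Content ID": ["c1", "c2", "c3", "c4"],
    "Category": ["food", "travel", "food", "science"],
})
demo_types = pd.DataFrame({
    "Type": ["like", "love", "disgust"],
    "Score": [50, 65, 0],
})
demo_reactions = pd.DataFrame({
    "Content ID": ["c1", "c1", "c2", "c3", "c4", "c4"],
    "Type": ["like", "love", "disgust", "love", "like", "like"],
})


def top_categories(reactions, reaction_types, content, n=5):
    """Hypothetical helper: sum reaction scores per content category
    and return the n categories with the largest totals."""
    merged = (
        reactions
        .merge(reaction_types, on="Type", how="inner")   # attach a score to each reaction
        .merge(content, on="Content ID", how="inner")    # attach the post's category
    )
    return merged.groupby("Category")["Score"].sum().nlargest(n)


result = top_categories(demo_reactions, demo_types, demo_content)
print(result)
```

If the content table also has a column named `Type`, pandas suffixes the clashing columns as `Type_x` and `Type_y` after the second merge; renaming one of them beforehand keeps the result readable.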