CARPRICE PREDICTION

Clean the data by:

- removing rows that have missing values (DONE)
- changing the data type of some values within a column
- removing columns that are not relevant to this task

Think about how each column might be relevant to the business question you're investigating. If you can't think of why a column may be useful, it may not be worth including.

Your end result should be three cleaned data sets.
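The three cleaning steps can be sketched on a small in-memory table before applying them to the real files. This is only an illustration: the toy data and the `Content ID` and `Datetime` column names are assumptions, and which column counts as "not relevant" is a judgment call made here purely for the example.

```python
import pandas as pd

# Toy stand-in for one of the CSV files; values and most column names are assumed.
toy = pd.DataFrame({
    "Content ID": ["a1", "b2", "c3"],
    "User ID": ["u1", None, "u3"],
    "Type": ["like", "love", None],
    "Datetime": ["2021-01-05 10:00:00", "2021-02-10 12:30:00", "2021-03-15 08:45:00"],
})

# Step 1: remove rows that contain any missing value.
cleaned = toy.dropna()

# Step 2: change the data type of a column (text timestamps -> datetime64).
cleaned = cleaned.assign(Datetime=pd.to_datetime(cleaned["Datetime"]))

# Step 3: remove a column judged irrelevant to the business question
# (dropping "User ID" is an assumption made for this example).
cleaned = cleaned.drop(columns=["User ID"])

print(cleaned.dtypes)
```

Using `assign` and `drop` without `inplace=True` returns new objects at each step, which keeps the original table available for comparison.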

Business Question:

An analysis of their content categories that highlights the top 5 categories with the largest aggregate popularity
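One possible way to answer this question, once the three tables are clean, is to join them and sum a numeric score per category. The sketch below uses toy data; the `Content ID`, `Category`, and `Score` column names are assumptions about the files, and treating "aggregate popularity" as the sum of reaction scores is an interpretation rather than something the brief states.

```python
import pandas as pd

# Toy stand-ins for the three cleaned tables (all values are made up).
reactions_toy = pd.DataFrame({
    "Content ID": ["c1", "c1", "c2", "c3", "c3", "c4"],
    "Type": ["like", "love", "hate", "like", "like", "love"],
})
reaction_types_toy = pd.DataFrame({
    "Type": ["like", "love", "hate"],
    "Score": [50, 65, 5],  # assumed: one numeric score per reaction type
})
content_toy = pd.DataFrame({
    "Content ID": ["c1", "c2", "c3", "c4"],
    "Category": ["animals", "science", "animals", "food"],
})

# Attach a category and a score to every reaction.
merged = (
    reactions_toy
    .merge(content_toy, on="Content ID", how="inner")
    .merge(reaction_types_toy, on="Type", how="inner")
)

# Sum the scores per category and keep the five largest totals.
top_categories = merged.groupby("Category")["Score"].sum().nlargest(5)
print(top_categories)
```

With fewer than five categories, as in this toy data, `nlargest(5)` simply returns all of them in descending order.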

import pandas as pd
import numpy as np

reactions = pd.read_csv('Reactions.csv')
reaction_types = pd.read_csv('ReactionTypes.csv')
content = pd.read_csv('Content.csv')
# DataFrame.info() prints its summary and returns None, so call it directly
# instead of wrapping it in print().
content.info()
print(content.count())

reactions.info()
print(reactions.count())

reaction_types.info()
print(reaction_types.count())

print(reaction_types)

#count the missing values in each column
content.isna().sum()

reaction_types.isna().sum()

reactions.isna().sum()
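If the counts are run as a script rather than in separate notebook cells, only printed output is visible, so it can help to collect them into one summary. A minimal sketch on toy tables (the data and the `Content ID` column name are assumptions):

```python
import pandas as pd

# Toy stand-ins for two of the tables; values are made up.
frames = {
    "content": pd.DataFrame({
        "Content ID": ["c1", "c2"],
        "URL": ["https://example.com/1", None],
    }),
    "reactions": pd.DataFrame({
        "Content ID": ["c1", "c2"],
        "User ID": [None, "u2"],
        "Type": ["like", None],
    }),
}

# One Series of missing-value counts, indexed by (table, column).
summary = pd.concat(
    {name: df.isna().sum() for name, df in frames.items()},
    names=["table", "column"],
).rename("missing")

print(summary)
```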

# drop rows with a null value in the 'User ID' or 'Type' column
reactions.dropna(subset=['User ID', 'Type'], inplace=True)

#checking if null values have been dropped
reactions.isna().sum()

# drop rows with a null value in the 'URL' column
content.dropna(subset=['URL'], inplace=True)

#checking if null values have been dropped
content.isna().sum()
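Beyond inspecting the counts by eye, the check can be made explicit by recording how many rows were removed and asserting that none of the targeted values are still null. The sketch below does this on a toy table; the data and the `Content ID` column name are assumptions, and resetting the index is an optional extra step that is not in the original workflow.

```python
import pandas as pd

# Toy stand-in for the content table; values are made up.
toy_content = pd.DataFrame({
    "Content ID": ["c1", "c2", "c3"],
    "URL": ["https://example.com/1", None, "https://example.com/3"],
})

rows_before = len(toy_content)

# Drop rows with a null URL, then renumber the remaining rows from 0.
toy_content = toy_content.dropna(subset=["URL"]).reset_index(drop=True)

rows_removed = rows_before - len(toy_content)

# Fail loudly if any null URL survived the cleaning step.
assert toy_content["URL"].notna().all()

print(f"removed {rows_removed} row(s), {len(toy_content)} remaining")
```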
