Clean the data by:
- removing rows with missing values (done in the code below),
- changing the data type of some values within a column, and
- removing columns that are not relevant to this task.
Think about how each column might relate to the business question you're investigating; if you can't think of why a column would be useful, it's probably not worth keeping. The data-type and column-removal steps are sketched after the null-value checks at the end of this section.
Your end result should be three cleaned data sets.
Business Question:
- An analysis of their content categories that highlights the top 5 categories with the largest aggregate popularity
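To keep the end goal in view, the sketch below shows one way the top 5 categories could be computed once the data is clean. It assumes (these names are not confirmed by the brief) that Reactions.csv links to Content.csv through a 'Content ID' column, that ReactionTypes.csv maps each reaction 'Type' to a numeric 'Score', and that Content.csv has a 'Category' column; adjust the names to whatever the inspection below reveals.

import pandas as pd

# Minimal sketch, assuming the column names 'Content ID', 'Type', 'Score', and 'Category'
reactions = pd.read_csv('Reactions.csv')
reaction_types = pd.read_csv('ReactionTypes.csv')
content = pd.read_csv('Content.csv')

# Attach a numeric popularity score to every reaction
scored = reactions.merge(reaction_types[['Type', 'Score']], on='Type', how='inner')

# Attach the category of the content each reaction belongs to
scored = scored.merge(content[['Content ID', 'Category']], on='Content ID', how='inner')

# Sum scores per category and keep the five largest totals
top5 = scored.groupby('Category')['Score'].sum().nlargest(5)
print(top5)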
import pandas as pd
import numpy as np

# Load the three raw data sets
reactions = pd.read_csv('Reactions.csv')
reaction_types = pd.read_csv('ReactionTypes.csv')
content = pd.read_csv('Content.csv')
# Inspect column data types, non-null counts, and row counts for each data set
content.info()
print(content.count())
reactions.info()
print(reactions.count())
reaction_types.info()
print(reaction_types.count())
# The reaction-type lookup table is small enough to display in full
print(reaction_types)
# Count the missing values in each column of each data set
print(content.isna().sum())
print(reaction_types.isna().sum())
print(reactions.isna().sum())
# Drop rows with null values in the 'User ID' and 'Type' columns
reactions.dropna(subset=['User ID', 'Type'], inplace=True)
# Confirm the null values have been removed
print(reactions.isna().sum())
# Drop rows with a null value in the 'URL' column
content.dropna(subset=['URL'], inplace=True)
# Confirm the null values have been removed
print(content.isna().sum())
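The remaining cleaning steps from the brief are the data-type conversion, dropping irrelevant columns, and writing out the three cleaned data sets. The sketch below is one way to do this; the 'Datetime' and 'Unnamed: 0' column names, the choice of columns to drop, and the output file names are assumptions, so adjust them to what info() showed above and to your own view of which columns matter for the top-5-categories question.

# Convert timestamp strings to proper datetime values (assumes a 'Datetime' column in Reactions.csv)
reactions['Datetime'] = pd.to_datetime(reactions['Datetime'])

# Drop columns that do not help answer the top-5-categories question.
# Which columns to drop is a judgement call; these names are assumptions.
reactions = reactions.drop(columns=['Unnamed: 0', 'User ID'], errors='ignore')
content = content.drop(columns=['Unnamed: 0', 'User ID', 'URL'], errors='ignore')
reaction_types = reaction_types.drop(columns=['Unnamed: 0'], errors='ignore')

# Save the three cleaned data sets (hypothetical file names)
reactions.to_csv('Reactions_clean.csv', index=False)
content.to_csv('Content_clean.csv', index=False)
reaction_types.to_csv('ReactionTypes_clean.csv', index=False)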