Competition - Cats and dogs
Cats vs Dogs: The Great Pet Debate 🐱🐶
📖 Background
You and your friend have debated for years whether cats or dogs make more popular pets. You finally decide to settle the score by analyzing pet data across different regions of the UK. Your friend found data on estimated pet populations, average pets per household, and geographic factors across UK postal code areas. It's time to dig into the numbers and settle the cat vs. dog debate!
💾 The data
There are three data files, described below.
The `population_per_postal_code.csv` data contains these columns:

| Column | Description |
|---|---|
| postal_code | An identifier for each postal code area |
| estimated_cat_population | The estimated cat population for the postal code area |
| estimated_dog_population | The estimated dog population for the postal code area |
The `avg_per_household.csv` data contains these columns:

| Column | Description |
|---|---|
| postal_code | An identifier for each postal code area |
| cats_per_household | The average number of cats per household in the postal code area |
| dog_per_household | The average number of dogs per household in the postal code area |
The `postal_code_areas.csv` data contains these columns:

| Column | Description |
|---|---|
| postal_code | An identifier for each postal code area |
| town | The town/towns which are contained in the postal code area |
| county | The UK county that the postal code area is located in |
| population | The population of people in each postal code area |
| num_households | The number of households in each postal code area |
| uk_region | The region in the UK which the postal code is located in |
*Acknowledgments: Data has been assembled and modified from two different sources: Animal and Plant Health Agency, Postcodes.
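Because all three files share the `postal_code` column, they can be combined into one row per postal code area. The following is a minimal sketch on made-up rows (the values and the `combined` name are illustrative assumptions, not taken from the data files) showing how pandas `merge` joins the tables on that key.

```python
import pandas as pd

# Toy rows standing in for the three CSV files (values invented for illustration).
population = pd.DataFrame({
    "postal_code": ["AB1", "CD2"],
    "estimated_cat_population": [1200.0, 800.0],
    "estimated_dog_population": [1000.0, 950.0],
})
avg_per_household = pd.DataFrame({
    "postal_code": ["AB1", "CD2"],
    "cats_per_household": [0.30, 0.20],
    "dog_per_household": [0.25, 0.24],
})
areas = pd.DataFrame({
    "postal_code": ["AB1", "CD2"],
    "town": ["Town A", "Town B"],
    "uk_region": ["Region X", "Region Y"],
})

# Join on the shared key; validate= raises an error if a key appears more than once.
combined = (
    population
    .merge(avg_per_household, on="postal_code", how="inner", validate="one_to_one")
    .merge(areas, on="postal_code", how="inner", validate="one_to_one")
)
print(combined)
```

Note that the notebook renames `postal_code` to `postcode` for the population and postal-area frames only, so the household-averages frame would need the same rename before joining on `postcode`.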
Import Necessary Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.impute import SimpleImputer
population_raw_data = pd.read_csv('data/population_per_postal_code.csv')
population_raw_data.rename(columns={'postal_code': 'postcode'}, inplace=True)
population_raw_data.head(50)

avg_raw_data = pd.read_csv('data/avg_per_household.csv')
avg_raw_data

postcodes_raw_data = pd.read_csv('data/postal_codes_areas.csv')
postcodes_raw_data.rename(columns={'postal_code': 'postcode'}, inplace=True)
postcodes_raw_data

Data Cleaning
population_raw_data.isnull().sum()

avg_raw_data.isnull().sum()

postcodes_raw_data.isnull().sum()

population_raw_data.info()
print("\n")
avg_raw_data.info()
print("\n")
postcodes_raw_data.info()

from sklearn.impute import SimpleImputer
# Create an instance of SimpleImputer
imputer = SimpleImputer(strategy='mean')
# Fit the imputer on the data and transform the data
postcodes_raw_data["population"] = imputer.fit_transform(postcodes_raw_data[["population"]])
postcodes_raw_data["num_households"] = imputer.fit_transform(postcodes_raw_data[["num_households"]])
# Drop any rows that still contain missing values
postcodes_raw_data.dropna(inplace=True)
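As a small illustration of what the mean imputation above does, the sketch below runs `SimpleImputer` on a made-up frame (the `toy` frame and its values are assumptions for demonstration only). It fits both columns in a single call, which gives the same result as fitting them one at a time because the mean is computed per column.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy data with one missing value per column (invented for illustration).
toy = pd.DataFrame({
    "population": [100.0, np.nan, 300.0],
    "num_households": [40.0, 60.0, np.nan],
})

cols = ["population", "num_households"]
imputer = SimpleImputer(strategy="mean")

# fit_transform returns a 2D NumPy array with one column per input column.
toy[cols] = imputer.fit_transform(toy[cols])

print(toy)
```

Mean imputation keeps every row but pulls the filled values toward the average, so it can understate the variation between postal code areas. `strategy="median"` is another `SimpleImputer` option that may suit skewed columns better.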
# Convert the columns to the appropriate dtype
# Remove commas from the column and convert it to float64
population_raw_data["estimated_cat_population"] = population_raw_data["estimated_cat_population"].str.replace(",", "").astype("float64")
population_raw_data["estimated_dog_population"] = population_raw_data["estimated_dog_population"].str.replace(",", "").astype("float64")
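The `.str` accessor used above only works on string columns, so rerunning the cell after the columns have become `float64` raises an error. One possible alternative, sketched here on invented values, is a small helper (the name `to_number` is hypothetical) that casts to string first and lets `pd.to_numeric` turn anything unparseable into `NaN`.

```python
import pandas as pd

def to_number(series: pd.Series) -> pd.Series:
    """Strip thousands separators and convert to float; unparseable entries become NaN."""
    cleaned = series.astype(str).str.replace(",", "", regex=False)
    return pd.to_numeric(cleaned, errors="coerce")

# Invented sample values, including one entry that is not a number.
raw = pd.Series(["1,234", "567", "n/a"])
converted = to_number(raw)
print(converted)

# Applying the helper a second time is safe because of the astype(str) step.
print(to_number(converted))
```

Alternatively, `pd.read_csv` accepts a `thousands=","` argument that parses such values when the file is loaded.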