Cats vs Dogs: The Great Pet Debate 🐱🐶

📖 Background

You and your friend have debated for years whether cats or dogs make more popular pets. You finally decide to settle the score by analyzing pet data across different regions of the UK. Your friend found data on estimated pet populations, average pets per household, and geographic factors across UK postal code areas. It's time to dig into the numbers and settle the cat vs. dog debate!

💾 The data

There are three data files, which contain the following data.

The population_per_postal_code.csv data contains these columns:

| Column | Description |
| --- | --- |
| postal_code | An identifier for each postal code area |
| estimated_cat_population | The estimated cat population for the postal code area |
| estimated_dog_population | The estimated dog population for the postal code area |

The avg_per_household.csv data contains these columns:

| Column | Description |
| --- | --- |
| postal_code | An identifier for each postal code area |
| cats_per_household | The average number of cats per household in the postal code area |
| dog_per_household | The average number of dogs per household in the postal code area |

The postal_code_areas.csv data contains these columns:

| Column | Description |
| --- | --- |
| postal_code | An identifier for each postal code area |
| town | The town or towns contained in the postal code area |
| county | The UK county in which the postal code area is located |
| population | The number of people in each postal code area |
| num_households | The number of households in each postal code area |
| uk_region | The UK region in which the postal code area is located |

*Acknowledgments: Data has been assembled and modified from two different sources: Animal and Plant Health Agency, Postcodes.

💪 Challenge

Leverage the pet data to analyze and compare cat vs. dog rates across different regions of the UK. Your goal is to identify factors associated with higher cat or dog popularity.

Some examples:

  • Examine whether pet preferences correlate with estimated pet populations or geographic regions. Create visualizations to present your findings.
  • Develop an accessible summary of study findings on factors linked to cat and dog ownership rates for non-technical audiences.
  • See if you can identify any regional trends; which areas prefer cats vs. dogs?
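One way to approach the regional question is to attach each postal code area's region to its population estimates and aggregate by `uk_region`. The sketch below runs on a few invented rows rather than the real files. The column names follow the data description, while the values and the `dog_to_cat_ratio` column are illustrative assumptions.

```python
import pandas as pd

# Toy rows standing in for the population file (all values are invented).
population = pd.DataFrame({
    "postal_code": ["AB1", "AB2", "CD1", "CD2"],
    "estimated_cat_population": [100, 300, 500, 700],
    "estimated_dog_population": [200, 400, 450, 650],
})

# Toy rows standing in for the postal code areas file (all values are invented).
areas = pd.DataFrame({
    "postal_code": ["AB1", "AB2", "CD1", "CD2"],
    "uk_region": ["Scotland", "Scotland", "London", "London"],
})

# Attach the region to each postal code area, then total the pets per region.
merged = population.merge(areas, on="postal_code", how="inner")
by_region = merged.groupby("uk_region")[
    ["estimated_cat_population", "estimated_dog_population"]
].sum()

# A ratio above 1 means more dogs than cats in that region.
by_region["dog_to_cat_ratio"] = (
    by_region["estimated_dog_population"] / by_region["estimated_cat_population"]
)
print(by_region.sort_values("dog_to_cat_ratio", ascending=False))
```

Summing populations before taking the ratio weights each postal code area by its size. Averaging per-area ratios instead would give small areas the same influence as large ones, so the choice depends on the question being asked.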

✅ Checklist before publishing into the competition

  • Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
  • Remove redundant cells like the judging criteria, so the workbook is focused on your story.
  • Make sure the workbook reads well and explains how you found your insights.
  • Try to include an executive summary of your recommendations at the beginning.
  • Check that all the cells run without error.

Summary

  • No region in the UK prefers cats over dogs.
  • Scotland, Wales, and South West regions exhibit a strong preference for owning dogs over cats, while London has a slightly higher preference for dogs.
  • East Midlands has the highest average estimated cat population, while Scotland has the lowest.
  • Wales and South West regions have the highest average number of cats and dogs per household, with roughly one cat or one dog for every two households. In contrast, London has the lowest average, with one cat for every six households and one dog for every five households.
  • Regions with higher population densities tend to have lower numbers of pets per capita and per household compared to regions with lower population densities.
  • On average, there is at least one cat or one dog in every three households in any region of the UK.
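The "one pet in every N households" phrasing is the reciprocal of a per-household average. A minimal sketch of that conversion follows, using invented averages rather than values from the dataset. The helper name `households_per_pet` is hypothetical.

```python
def households_per_pet(pets_per_household: float) -> float:
    """Return how many households there are, on average, for each pet."""
    if pets_per_household <= 0:
        raise ValueError("pets_per_household must be positive")
    return 1.0 / pets_per_household


# Invented example averages, not taken from the dataset.
for avg in (0.5, 0.2):
    n = households_per_pet(avg)
    print(f"{avg} pets per household: one pet for every {n:.0f} households")
```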
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the estimated cat and dog populations per postal code area.
dtype_pop = {'postal_code': 'category'}
population_raw_data = pd.read_csv('data/population_per_postal_code.csv', dtype=dtype_pop)
print(population_raw_data)

# Load the average pets per household (key column referenced as 'postcode').
raw_dtype = {'postcode': 'category'}
avg_raw_data = pd.read_csv('data/avg_per_household.csv', dtype=raw_dtype)
print(avg_raw_data)

# Load the postal code area lookup (town, county, population, households, region).
post_dtype = {'postal_code': 'category', 'county': 'category', 'uk_region': 'category'}
postcodes_raw_data = pd.read_csv('data/postal_codes_areas.csv', dtype=post_dtype)
print(postcodes_raw_data)

# Compare row counts across the three tables.
print(len(postcodes_raw_data), len(avg_raw_data), len(population_raw_data))

# Compare the number of distinct postal codes in each table.
print(postcodes_raw_data['postal_code'].nunique(), population_raw_data['postal_code'].nunique(), avg_raw_data['postcode'].nunique())
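Matching row and unique-value counts show that the tables are the same size, not that they share the same keys. A possible follow-up, sketched here on invented rows, is to compare the key sets directly and then merge with pandas' `validate` option so that duplicated keys raise an error. The household table is assumed to name its key `postcode`, as in the loading code, and all values are made up.

```python
import pandas as pd

# Invented rows; only the key column names mirror the loading code.
population = pd.DataFrame({
    "postal_code": ["AB1", "AB2", "CD1"],
    "estimated_cat_population": [100, 300, 500],
})
avg = pd.DataFrame({
    "postcode": ["AB1", "AB2", "CD1"],
    "cats_per_household": [0.4, 0.5, 0.2],
})

# Keys present in one table but not the other (empty sets mean a clean match).
pop_keys = set(population["postal_code"])
avg_keys = set(avg["postcode"])
only_in_population = pop_keys - avg_keys
only_in_avg = avg_keys - pop_keys
print(only_in_population, only_in_avg)

# Align the key name, then merge. validate="one_to_one" raises a MergeError
# if either side contains a duplicated postal code.
combined = population.merge(
    avg.rename(columns={"postcode": "postal_code"}),
    on="postal_code",
    how="inner",
    validate="one_to_one",
)
print(combined)
```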