Cat vs. dog popularity in the U.K.
  • Cats vs Dogs: The Great Pet Debate 🐱🐶

    📖 Background

    You and your friend have debated for years whether cats or dogs make more popular pets. You finally decide to settle the score by analyzing pet data across different regions of the UK. Your friend found data on estimated pet populations, average pets per household, and geographic factors across UK postal code areas. It's time to dig into the numbers and settle the cat vs. dog debate!

    💾 The data

    There are three data files, described below.

    The population_per_postal_code.csv data contains these columns:

    Column                      Description
    postal_code                 An identifier for each postal code area
    estimated_cat_population    The estimated cat population for the postal code area
    estimated_dog_population    The estimated dog population for the postal code area

    The avg_per_household.csv data contains these columns:

    Column                      Description
    postal_code                 An identifier for each postal code area
    cats_per_household          The average number of cats per household in the postal code area
    dogs_per_household          The average number of dogs per household in the postal code area

    The postal_code_areas.csv data contains these columns:

    Column                      Description
    postal_code                 An identifier for each postal code area
    town                        The town or towns contained in the postal code area
    county                      The UK county in which the postal code area is located
    population                  The number of people in each postal code area
    num_households              The number of households in each postal code area
    uk_region                   The UK region in which the postal code area is located

    *Acknowledgments: Data has been assembled and modified from two different sources: Animal and Plant Health Agency, Postcodes.

    💪 Challenge

    Leverage the pet data to analyze and compare cat vs. dog rates across different regions of the UK. Your goal is to identify factors associated with higher cat or dog popularity.

    Some examples:

    • Examine whether pet preferences correlate with estimated pet populations or geographic regions. Create visualizations to present your findings.
    • Develop an accessible summary of study findings on factors linked to cat and dog ownership rates for non-technical audiences.
    • See if you can identify any regional trends; which areas prefer cats vs. dogs?
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    import statsmodels.api as sm
    
    sns.set_style('whitegrid')
    
    pd.set_option('display.float_format', '{:,.2f}'.format)
    
    population_raw_data = pd.read_csv('data/population_per_postal_code.csv')
    avg_raw_data = pd.read_csv('data/avg_per_household.csv')
    postcodes_raw_data = pd.read_csv('data/postal_codes_areas.csv')
    # Data preprocessing: the population columns are read as strings containing
    # comma thousands separators, so strip the commas and cast to float.
    for col in ['estimated_dog_population', 'estimated_cat_population']:
        population_raw_data[col] = population_raw_data[col].str.replace(',', '', regex=False).astype(float)
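The comma stripping above can also be done at load time. This is a minimal sketch, assuming the population values are stored as quoted strings with comma thousands separators; the two sample rows are invented for illustration and are not taken from the dataset.

```python
import io
import pandas as pd

# Hypothetical two-row sample that mimics the file layout; the values are invented.
sample_csv = io.StringIO(
    'postal_code,estimated_cat_population,estimated_dog_population\n'
    'AB1,"1,234.50","2,000"\n'
    'AB2,"987","1,050.25"\n'
)

# thousands=',' asks the parser to drop the separator while reading,
# so the columns arrive as numbers rather than strings.
population = pd.read_csv(sample_csv, thousands=',')
print(population.dtypes)
```

If the real file mixes in other non-numeric markers, the explicit string cleanup may still be the safer route.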

    Q1. Examine whether pet preferences correlate with estimated pet populations or geographic regions. Create visualizations to present your findings.

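    # The household averages are joined on 'postcode', so merge on differing key names and drop the duplicate key
    merge = pd.merge(population_raw_data, avg_raw_data, left_on='postal_code', right_on='postcode').drop(columns='postcode')
    # Correlation between estimated population and per-household average, for each species
    merge[['estimated_cat_population', 'cats_per_household']].corr()
    merge[['estimated_dog_population', 'dogs_per_household']].corr()
    # Attach town, county, and region details to each postal code area
    df = pd.merge(merge, postcodes_raw_data, on='postal_code', how='left')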
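Because the merges above join on postal codes taken from different files, it may be worth checking for codes that appear in only one of them. A minimal sketch, where `unmatched_codes` is a hypothetical helper and the toy frames are invented for illustration:

```python
import pandas as pd

def unmatched_codes(left, right, left_key, right_key):
    """Return key values that appear in only one of the two tables."""
    merged = pd.merge(
        left[[left_key]].drop_duplicates(),
        right[[right_key]].drop_duplicates(),
        left_on=left_key,
        right_on=right_key,
        how='outer',
        indicator=True,  # adds a '_merge' column: left_only, right_only, or both
    )
    return merged.loc[merged['_merge'] != 'both']

# Invented toy frames, for illustration only.
pop = pd.DataFrame({'postal_code': ['AB1', 'AB2', 'AB3']})
avg = pd.DataFrame({'postcode': ['AB1', 'AB2', 'ZZ9']})

missing = unmatched_codes(pop, avg, 'postal_code', 'postcode')
print(missing)
```

An empty result would suggest the inner merge drops no postal code areas.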
    plt.figure(figsize = (16,6))
    
    plt.subplot(1, 2, 1)
    sns.regplot(data = df, x = 'estimated_cat_population', y = 'cats_per_household', line_kws={'color':'red'})
    plt.title("Estimated cat population vs. cats per household")
    
    plt.subplot(1, 2, 2)
    sns.regplot(data = df, x = 'estimated_dog_population', y = 'dogs_per_household', line_kws={'color':'red'})
    plt.title("Estimated dog population vs. dogs per household")
    
    plt.tight_layout()
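The regression plots show linear trends only. As a complement, the sketch below reports a rank-based (Spearman) correlation next to the Pearson one, computing Spearman as the Pearson correlation of ranks. `correlation_summary` is a hypothetical helper and the toy values are invented, not taken from the dataset.

```python
import pandas as pd

def correlation_summary(frame, pairs):
    """Pearson and Spearman correlation for each (x, y) column pair."""
    rows = []
    for x, y in pairs:
        rows.append({
            'x': x,
            'y': y,
            'pearson': frame[x].corr(frame[y]),
            # Spearman is the Pearson correlation of the ranked values.
            'spearman': frame[x].rank().corr(frame[y].rank()),
        })
    return pd.DataFrame(rows)

# Invented toy values with a monotonic but non-linear relationship.
toy = pd.DataFrame({
    'estimated_cat_population': [100.0, 200.0, 300.0, 400.0],
    'cats_per_household': [0.1, 0.2, 0.4, 0.8],
})

summary = correlation_summary(toy, [('estimated_cat_population', 'cats_per_household')])
print(summary)
```

A large gap between the two coefficients would hint that the relationship is monotonic but not linear.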

    Q2. Identifying Regional Trends

    # Label each postal code area by whichever pet has the higher per-household average (ties count as dogs)
    df['preference'] = (df['cats_per_household'] > df['dogs_per_household']).map({True: 'cats', False: 'dogs'})
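One possible next step is to summarize the preference label by region. The sketch below assumes a frame with `uk_region` and `preference` columns like the ones built above; `preference_share_by_region` is a hypothetical helper, and the region names and rows are placeholders invented for illustration.

```python
import pandas as pd

def preference_share_by_region(frame):
    """Share of postal code areas in each region that favour each pet."""
    return pd.crosstab(frame['uk_region'], frame['preference'], normalize='index')

# Invented placeholder rows; not real regions or results.
toy = pd.DataFrame({
    'uk_region': ['North', 'North', 'South', 'South'],
    'preference': ['cats', 'dogs', 'dogs', 'dogs'],
})

shares = preference_share_by_region(toy)
print(shares)
```

Each row sums to 1, so the table reads directly as the proportion of areas per region leaning toward cats or dogs.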