Competition - How Does Income Influence Food Choices? 🥗💰

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
sns.set(style="whitegrid")
pd.set_option('display.max_columns', None)
os.getcwd()
income_expense_df = pd.read_csv('data/Income-Expenditure.csv')
dietary_habits_df = pd.read_csv('data/Dietary Habits Survey Data.csv')
food_prices_df = pd.read_csv('data/Food Prices.csv')
food_prices_dim_df = pd.read_csv('data/Food_Prices_Dimension_Table.csv')
dietary_habits_df = dietary_habits_df.drop(columns=[' '])

Food Choices and Affordability Analysis

Public Health Research Project

This notebook explores how income, dietary preferences, and regional patterns influence food affordability.
The goal is to uncover insights that can inform public health policies aimed at making healthy eating more accessible.

1. Income & Food Affordability – How does household income relate to food affordability?

We begin by examining the relationship between household income levels and their ability to afford a variety of food categories.
This helps us understand how financial constraints impact food choices.

food_prices_df['Attribute'].value_counts()

Food Prices Dataset – Focus on Weighted Mean Unit Value

The food_prices_df dataset contains various attributes related to food sales, quantities, and pricing across regions and time periods.
For this analysis, we focus specifically on the attribute "Weighted mean unit value per 100 grams".

This attribute represents the average price of food items standardized per 100 grams, adjusted for regional sales volume weights.
It provides a fair comparison of food costs across different locations and categories, minimizing the impact of uneven sales distribution.

By filtering the dataset to only include this attribute, we ensure that our analysis focuses purely on comparable food price metrics across time and regions.

food_prices_df[food_prices_df['Attribute'] == 'Weighted mean unit value per 100 grams']

Food Price Summary per Food Category

After filtering the dataset to focus on the weighted mean unit price,
we calculate key statistical metrics for each food category (EFPG_code).

Specifically, we compute:

Minimum Price across regions
Maximum Price across regions
Mean (Average) Price across regions
Standard Deviation of Price across regions

Filtering Stable Food Items

We remove food categories with a price standard deviation above 0.1 to focus on items with more consistent pricing across regions.

food_prices_filtered = food_prices_df[food_prices_df['Attribute'] == 'Weighted mean unit value per 100 grams']
avg_food_prices = food_prices_filtered.groupby('EFPG_code')['Value'].mean().reset_index()
avg_food_prices.rename(columns={'Value': 'Average_Price_per_100g'}, inplace=True)
food_price_summary = food_prices_filtered.groupby('EFPG_code')['Value'].agg(
    Min_Price='min',
    Max_Price='max',
    Mean_Price='mean',
    Std_Price='std'
).reset_index()

food_price_summary = food_price_summary[food_price_summary['Std_Price'] < 0.1].reset_index(drop=True)
print("✅ Created summary DataFrame for each EFPG_code:")
display(food_price_summary)

Merging Food Names and Tier Groups

We merge food_price_summary with food names and Tier 1 groups for better readability and grouping.

Estimating Monthly Food Consumption

We define typical monthly consumption (in grams) for each major food group based on expected human intake.

Calculating Monthly Food Spending

We map the estimated consumption to each food item,
then calculate the expected monthly spending by multiplying average price by estimated consumption.

Missing values are filled with zero to avoid calculation errors.

food_price_summary = pd.merge(food_price_summary, food_prices_dim_df[['EFPG_code', 'EFPG_name', 'Tier 1 group']], 
                              on='EFPG_code', how='left')

# Define estimated monthly consumption in grams for your actual Tier 1 groups

consumption_mapping = {
    'Grains': 3000,
    'Vegetables': 3000,
    'Fruit': 2000,
    'Dairy': 2000,
    'Meat and Protein Foods': 1000,
    'Prepared meals, sides, and salads': 1000,
    'Other foods': 1000  # fallback: for processed foods/snacks
}
# Map Estimated Consumption per food item
food_price_summary['Estimated_Consumption_grams'] = food_price_summary['Tier 1 group'].map(consumption_mapping)

# If any values are missing (e.g., typos), you can fill them with 0 or a default
food_price_summary['Estimated_Consumption_grams'] = food_price_summary['Estimated_Consumption_grams'].fillna(0)

# Calculate Monthly Food Spending
food_price_summary['Monthly_Food_Spending'] = food_price_summary['Mean_Price'] * (food_price_summary['Estimated_Consumption_grams'] / 100)


food_price_summary['Total_Monthly_Spending'] = food_price_summary['Monthly_Food_Spending'].sum()

print("✅ Correctly updated Monthly Food Spending based on realistic consumption.")
display(food_price_summary[['Tier 1 group', 'Estimated_Consumption_grams', 'Monthly_Food_Spending']])

income_expense_df['Mthly_HH_Income'].describe()

Categorizing Household Income Levels

We use the distribution of Mthly_HH_Income to categorize households into three groups:

Low Income: below 10,000
Middle Income: 10,000 to 40,000
High Income: above 40,000

This grouping helps us analyze food affordability by income class.

Calculating Net Disposable Income

We calculate the net disposable income for each household by subtracting their total expenses and rent/loan payments from their monthly income.
This metric represents the money left for discretionary spending, including food choices.

# Corrected Step: Create Income Groups based on Mthly_HH_Income

income_bins = [0, 10000, 40000, np.inf]
income_labels = ['Low Income', 'Middle Income', 'High Income']

income_expense_df['Income_Group'] = pd.cut(income_expense_df['Mthly_HH_Income'],
                                           bins=income_bins,
                                           labels=income_labels)

income_expense_df['Net_Disposable_Income'] = income_expense_df['Mthly_HH_Income'] - (income_expense_df['Mthly_HH_Expense']+income_expense_df['Emi_or_Rent_Amt'])

print("✅ Net Disposable Income calculated.")
display(income_expense_df[['Mthly_HH_Income', 'Mthly_HH_Expense','Emi_or_Rent_Amt', 'Net_Disposable_Income']])

Cross-Joining Income and Food Data

We cross-join the income dataset with the food price summary to simulate each household's ability to afford different food categories.

Calculating Affordability Score

The affordability score is computed as: It shows how much of a household's available income would be needed to cover each food item, helping to assess affordability across income groups.

# Create keys for cross join
income_expense_df['key'] = 1
food_price_summary['key'] = 1

# Cross join
expanded_df = pd.merge(income_expense_df, food_price_summary, on='key').drop('key', axis=1)

# Step 4: Calculate Affordability Score
# Replace 0 with 0.1 in 'Net_Disposable_Income' to avoid division by zero
expanded_df['Net_Disposable_Income'] = expanded_df['Net_Disposable_Income'].replace(0, 0.1)

expanded_df['Affordability_Score'] = expanded_df['Monthly_Food_Spending'] / expanded_df['Net_Disposable_Income']

print("✅ Affordability Score calculated for each food and each household.")
display(expanded_df[['Income_Group', 'EFPG_name', 'Monthly_Food_Spending', 'Net_Disposable_Income', 'Affordability_Score']])

import seaborn as sns
import matplotlib.pyplot as plt

# Group again if needed
grouped = expanded_df.groupby(['Income_Group', 'Tier 1 group'])['Affordability_Score'].mean().reset_index()

# Create a FacetGrid
g = sns.FacetGrid(grouped, col="Income_Group", sharex=False, height=6, aspect=1.2)
g.map_dataframe(sns.barplot, x="Affordability_Score", y="Tier 1 group", palette="muted")
g.set_titles(col_template="{col_name}")
g.set_axis_labels("Affordability Score", "Food Category")
g.fig.subplots_adjust(top=0.85)
g.fig.suptitle('Average Food Affordability Score by Income Group', fontsize=16)
plt.show()

‌
‌
‌