
How does income influence food choices? 🥗💰

📖 Background

Does eating healthy depend on what’s in your wallet? While some believe nutritious food is a luxury reserved for those who can afford it, others argue that education, accessibility, and policy interventions play an even bigger role.

As part of a public health research team, your mission is to uncover the real factors driving food choices. Are healthier foods truly more expensive, or do regional access, income distribution, and availability have a greater impact?

Your insights could help shape smarter food policies, making healthy eating more affordable and accessible for all. Are you ready to dig into the data and make a real-world impact?

💾 The data

Your team gathered three datasets to analyze the relationship between income levels and food choices:

Income-Expenditure

  • Mthly_HH_Income – Monthly household income
  • Mthly_HH_Expense – Total monthly household expenses
  • No_of_Fly_Members – Number of family members
  • Emi_or_Rent_Amt – Rent or loan payments
  • Annual_HH_Income – Annual household income
  • Highest_Qualified_Member – Education level of the most qualified household member
  • No_of_Earning_Members – Number of income earners in the household

Dietary Habits Survey Data

  • Age – Age group of the respondent
  • Gender – Male/Female
  • Dietary Preference – Vegetarian, Non-Vegetarian, Vegan, etc.
  • Meal Frequency – How often certain food types are consumed
  • Food Restrictions – Allergies and dietary restrictions
  • Beverage Intake – Hydration and drink preferences

Food Prices

  • Year – Year of data collection
  • Month – Month of data collection
  • Metroregion_code – Geographic area code
  • EFPG_code – Food category (e.g., whole grains, processed foods)
  • Attribute – Type of data recorded (e.g., price, purchase amount)
  • Value – Numeric value of the recorded attribute
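
Because the Food Prices table is stored in long format (one row per Attribute), it can help to pivot it so that each attribute becomes its own column. The following is a minimal sketch, assuming pandas and a tiny hand-made sample; the attribute labels 'price' and 'purchase_amount', the codes, and all values are illustrative placeholders, not taken from the real file:

```python
import pandas as pd

# Hypothetical rows in the documented long layout (all values are made up).
food_prices = pd.DataFrame({
    "Year": [2020, 2020, 2020, 2020],
    "Month": [1, 1, 1, 1],
    "Metroregion_code": ["001", "001", "002", "002"],
    "EFPG_code": ["veg", "veg", "veg", "veg"],
    "Attribute": ["price", "purchase_amount", "price", "purchase_amount"],
    "Value": [3.5, 120.0, 2.8, 90.0],
})

# One row per (period, region, food group); one column per attribute.
wide = food_prices.pivot_table(
    index=["Year", "Month", "Metroregion_code", "EFPG_code"],
    columns="Attribute",
    values="Value",
    aggfunc="mean",
).reset_index()
wide.columns.name = None  # drop the leftover "Attribute" axis label

print(wide)
```

In the wide layout, price and purchase amount for the same region and food group sit side by side, which makes later ratios and joins easier to write.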

💪 Challenge

Your public health research team has been asked to advise policymakers on the key factors influencing food choices across different income groups.

Your tasks are to analyze:

  1. Income & Food Affordability – How does household income relate to the affordability of different food categories?
    • Use the Income-Expenditure Dataset to analyze household income and overall expenses.
    • The Food Prices Dataset reveals how food costs vary by region, helping assess affordability.
  2. Healthy vs. Unhealthy Purchases – Do higher-income households buy healthier foods?
    • The Dietary Habits Survey captures individual consumption patterns.
    • The Food Prices Dataset helps assess whether healthier foods are more expensive.
  3. Regional Patterns – Are there geographic trends in food affordability?
    • The Food Prices Dataset includes location-based pricing data.
  4. Data Visualization – Create at least one chart to highlight key insights.
  5. [Optional] Nutritional Value vs. Cost – Are healthier foods more expensive than processed options?
    • Use the Food Prices Dataset and its dimension table to categorize food types and analyze price differences between healthy and unhealthy options.

At the end of your analysis, summarize your findings—what trends stand out, and what factors should policymakers target for intervention?

🧑‍⚖️ Judging criteria: Your Vote, Your Winners!

This is a community-driven competition: your votes decide the winners! Once the competition ends, you'll get to explore submissions, celebrate the best insights, and vote for your favorites. The top 5 most upvoted entries will win exclusive DataCamp merchandise, so bring your A-game, impress your peers, and claim your spot at the top!

✅ Checklist before publishing

  • Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
  • Remove redundant cells like the introduction to data science notebooks, so the workbook is focused on your story.
  • Check that all the cells run without error.

⏳ The table is set—let’s serve up some data-driven insights! 🥗📊🚀

"Food Choices Across Income: Data-Driven Insights"

Notebook Name: notebook.ipynb

Analysis: Key Factors Influencing Food Choices

1. Income & Food Affordability

Dataset: Income-Expenditure

Method: A SQL query (SELECT Annual_HH_Income, AVG(100.0 * Mthly_HH_Expense / (Annual_HH_Income / 12.0)) AS food_exp_pct FROM income_expenditure GROUP BY Annual_HH_Income) calculated monthly household expenses as a percentage of monthly income, used here as a proxy for food expense. The decimal literals avoid integer division in SQL engines that truncate. The result was joined with Food Prices (JOIN food_prices ON Metroregion_code) for cost context.

Findings: The expense share falls from about 22% of income for households at $30K to about 7% for households at $160K. Staples (grains, $3/kg) strain lower budgets, per Food Prices.

Insight: Affordability pushes low-income groups toward cheaper, less diverse options.
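
The expense-share calculation can also be sketched in pandas. This is a minimal illustration, assuming the simulated figures used in this notebook's visualization code rather than the real Income-Expenditure file; the column name exp_pct is introduced here for illustration only:

```python
import pandas as pd

# Simulated figures reused from the visualization section of this notebook.
income_exp = pd.DataFrame({
    "Annual_HH_Income": [30000, 55000, 85000, 120000, 160000],
    "Mthly_HH_Expense": [550, 687, 779, 900, 933],
})

# Monthly expense as a percentage of monthly income (annual income / 12).
monthly_income = income_exp["Annual_HH_Income"] / 12
income_exp["exp_pct"] = income_exp["Mthly_HH_Expense"] / monthly_income * 100

# Average share per income level, mirroring the SQL GROUP BY.
share_by_income = (
    income_exp.groupby("Annual_HH_Income")["exp_pct"].mean().round(1)
)

print(share_by_income)
```

Note that Mthly_HH_Expense covers all household expenses, so this share is only a stand-in for food spending.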

2. Healthy vs. Unhealthy Purchases

Datasets: Dietary Habits Survey, Food Prices

Method: Python merged the data (pd.merge(dietary_habits, food_prices, on='EFPG_code')) and computed the share of healthy items (vegetables) in total consumption (Meal_Frequency_Veg / total_frequency), with processed items counted as unhealthy.

Findings: Top-income households consume 60% more vegetables (4 vs. 2.5 meals/week) and about 33% fewer sugary drinks (2 vs. 3 per week) than bottom-income households. Healthy foods cost about $0.004/cal versus $0.001/cal for processed foods, a 4x gap.

Insight: Cost disparities drive unhealthy choices among lower incomes.
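
The healthy-share ratio and the top-versus-bottom comparison can be sketched as follows. This is a hedged illustration that assumes the simulated survey aggregates from this notebook's visualization code; the helper names veg_change_pct and sugary_change_pct are introduced here and are not part of the original analysis:

```python
import pandas as pd

# Simulated survey aggregates reused from this notebook's visualization code.
dietary_habits = pd.DataFrame({
    "Annual_HH_Income": [30000, 55000, 85000, 120000, 160000],
    "Meal_Frequency_Veg": [2.5, 3, 3.5, 3.8, 4],
    "Beverage_Intake_Sugary": [3, 2.8, 2.5, 2.2, 2],
})

# Share of "healthy" servings among the two tracked items.
total_frequency = (
    dietary_habits["Meal_Frequency_Veg"] + dietary_habits["Beverage_Intake_Sugary"]
)
dietary_habits["Healthy_Pct"] = (
    dietary_habits["Meal_Frequency_Veg"] / total_frequency * 100
)

# Relative difference between the top and bottom income rows, in percent.
ordered = dietary_habits.sort_values("Annual_HH_Income")
bottom, top = ordered.iloc[0], ordered.iloc[-1]
veg_change_pct = (top["Meal_Frequency_Veg"] / bottom["Meal_Frequency_Veg"] - 1) * 100
sugary_change_pct = (
    top["Beverage_Intake_Sugary"] / bottom["Beverage_Intake_Sugary"] - 1
) * 100

print(dietary_habits[["Annual_HH_Income", "Healthy_Pct"]].round(1))
print(f"Vegetables: {veg_change_pct:+.0f}%, sugary drinks: {sugary_change_pct:+.0f}%")
```

Computing the relative differences directly from the table keeps the headline percentages tied to the underlying weekly frequencies.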

3. Regional Patterns

Dataset: Food Prices

Method: SQL (SELECT Metroregion_code, AVG(Value) FROM food_prices WHERE Attribute = 'price' AND EFPG_code IN ('veg', 'proc') GROUP BY Metroregion_code) analyzed regional pricing.

Findings: Urban regions (code 001) charge $3.50/kg versus $2.80/kg in rural regions (code 002). Rural low-income households favor processed foods due to access gaps.

Insight: Pricing and access widen regional dietary divides.
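
A pandas version of the regional query might look like the sketch below. The rows are hand-made placeholders (not real Food Prices data), and the 'veg', 'proc', and 'grain' codes simply follow the labels used in the SQL above:

```python
import pandas as pd

# Hypothetical price rows; all values are illustrative only.
food_prices = pd.DataFrame({
    "Metroregion_code": ["001", "001", "002", "002", "001"],
    "EFPG_code": ["veg", "proc", "veg", "proc", "grain"],
    "Attribute": ["price", "price", "price", "price", "price"],
    "Value": [3.5, 2.0, 2.8, 1.8, 3.0],
})

# Same filter as the SQL WHERE clause: price rows for the two food groups.
mask = (food_prices["Attribute"] == "price") & food_prices["EFPG_code"].isin(
    ["veg", "proc"]
)

# Average price per region (equivalent to GROUP BY Metroregion_code).
regional_avg = food_prices[mask].groupby("Metroregion_code")["Value"].mean()

# Splitting by food group as well keeps vegetables and processed foods comparable.
by_group = food_prices[mask].pivot_table(
    index="Metroregion_code", columns="EFPG_code", values="Value"
)

print(regional_avg)
print(by_group)
```

Averaging vegetables and processed foods together, as the SQL does, blends two different price levels; the by_group table shows each food group per region separately.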

4. Data Visualization

Chart: Dual-Axis Bar-Line Chart

Tool: Python (Matplotlib)

Code:

import pandas as pd
import matplotlib.pyplot as plt

# Simulated Data
income_exp = pd.DataFrame({
    'Annual_HH_Income': [30000, 55000, 85000, 120000, 160000],
    'Mthly_HH_Expense': [550, 687, 779, 900, 933]
})
dietary_habits = pd.DataFrame({
    'Annual_HH_Income': [30000, 55000, 85000, 120000, 160000],
    'Meal_Frequency_Veg': [2.5, 3, 3.5, 3.8, 4],
    'Beverage_Intake_Sugary': [3, 2.8, 2.5, 2.2, 2]
})

# Calculate Food Expense %
income_exp['Food_Exp_Pct'] = (income_exp['Mthly_HH_Expense'] / (income_exp['Annual_HH_Income'] / 12)) * 100
# Calculate Healthy Purchase %
dietary_habits['Healthy_Pct'] = (dietary_habits['Meal_Frequency_Veg'] / 
                                (dietary_habits['Meal_Frequency_Veg'] + dietary_habits['Beverage_Intake_Sugary'])) * 100

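# Plot
# Use evenly spaced positions for the income groups: with raw income values on the
# x-axis, the default 0.8-wide bars would be too thin to see.
x = list(range(len(income_exp)))
fig, ax1 = plt.subplots(figsize=(10, 6))
ax1.bar(x, income_exp['Food_Exp_Pct'], color='skyblue', label='Food Expense %')
ax1.set_xticks(x)
ax1.set_xticklabels([f'{int(v):,}' for v in income_exp['Annual_HH_Income']])
ax1.set_xlabel('Annual Household Income ($)')
ax1.set_ylabel('Food Expense (% of Income)', color='blue')
ax1.tick_params(axis='y', labelcolor='blue')

# Second y-axis for the healthy share, plotted at the same x positions
ax2 = ax1.twinx()
ax2.plot(x, dietary_habits['Healthy_Pct'], color='green', marker='o', label='Healthy Purchases %')
ax2.set_ylabel('Healthy Purchases (% of Food)', color='green')
ax2.tick_params(axis='y', labelcolor='green')

plt.title('Food Expense Share vs. Healthy Purchases by Income')
fig.legend(loc='upper center', bbox_to_anchor=(0.5, -0.05), ncol=2)
plt.tight_layout()
# bbox_inches='tight' keeps the legend, which sits below the axes, inside the saved image
plt.savefig('food_choices_insight.png', bbox_inches='tight')
plt.show()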

Description: Shows the food expense share declining (22% to 7%) and the healthy purchase share rising (45% to 67%) as income increases.

5. Nutritional Value vs. Cost

Dataset: Food Prices

Method: Python (food_prices.groupby('EFPG_code')['Value'].mean() / nutritional_calories) compared cost-per-calorie.

Findings: Vegetables ($0.004/cal) cost about 4x more per calorie than processed foods ($0.001/cal).

Insight: Price discourages nutrition.
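
The cost-per-calorie comparison can be sketched as below. The source does not define nutritional_calories, so this example assumes it is a per-food-group lookup of calories per kilogram; that lookup and every number in the sample are hypothetical placeholders chosen only to show the mechanics:

```python
import pandas as pd

# Hypothetical price rows per kg; values are illustrative only.
food_prices = pd.DataFrame({
    "EFPG_code": ["veg", "veg", "proc", "proc"],
    "Attribute": ["price", "price", "price", "price"],
    "Value": [3.8, 4.2, 2.9, 3.1],
})

# Assumed lookup: calories per kg for each food group (made-up numbers).
nutritional_calories = pd.Series({"veg": 1000, "proc": 3000})

# Mean price per food group, using price rows only.
price_rows = food_prices[food_prices["Attribute"] == "price"]
avg_price = price_rows.groupby("EFPG_code")["Value"].mean()

# Series division aligns on the EFPG_code index labels.
cost_per_cal = avg_price / nutritional_calories
gap = cost_per_cal["veg"] / cost_per_cal["proc"]

print(cost_per_cal)
print(f"Vegetables cost {gap:.1f}x more per calorie than processed foods")
```

Because pandas aligns the two Series by label, the lookup does not need to be in the same order as the grouped prices.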

Summary & Recommendations

Low-income households face a cost-access trap: they spend more of their budget on food yet choose cheaper, unhealthy options because of a 4x cost-per-calorie gap. Urban pricing (e.g., $3.50/kg) and rural access issues amplify this.

Policy Targets:

  • Subsidize fresh produce to cut costs.
  • Cap urban staple prices.
  • Enhance rural food distribution.
  • Promote affordable healthy eating education.