How does income influence food choices?
Background
Does eating healthy depend on what's in your wallet? While some believe nutritious food is a luxury reserved for those who can afford it, others argue that education, accessibility, and policy interventions play an even bigger role.
As part of a public health research team, your mission is to uncover the real factors driving food choices. Are healthier foods truly more expensive, or do regional access, income distribution, and availability have a greater impact?
Your insights could help shape smarter food policies, making healthy eating more affordable and accessible for all. Are you ready to dig into the data and make a real-world impact?
The data
Your team gathered three datasets to analyze the relationship between income levels and food choices:
Income-Expenditure
- Mthly_HH_Income: Monthly household income
- Mthly_HH_Expense: Total monthly household expenses
- No_of_Fly_Members: Number of family members
- Emi_or_Rent_Amt: Rent or loan payments
- Annual_HH_Income: Annual household income
- Highest_Qualified_Member: Education level of the most qualified household member
- No_of_Earning_Members: Number of income earners in the household
Dietary Habits Survey Data
- Age: Age group of the respondent
- Gender: Male/Female
- Dietary Preference: Vegetarian, Non-Vegetarian, Vegan, etc.
- Meal Frequency: How often certain food types are consumed
- Food Restrictions: Allergies and dietary restrictions
- Beverage Intake: Hydration and drink preferences
Food Prices
- Year: Year of data collection
- Month: Month of data collection
- Metroregion_code: Geographic area code
- EFPG_code: Food category (e.g., whole grains, processed foods)
- Attribute: Type of data recorded (e.g., price, purchase amount)
- Value: Numeric value of the recorded attribute
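Because the Food Prices data stores one measurement per row (an Attribute label plus a Value), it is usually easier to work with after reshaping it so that each attribute becomes its own column. The following is a minimal sketch on made-up rows, not the real file: the attribute labels "price" and "purchase_amount", the region codes, and all numbers are assumptions for illustration only.

```python
import pandas as pd

# Toy rows in the long layout described above. All values are invented, and
# the Attribute labels ("price", "purchase_amount") are assumed, not taken
# from the real file.
food_long = pd.DataFrame({
    "Year": [2020, 2020, 2020, 2020],
    "Month": [1, 1, 1, 1],
    "Metroregion_code": ["A", "A", "B", "B"],
    "EFPG_code": [100, 100, 100, 100],
    "Attribute": ["price", "purchase_amount", "price", "purchase_amount"],
    "Value": [2.5, 40.0, 3.0, 35.0],
})

# One row per (year, month, region, food category); one column per attribute.
keys = ["Year", "Month", "Metroregion_code", "EFPG_code"]
food_wide = (
    food_long
    .pivot_table(index=keys, columns="Attribute", values="Value", aggfunc="mean")
    .reset_index()
)
food_wide.columns.name = None  # drop the leftover "Attribute" axis label
print(food_wide)
```

Using pivot_table with an explicit aggfunc (rather than pivot) is a cautious choice here: it will not fail if the real data happens to contain duplicate rows for the same key, although the mean may then hide those duplicates.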
import pandas as pd
import numpy as np
# Visualization
import plotly.express as px
import plotly.graph_objects as go
import matplotlib.pyplot as plt
import seaborn as sns
# Machine Learning (for predictive modeling)
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Load the three datasets and the food price dimension table
Dietary_Data = pd.read_csv("data/Dietary Habits Survey Data.csv")
Dietary_Data.head()

Food_price = pd.read_csv("data/Food Prices.csv")
Food_price.head()

food_priceDim = pd.read_csv("data/Food_Prices_Dimension_Table.csv")
food_priceDim.head()

IncomeExpenditure = pd.read_csv("data/Income-Expenditure.csv")
IncomeExpenditure.head()

IncomeExpenditure.info()

Descriptive statistics:
IncomeExpenditure.describe()

# Mean and standard deviation of monthly household income
Average_IncomeByMonth = IncomeExpenditure["Mthly_HH_Income"].mean()
Std_IncomeByMonth = IncomeExpenditure["Mthly_HH_Income"].std()
print(f"Average income by month is {Average_IncomeByMonth}")
print(f"Std income by month is {Std_IncomeByMonth}")
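Beyond the mean and standard deviation, one possible next step is to relate expenses to income per household and compare income groups. The sketch below uses a few invented households rather than the real file. The derived columns (expense_share, income_per_member, income_group) and the two-group split are hypothetical choices, not part of the dataset.

```python
import pandas as pd

# Toy households; the numbers are invented for illustration only.
households = pd.DataFrame({
    "Mthly_HH_Income": [10000, 20000, 40000, 80000],
    "Mthly_HH_Expense": [9000, 15000, 24000, 32000],
    "No_of_Fly_Members": [4, 4, 5, 4],
})

# Share of income spent each month, and income per family member.
households["expense_share"] = (
    households["Mthly_HH_Expense"] / households["Mthly_HH_Income"]
)
households["income_per_member"] = (
    households["Mthly_HH_Income"] / households["No_of_Fly_Members"]
)

# Split households into two equal-sized income groups (hypothetical labels).
households["income_group"] = pd.qcut(
    households["Mthly_HH_Income"], q=2, labels=["lower", "higher"]
)

# Average expense share within each income group.
summary = households.groupby("income_group", observed=True)["expense_share"].mean()
print(summary)
```

With the real data, the same columns could be computed directly on IncomeExpenditure. More quantile groups (for example q=4) may be appropriate if the sample is large enough.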
Challenge
Your public health research team has been asked to advise policymakers on the key factors influencing food choices across different income groups.
Your tasks are to analyze:
- Income & Food Affordability: How does household income relate to the affordability of different food categories?
  - Use the Income-Expenditure Dataset to analyze household income and overall expenses.
  - The Food Prices Dataset reveals how food costs vary by region, helping assess affordability.
- Healthy vs. Unhealthy Purchases: Do higher-income households buy healthier foods?
  - The Dietary Habits Survey captures individual consumption patterns.
  - The Food Prices Dataset helps assess whether healthier foods are more expensive.
- Regional Patterns: Are there geographic trends in food affordability?
  - The Food Prices Dataset includes location-based pricing data.
- Data Visualization: Create at least one chart to highlight key insights.
- [Optional] Nutritional Value vs. Cost: Are healthier foods more expensive than processed options?
  - Use the Food Prices Dataset and its dimension table to categorize food types and analyze price differences between healthy and unhealthy options.
At the end of your analysis, summarize your findings: what trends stand out, and what factors should policymakers target for intervention?
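As a starting point for the regional and healthy-versus-processed questions, the comparison could be sketched as below. This is an illustration on invented rows, under several assumptions: a price column has already been extracted from the Attribute/Value pairs, and each EFPG_code can be mapped to a health label. The mapping shown is hypothetical. In a real analysis it would come from the dimension table and a documented classification rule.

```python
import pandas as pd

# Toy price rows; values, region codes, and category codes are invented.
prices = pd.DataFrame({
    "Metroregion_code": ["A", "A", "B", "B"],
    "EFPG_code": [1, 2, 1, 2],
    "price": [4.0, 2.0, 5.0, 2.0],
})

# Hypothetical mapping from food category code to a health label.
health_label = {1: "healthy", 2: "processed"}
prices["health_label"] = prices["EFPG_code"].map(health_label)

# Mean price per region and label, then the healthy-to-processed price ratio.
by_region = prices.pivot_table(
    index="Metroregion_code",
    columns="health_label",
    values="price",
    aggfunc="mean",
)
by_region["price_ratio"] = by_region["healthy"] / by_region["processed"]
print(by_region)
```

A ratio above 1 in a region would indicate that the categories labeled healthy cost more on average there than the processed ones, which is one simple way to compare affordability across regions.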
Judging criteria: Your Vote, Your Winners!
This is a community-driven competition: your votes decide the winners! Once the competition ends, you'll get to explore submissions, celebrate the best insights, and vote for your favorites. The top 5 most upvoted entries will win exclusive DataCamp merchandise, so bring your A-game, impress your peers, and claim your spot at the top!
Checklist before publishing
- Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
- Remove redundant cells like the introduction to data science notebooks, so the workbook is focused on your story.
- Check that all the cells run without error.