How does income influence food choices? 🥗💰
📖 Background
Does eating healthy depend on what's in your wallet? While some believe nutritious food is a luxury reserved for those who can afford it, others argue that education, accessibility, and policy interventions play an even bigger role.
As part of a public health research team, your mission is to uncover the real factors driving food choices. Are healthier foods truly more expensive, or do regional access, income distribution, and availability have a greater impact?
Your insights could help shape smarter food policies, making healthy eating more affordable and accessible for all. Are you ready to dig into the data and make a real-world impact?
💾 The data
Your team gathered three datasets to analyze the relationship between income levels and food choices:
Income-Expenditure
- Mthly_HH_Income: Monthly household income
- Mthly_HH_Expense: Total monthly household expenses
- No_of_Fly_Members: Number of family members
- Emi_or_Rent_Amt: Rent or loan payments
- Annual_HH_Income: Annual household income
- Highest_Qualified_Member: Education level of the most qualified household member
- No_of_Earning_Members: Number of income earners in the household
Dietary Habits Survey Data
- Age: Age group of the respondent
- Gender: Male/Female
- Dietary Preference: Vegetarian, Non-Vegetarian, Vegan, etc.
- Meal Frequency: How often certain food types are consumed
- Food Restrictions: Allergies and dietary restrictions
- Beverage Intake: Hydration and drink preferences
Food Prices
- Year: Year of data collection
- Month: Month of data collection
- Metroregion_code: Geographic area code
- EFPG_code: Food category (e.g., whole grains, processed foods)
- Attribute: Type of data recorded (e.g., price, purchase amount)
- Value: Numeric value of the recorded attribute
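Because the Food Prices file stores one measurement per row (an Attribute label plus its Value), it may help to reshape it so each attribute becomes its own column before comparing prices. The following is a minimal sketch on invented rows: the attribute labels `price_100gm` and `purchase_grams`, the codes, and all numbers are assumptions for illustration, so check the real file for the actual labels.

```python
import pandas as pd

# Hypothetical long-format sample shaped like the Food Prices file.
# The Attribute labels and every value below are invented for illustration.
food_prices_long = pd.DataFrame({
    "Year": [2018] * 4,
    "Month": [1] * 4,
    "Metroregion_code": [1, 1, 2, 2],
    "EFPG_code": [10000, 10000, 10000, 10000],
    "Attribute": ["price_100gm", "purchase_grams"] * 2,
    "Value": [0.45, 1200.0, 0.52, 950.0],
})

# Columns that together identify one observation.
key_columns = ["Year", "Month", "Metroregion_code", "EFPG_code"]

# Spread each Attribute into its own column; "mean" resolves any duplicate keys.
food_prices_wide = (
    food_prices_long
    .pivot_table(index=key_columns, columns="Attribute", values="Value", aggfunc="mean")
    .reset_index()
)
food_prices_wide.columns.name = None  # drop the leftover "Attribute" axis label

print(food_prices_wide)
```

In the wide form, each row is one food category in one region and month, which makes later joins and group comparisons simpler.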
# Data manipulation
import pandas as pd
import numpy as np
# Visualization
import plotly.express as px
import plotly.graph_objects as go
import matplotlib.pyplot as plt
import seaborn as sns
# Machine Learning (for predictive modeling)
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
Dietary_Data = pd.read_csv("data/Dietary Habits Survey Data.csv")
Dietary_Data.head()
Food_price = pd.read_csv("data/Food Prices.csv")
Food_price.head()
food_priceDim = pd.read_csv("data/Food_Prices_Dimension_Table.csv")
food_priceDim.head()
IncomeExpenditure = pd.read_csv("data/Income-Expenditure.csv")
IncomeExpenditure.head()
IncomeExpenditure.info()
Descriptive statistics:
IncomeExpenditure.describe()
# Mean and standard deviation of monthly household income
Average_IncomeByMonth = IncomeExpenditure["Mthly_HH_Income"].mean()
Std_IncomeByMonth = IncomeExpenditure["Mthly_HH_Income"].std()
print(f"Average monthly household income: {Average_IncomeByMonth:.2f}")
print(f"Standard deviation of monthly household income: {Std_IncomeByMonth:.2f}")
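A possible next step after the overall mean and standard deviation is to split households into income groups and compare how much of their income each group spends. The sketch below uses invented rows with the Income-Expenditure column names. The derived columns `Expense_Share`, `Income_Per_Member`, and `Income_Group`, and the choice of three equal-sized groups, are assumptions for illustration rather than part of the dataset.

```python
import pandas as pd

# Hypothetical rows using the Income-Expenditure column names; values are invented.
households = pd.DataFrame({
    "Mthly_HH_Income": [10000, 20000, 30000, 40000, 60000, 100000],
    "Mthly_HH_Expense": [9000, 16000, 21000, 24000, 30000, 40000],
    "No_of_Fly_Members": [3, 4, 4, 5, 4, 6],
})

# Share of income that is spent, and income per family member (assumed helper columns).
households["Expense_Share"] = households["Mthly_HH_Expense"] / households["Mthly_HH_Income"]
households["Income_Per_Member"] = households["Mthly_HH_Income"] / households["No_of_Fly_Members"]

# Three equal-sized income groups based on quantiles (terciles).
households["Income_Group"] = pd.qcut(
    households["Mthly_HH_Income"], q=3, labels=["Low", "Middle", "High"]
)

# Average expense share within each income group.
group_summary = households.groupby("Income_Group", observed=True)["Expense_Share"].mean()
print(group_summary.round(2))
```

Quantile-based groups keep the group sizes balanced; fixed income thresholds would be an equally valid choice if policy-relevant cut-offs are known.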
💪 Challenge
Your public health research team has been asked to advise policymakers on the key factors influencing food choices across different income groups.
Your tasks are to analyze:
- Income & Food Affordability: How does household income relate to the affordability of different food categories?
  - Use the Income-Expenditure Dataset to analyze household income and overall expenses.
  - The Food Prices Dataset reveals how food costs vary by region, helping assess affordability.
- Healthy vs. Unhealthy Purchases: Do higher-income households buy healthier foods?
  - The Dietary Habits Survey captures individual consumption patterns.
  - The Food Prices Dataset helps assess whether healthier foods are more expensive.
- Regional Patterns: Are there geographic trends in food affordability?
  - The Food Prices Dataset includes location-based pricing data.
- Data Visualization: Create at least one chart to highlight key insights.
- [Optional] Nutritional Value vs. Cost: Are healthier foods more expensive than processed options?
  - Use the Food Prices Dataset and its dimension table to categorize food types and analyze price differences between healthy and unhealthy options.
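One way to approach the price-comparison and regional tasks above is to join prices to the dimension table, label each food category, and compare average prices per region. This is only a sketch on invented data: the dimension-table column `EFPG_name`, the `Price` column, the codes, and the healthy/less-healthy mapping are all assumptions, and the mapping in particular is an analyst's judgment call rather than something the datasets provide.

```python
import pandas as pd

# Hypothetical price rows; codes and prices are invented for illustration.
prices = pd.DataFrame({
    "Metroregion_code": [1, 1, 2, 2],
    "EFPG_code": [100, 200, 100, 200],
    "Price": [0.60, 0.40, 0.75, 0.45],
})

# Hypothetical dimension table; the column name "EFPG_name" is an assumption.
food_dim = pd.DataFrame({
    "EFPG_code": [100, 200],
    "EFPG_name": ["Whole grains", "Processed foods"],
})

# Analyst-supplied labels (an assumption, not part of the data).
health_label = {"Whole grains": "Healthy", "Processed foods": "Less healthy"}

# Attach category names; validate guards against duplicate codes in the dimension table.
labelled = prices.merge(food_dim, on="EFPG_code", how="left", validate="many_to_one")
labelled["Health_Group"] = labelled["EFPG_name"].map(health_label)

# Average price per region and health group, one column per group.
price_by_region = labelled.pivot_table(
    index="Metroregion_code", columns="Health_Group", values="Price", aggfunc="mean"
)

# Ratio above 1 means the healthy category costs more in that region.
price_by_region["Healthy_Premium"] = (
    price_by_region["Healthy"] / price_by_region["Less healthy"]
)
print(price_by_region.round(2))
```

A per-region ratio like this is also a convenient input for a bar chart, since it condenses the comparison into a single number per area.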
At the end of your analysis, summarize your findings: what trends stand out, and what factors should policymakers target for intervention?
🧑‍⚖️ Judging criteria: Your Vote, Your Winners!
This is a community-driven competition: your votes decide the winners! Once the competition ends, you'll get to explore submissions, celebrate the best insights, and vote for your favorites. The top 5 most upvoted entries will win exclusive DataCamp merchandise, so bring your A-game, impress your peers, and claim your spot at the top!
✅ Checklist before publishing
- Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
- Remove redundant cells like the introduction to data science notebooks, so the workbook is focused on your story.
- Check that all the cells run without error.