
How does income influence food choices? 🥗💰

📖 Background

Does eating healthy depend on what's in your wallet? While some believe nutritious food is a luxury reserved for those who can afford it, others argue that education, accessibility, and policy interventions play an even bigger role.

As part of a public health research team, your mission is to uncover the real factors driving food choices. Are healthier foods truly more expensive, or do regional access, income distribution, and availability have a greater impact?

Your insights could help shape smarter food policies, making healthy eating more affordable and accessible for all. Are you ready to dig into the data and make a real-world impact?

💾 The data

Your team gathered three datasets to analyze the relationship between income levels and food choices:

Income-Expenditure

  • Mthly_HH_Income – Monthly household income
  • Mthly_HH_Expense – Total monthly household expenses
  • No_of_Fly_Members – Number of family members
  • Emi_or_Rent_Amt – Rent or loan payments
  • Annual_HH_Income – Annual household income
  • Highest_Qualified_Member – Education level of the most qualified household member
  • No_of_Earning_Members – Number of income earners in the household

Dietary Habits Survey Data

  • Age – Age group of the respondent
  • Gender – Male/Female
  • Dietary Preference – Vegetarian, Non-Vegetarian, Vegan, etc.
  • Meal Frequency – How often certain food types are consumed
  • Food Restrictions – Allergies and dietary restrictions
  • Beverage Intake – Hydration and drink preferences

Food Prices

  • Year – Year of data collection
  • Month – Month of data collection
  • Metroregion_code – Geographic area code
  • EFPG_code – Food category (e.g., whole grains, processed foods)
  • Attribute – Type of data recorded (e.g., price, purchase amount)
  • Value – Numeric value of the recorded attribute
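
Because the Food Prices columns store one measurement per row (an Attribute label plus its Value), it can help to pivot them into one column per attribute before comparing prices. The following is a minimal sketch under stated assumptions: the toy rows, the attribute labels `price` and `purchase_amount`, and the helper name `to_wide` are invented for illustration and are not taken from the real file.

```python
import pandas as pd

# Toy rows in the long layout described above.
# Attribute labels and all values are made up for illustration.
toy_prices = pd.DataFrame({
    "Year": [2020, 2020, 2020, 2020],
    "Month": [1, 1, 1, 1],
    "Metroregion_code": [1, 1, 2, 2],
    "EFPG_code": [100, 100, 100, 100],
    "Attribute": ["price", "purchase_amount", "price", "purchase_amount"],
    "Value": [2.5, 40.0, 3.0, 35.0],
})


def to_wide(prices: pd.DataFrame) -> pd.DataFrame:
    """Pivot Attribute/Value pairs into one column per attribute (hypothetical helper)."""
    keys = ["Year", "Month", "Metroregion_code", "EFPG_code"]
    wide = prices.pivot_table(
        index=keys, columns="Attribute", values="Value", aggfunc="mean"
    )
    wide.columns.name = None  # drop the leftover "Attribute" axis label
    return wide.reset_index()


wide_prices = to_wide(toy_prices)
print(wide_prices)
```

Using `pivot_table` with `aggfunc="mean"` rather than `pivot` means duplicate key combinations are averaged instead of raising an error; whether averaging is appropriate depends on what the real attributes measure.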
import pandas as pd
import numpy as np

# Visualization
import plotly.express as px
import plotly.graph_objects as go
import matplotlib.pyplot as plt
import seaborn as sns


# Machine Learning (for predictive modeling)
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
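# Load the dietary habits survey
Dietary_Data = pd.read_csv("data/Dietary Habits Survey Data.csv")
Dietary_Data.head()

# Load the food prices and the dimension table that accompanies them
Food_price = pd.read_csv("data/Food Prices.csv")
Food_price.head()
food_priceDim = pd.read_csv("data/Food_Prices_Dimension_Table.csv")
food_priceDim.head()

# Load household income and expenditure, then inspect column types and missing values
IncomeExpenditure = pd.read_csv("data/Income-Expenditure.csv")
IncomeExpenditure.head()
IncomeExpenditure.info()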

Descriptive statistics:

IncomeExpenditure.describe()
# Mean and standard deviation of monthly household income
Average_IncomeByMonth = IncomeExpenditure["Mthly_HH_Income"].mean()
Std_IncomeByMonth = IncomeExpenditure["Mthly_HH_Income"].std()

print(f"Average monthly household income: {Average_IncomeByMonth:.2f}")
print(f"Standard deviation of monthly household income: {Std_IncomeByMonth:.2f}")
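
A single mean and standard deviation describe income overall, but comparing income groups needs a per-household affordability measure. The sketch below is one possible approach, not the required method: it uses the Income-Expenditure column names with invented toy values, and the derived columns (`expense_share`, `income_per_member`, `income_group`), the three-way quantile split, and the helper name are all assumptions.

```python
import pandas as pd

# Toy households using the Income-Expenditure column names; values are invented.
toy_households = pd.DataFrame({
    "Mthly_HH_Income": [10000, 20000, 30000, 40000, 60000, 90000],
    "Mthly_HH_Expense": [9000, 15000, 18000, 20000, 24000, 27000],
    "No_of_Fly_Members": [4, 5, 3, 4, 2, 3],
})


def add_affordability_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Add derived columns (hypothetical names) without modifying the input."""
    out = df.copy()
    # Share of monthly income that goes to expenses
    out["expense_share"] = out["Mthly_HH_Expense"] / out["Mthly_HH_Income"]
    # Income available per family member
    out["income_per_member"] = out["Mthly_HH_Income"] / out["No_of_Fly_Members"]
    # Assumption: three equal-sized income groups based on quantiles
    out["income_group"] = pd.qcut(
        out["Mthly_HH_Income"], q=3, labels=["low", "middle", "high"]
    )
    return out


households = add_affordability_columns(toy_households)
summary = households.groupby("income_group", observed=True)["expense_share"].median()
print(summary)
```

On the real data the same function could be called on `IncomeExpenditure`; the number of groups and the choice of median over mean are judgment calls worth revisiting once the income distribution has been inspected.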

💪 Challenge

Your public health research team has been asked to advise policymakers on the key factors influencing food choices across different income groups.

Your tasks are to analyze:

  1. Income & Food Affordability – How does household income relate to the affordability of different food categories?
    • Use the Income-Expenditure Dataset to analyze household income and overall expenses.
    • The Food Prices Dataset reveals how food costs vary by region, helping assess affordability.
  2. Healthy vs. Unhealthy Purchases – Do higher-income households buy healthier foods?
    • The Dietary Habits Survey captures individual consumption patterns.
    • The Food Prices Dataset helps assess whether healthier foods are more expensive.
  3. Regional Patterns – Are there geographic trends in food affordability?
    • The Food Prices Dataset includes location-based pricing data.
  4. Data Visualization – Create at least one chart to highlight key insights.
  5. [Optional] Nutritional Value vs. Cost – Are healthier foods more expensive than processed options?
    • Use the Food Prices Dataset and its dimension table to categorize food types and analyze price differences between healthy and unhealthy options.
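
Tasks 2, 3, and 5 can be approached with one join and one pivot: attach category labels from the dimension table, flag categories as healthy or unhealthy, and compare average prices per region. This is only a sketch under loud assumptions: the dimension table's columns (`EFPG_name` here), the `price` attribute label, the `HEALTHY_CODES` set, and every value below are hypothetical and must be replaced after inspecting the real files.

```python
import pandas as pd

# Toy price rows; column names follow the Food Prices description, values are invented.
toy_prices = pd.DataFrame({
    "Metroregion_code": [1, 1, 2, 2],
    "EFPG_code": [10, 20, 10, 20],
    "Attribute": ["price"] * 4,  # hypothetical attribute label
    "Value": [3.0, 2.0, 4.0, 2.5],
})

# Hypothetical dimension table; the real file's columns may differ.
toy_dim = pd.DataFrame({
    "EFPG_code": [10, 20],
    "EFPG_name": ["Whole grains", "Processed foods"],
})

# Assumption: which categories count as "healthy" is a manual analyst decision.
HEALTHY_CODES = {10}


def price_by_health_and_region(prices: pd.DataFrame, dim: pd.DataFrame) -> pd.DataFrame:
    """Mean price per region for healthy vs. unhealthy categories (hypothetical helper)."""
    priced = prices[prices["Attribute"] == "price"]
    # validate= raises an error if a category code appears twice in the dimension table
    merged = priced.merge(dim, on="EFPG_code", how="left", validate="many_to_one")
    merged["health_group"] = (
        merged["EFPG_code"].isin(HEALTHY_CODES).map({True: "healthy", False: "unhealthy"})
    )
    return merged.pivot_table(
        index="Metroregion_code", columns="health_group", values="Value", aggfunc="mean"
    )


comparison = price_by_health_and_region(toy_prices, toy_dim)
# Positive values mean healthy categories cost more in that region
comparison["healthy_premium"] = comparison["healthy"] - comparison["unhealthy"]
print(comparison)
```

The resulting table has one row per region, so it can feed directly into a bar chart for the visualization task.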

At the end of your analysis, summarize your findings: what trends stand out, and what factors should policymakers target for intervention?

🧑‍⚖️ Judging criteria: Your Vote, Your Winners!

This is a community-driven competition: your votes decide the winners! Once the competition ends, you'll get to explore submissions, celebrate the best insights, and vote for your favorites. The top 5 most upvoted entries will win exclusive DataCamp merchandise, so bring your A-game, impress your peers, and claim your spot at the top!

✅ Checklist before publishing

  • Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
  • Remove redundant cells like the introduction to data science notebooks, so the workbook is focused on your story.
  • Check that all the cells run without error.

⏳ The table is set, so let's serve up some data-driven insights! 🥗📊🚀