
Health Survey Data Analysis of BMI

Analyze health survey data to determine how BMI is associated with physical activity and smoking.

11 Tasks · 1,500 XP

Project Description

Surveys are often used to study health behavior and estimate the risk of disease. Meanwhile, seemingly every day, news outlets publish a different "research says" article about how to lose weight (fast! with no effort at all!). In this project, you will use survey data from about 20,000 people sampled from the United States to explore health behaviors associated with lower Body Mass Index (BMI), a standardized measure of healthy weight and obesity. Surveys with complex designs require special statistical methods that incorporate sampling weights and design factors into estimation and inference. Using these survey design methods, you will fit multiple regression models to adjust for confounders when testing whether physical activity is associated with lower BMI. The data come from the National Health and Nutrition Examination Survey (NHANES), collected from participants surveyed in 2009-2012, and are available in the NHANES R package.
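To see why sampling weights matter, here is a minimal sketch of a weighted mean, written in Python for illustration only (the project itself works in R). The records, the weights, and the function names are hypothetical toy values, not NHANES data. Each weight stands for the number of people in the population that one respondent represents.

```python
# Hypothetical toy records: (bmi, physically_active, sampling_weight).
# These are made-up values for illustration, not NHANES data.
records = [
    (31.0, False, 4000.0),
    (29.0, False, 3500.0),
    (27.0, True, 500.0),
    (24.0, True, 600.0),
    (23.0, True, 400.0),
]


def unweighted_mean(values):
    """Plain sample mean: treats every respondent as equally representative."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Mean in which each value counts in proportion to its sampling weight."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("sampling weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def weighted_mean_by_activity(rows, active):
    """Weighted mean BMI within one physical-activity group."""
    group = [(bmi, w) for bmi, is_active, w in rows if is_active == active]
    return weighted_mean([bmi for bmi, _ in group], [w for _, w in group])


bmi_values = [bmi for bmi, _, _ in records]
weights = [w for _, _, w in records]

naive_mean = unweighted_mean(bmi_values)
design_weighted_mean = weighted_mean(bmi_values, weights)
active_mean = weighted_mean_by_activity(records, active=True)
inactive_mean = weighted_mean_by_activity(records, active=False)

print(f"Unweighted mean BMI:          {naive_mean:.2f}")
print(f"Weighted mean BMI:            {design_weighted_mean:.2f}")
print(f"Weighted mean BMI (active):   {active_mean:.2f}")
print(f"Weighted mean BMI (inactive): {inactive_mean:.2f}")
```

In this toy sample the physically active respondents carry small weights, so the unweighted mean understates the population's mean BMI. The sketch covers point estimates only. Valid standard errors also depend on the strata and cluster variables, and adjusting for a confounder such as smoking requires a weighted regression. In R, the survey package (`svydesign`, `svyglm`) is a common choice for both, though assuming the project uses it is a guess on my part.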

Project Tasks

  1. Survey of BMI and physical activity
  2. Visualize survey weight and strata variables
  3. Specify the survey design
  4. Subset the data
  5. Visualizing BMI
  6. Is BMI lower in physically active people?
  7. Could there be confounding by smoking? (part 1)
  8. Could there be confounding by smoking? (part 2)
  9. Add smoking in the mix
  10. Incorporate possible confounding in the model
  11. What does it all mean?
Data Manipulation · Probability & Statistics · Case Studies

Jessica Minnier

Assistant Professor of Biostatistics at Oregon Health & Science University
Jessica is an Assistant Professor of Biostatistics in the OHSU-PSU School of Public Health at Oregon Health & Science University. Her statistical research interests include risk prediction with high dimensional data sets and the analysis of genetic and other omics data. She is passionate about teaching R and programming, reproducible research, and open science.