Premium Project

Health Survey Data Analysis of BMI

Analyze health survey data to determine how BMI is associated with physical activity and smoking.

  • 11 tasks
  • 909 participants
  • 1,500 XP

Project Description

Surveys are often used to study health behavior and determine the risks of disease. Meanwhile, seemingly every day, news outlets publish a different "research says" article about how to lose weight (fast! with no effort at all!). In this project, you will use survey data from ~20k people sampled in the United States to explore health behaviors associated with lower Body Mass Index (BMI), a standardized measure of weight relative to height that is commonly used to classify healthy weight and obesity. Surveys with complex designs require special statistical methods that incorporate sampling weights and design factors (such as strata and clusters) into estimation and inference. Using these survey design methods, you will fit multiple regression models that adjust for confounders when testing whether physical activity is associated with lower BMI.

You will apply the skills you learned in Analyzing Survey Data in R and Multiple and Logistic Regression, along with many skills from Introduction to the Tidyverse, including summarizing data and visualizing it with ggplot2.

This project uses National Health and Nutrition Examination Survey (NHANES) data on ~20,000 participants surveyed in 2009-2012, available in the NHANES R package.

Project Tasks

  • 1. Survey of BMI and physical activity
  • 2. Visualize survey weight and strata variables
  • 3. Specify the survey design
  • 4. Subset the data
  • 5. Visualizing BMI
  • 6. Is BMI lower in physically active people?
  • 7. Could there be confounding by smoking? (part 1)
  • 8. Could there be confounding by smoking? (part 2)
  • 9. Add smoking in the mix
  • 10. Incorporate possible confounding in the model
  • 11. What does it all mean?

Jessica Minnier

Assistant Professor of Biostatistics at Oregon Health & Science University

Jessica is an Assistant Professor of Biostatistics in the OHSU-PSU School of Public Health at Oregon Health & Science University. Her statistical research interests include risk prediction with high-dimensional data sets and the analysis of genetic and other omics data. She is passionate about teaching R and programming, reproducible research, and open science.


  • R
  • Topics

    Data Manipulation, Probability & Statistics, Case Studies