EFFECTS OF PHYSICAL ACTIVITY ON MENTAL HEALTH

The main purpose of this project is to demonstrate my skills as a professional data analyst. The project was done in Python, and the skills it covers include:

  • Data Cleaning
  • Data Wrangling
  • Data Visualization
  • Exploratory Data Analysis

About the Data

I chose to pull data from the CDC website, specifically the 2017-March 2020 Pre-Pandemic Questionnaire Data - Continuous NHANES (National Health and Nutrition Examination Survey) because Health and Nutrition are my main areas of expertise. The data and information on the data can be found here: https://wwwn.cdc.gov/nchs/nhanes/continuousnhanes/default.aspx?Cycle=2017-2020

Some key notes about the data:

  • The pandemic disrupted data collection from March 2020 onward. Because of this, data collected after that point is currently under restricted access, which is why I used the pre-pandemic data.
  • The CDC used extensive sampling techniques to ensure that the full dataset would be nationally representative.
  • From this national survey, I chose two datasets: the Mental Health - Depression Screener and the Physical Activity Questionnaire.
    • These two datasets are self-reported surveys, not clinical studies. The self-reported nature of the questionnaires shows up in the data analysis, causing some of the data to be skewed. This will be addressed during EDA and visualization.

Let's Get Started

We all know that exercise is good for our physical health in several ways. We have become experts at tracking the effects of exercise on our physical health. We measure muscle mass and BMI, we sample our blood and measure vitamins, minerals, enzymes, and toxins. We check heart rate and blood pressure and bone mass. And through physical exercise, we have learned how to improve these measurements in some way.

But what about mental health? In this project, I would like to explore the effects of physical exercise on mental health at various levels. Some questions I'd like to answer:

  • What is the effect of physical activity on mental health?
  • How much exercise do people get on average?
    • At work?
    • Recreationally?
  • Are there any mental health struggles that affect people more than others? What are they?
import numpy as np
import pandas as pd
import scipy.stats
import seaborn as sns
import matplotlib.pyplot as plt

demographic = pd.read_csv('CDC_2017_2020_Demographic_Data.csv')
depression = pd.read_csv('CDC_2017_2020_Depression_Screen.csv')
activity = pd.read_csv('CDC_2017_2020_Physical_Activity_Questionnaire.csv')
print(activity.head())
print(activity.info())
print(depression.head())
print(depression.info())

Physical Activity Questionnaire Data Cleaning

In this dataset, participants were asked various questions about total time doing physical activities in minutes, the intensity of said physical activity, and whether or not the activity was done at work or recreationally.

The names of these columns all have reference numbers, such as 'PAQ605', so the first step is to rename these columns to reflect the questions that were asked.

Participants were first asked whether or not they even performed moderate or vigorous activity at work and recreationally. If they answered 'No', questions pertaining to time spent doing said activity were skipped, resulting in a null value. These values can be replaced with a 0, since it is implied that not doing an activity means 0 time spent doing said activities.

If participants answered "Don't Know" to any of these questions, their answer was recorded as a numeric code (99 for days, 9999 for minutes), which would distort the totals. Since there are only 51 entries with this answer, we can filter them out without materially affecting the data.
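The two cleaning rules above can be sketched on a small toy frame. This is only an illustration: the rows are made up, and the `DONT_KNOW` mapping and the variable names are hypothetical helpers, with 99 and 9999 assumed to be the "Don't Know" codes for day and minute columns.

```python
import pandas as pd

# Toy data, not real NHANES records.
toy = pd.DataFrame({
    "vigorous_work_days":    [3, None, 99, 5],
    "vigorous_work_minutes": [60, None, 30, 9999],
})

# Assumed "Don't Know" codes: 99 for day counts, 9999 for minute counts.
DONT_KNOW = {"vigorous_work_days": 99, "vigorous_work_minutes": 9999}

# Skipped questions (NaN) mean the activity was not done, so treat them as 0.
filled = toy.fillna(0)

# Flag every row where at least one column holds its "Don't Know" code.
is_dont_know = pd.DataFrame(
    {col: filled[col] == code for col, code in DONT_KNOW.items()}
).any(axis=1)

n_dropped = int(is_dont_know.sum())  # number of rows the filter would remove
cleaned = filled[~is_dont_know]      # rows that survive the filter
```

Counting the flagged rows before dropping them is one way to confirm that the filter removes only a small share of the sample.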

#Change the names of the columns we're going to use
activity.rename(columns={
    'PAQ605': 'vigorous_work_activity',
    'PAQ610': 'vigorous_work_days',
    'PAD615': 'vigorous_work_minutes',
    'PAQ620': 'moderate_work_activity',
    'PAQ625': 'moderate_work_days',
    'PAD630': 'moderate_work_minutes',
    'PAQ635': 'walk_or_bicycle',
    'PAQ640': 'days_walk_or_bicycle',
    'PAD645': 'minutes_walk_or_bicycle',
    'PAQ650': 'vigorous_recreational_activities',
    'PAQ655': 'vigorous_recreational_days',
    'PAD660': 'vigorous_recreational_minutes',
    'PAQ665': 'moderate_recreational_activities',
    'PAQ670': 'moderate_recreational_days',
    'PAD675': 'moderate_recreational_minutes',
    'PAD680': 'sedentary_activity_minutes',
}, inplace=True)

#replace null values with 0
activity.fillna(0, inplace=True)

# Entries recorded as "Don't Know" carry numeric codes (99 or 9999) that misrepresent the sample.
# There are only 51 of these entries, so we remove them.
activity = activity[(activity['vigorous_work_days'] != 99) &
                    (activity['moderate_work_days'] != 99) &
                    (activity['days_walk_or_bicycle'] != 99) &
                    (activity['vigorous_work_minutes'] != 9999) &
                    (activity['moderate_work_minutes'] != 9999) &
                    (activity['minutes_walk_or_bicycle'] != 9999) &
                    (activity['vigorous_recreational_days'] != 99) &
                    (activity['moderate_recreational_days'] != 99) &
                    (activity['vigorous_recreational_minutes'] != 9999) &
                    (activity['moderate_recreational_minutes'] != 9999) &
                    (activity['sedentary_activity_minutes'] != 9999)]

Physical Activity Questionnaire Data Wrangling

To explore and visualize this data, we want to compare, on a weekly basis, how much time participants spend on each type of activity and where they do it. To do this, we calculate the total minutes per week for each activity by multiplying the minutes spent on it per day by the number of days per week it is done.

Not All Activity is Created Equal

We must also account for the fact that vigorous activity and moderate activity have distinctly different effects on the body. For this, we can use a unit called METs, which stands for Metabolic Equivalent of Task. Essentially, the higher the MET score, the more energy is used per minute. The questionnaire suggests a MET score of 4 for moderate activity and 8 for vigorous activity. Walking or riding a bicycle to commute to work is considered moderate activity, so it also gets a MET score of 4.

It is worth mentioning that individual activities can have many different MET scores, not just 4 or 8. This means that the total METs calculated here are an approximation.

We can use total time and MET scores to calculate total physical activity for each participant.
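As a worked example of this calculation, the sketch below computes weekly minutes and a MET-weighted total for one hypothetical participant. The function names and the participant's numbers are made up for illustration; only the MET scores of 4 and 8 come from the text above.

```python
# MET scores as stated in the text: 4 for moderate, 8 for vigorous activity.
MODERATE_MET = 4
VIGOROUS_MET = 8

def weekly_minutes(days_per_week, minutes_per_day):
    """Total minutes per week spent on one type of activity."""
    return days_per_week * minutes_per_day

def weekly_met_total(vigorous_minutes, moderate_minutes):
    """Weight weekly minutes by intensity to get one activity score."""
    return vigorous_minutes * VIGOROUS_MET + moderate_minutes * MODERATE_MET

# Hypothetical participant: vigorous work on 2 days for 30 minutes each,
# moderate recreation on 3 days for 40 minutes each.
vigorous = weekly_minutes(2, 30)
moderate = weekly_minutes(3, 40)
score = weekly_met_total(vigorous, moderate)
```

Here the participant logs 60 vigorous minutes and 120 moderate minutes, and both contribute 480 to the total of 960, because each vigorous minute counts twice as much as a moderate one.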


# calculating time doing activity
activity['vigorous_work_time'] = activity['vigorous_work_days'] * activity['vigorous_work_minutes']
activity['moderate_work_time'] = activity['moderate_work_days'] * activity['moderate_work_minutes']
activity['vigorous_recreational_time'] = activity['vigorous_recreational_days'] * activity['vigorous_recreational_minutes']
activity['moderate_recreational_time'] = activity['moderate_recreational_days'] * activity['moderate_recreational_minutes']
activity['walk_or_bicycle_time'] = activity['days_walk_or_bicycle'] * activity['minutes_walk_or_bicycle']
activity['total_vigorous_time'] = activity['vigorous_work_time'] + activity['vigorous_recreational_time']
activity['total_moderate_time'] = activity['moderate_work_time'] + activity['moderate_recreational_time'] + activity['walk_or_bicycle_time']
activity['total_active_time'] = activity['total_vigorous_time'] + activity['total_moderate_time']
activity['total_sedentary_time'] = activity['sedentary_activity_minutes'] * 7
#calculating total METs for each type of activity
activity['total_work_METs'] = (activity['vigorous_work_time'] * 8) + (activity['moderate_work_time'] * 4)
activity['total_recreational_METs'] = (activity['vigorous_recreational_time'] * 8) + (activity['moderate_recreational_time'] * 4)
activity['total_walk_or_bike_METs'] = activity['walk_or_bicycle_time'] * 4
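# vigorous METs combine vigorous work and vigorous recreational time (matches total_vigorous_time)
activity['total_vigorous_METs'] = (activity['vigorous_work_time'] + activity['vigorous_recreational_time']) * 8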
activity['total_sedentary_METs'] = activity['total_sedentary_time'] * 7
activity['total_moderate_METs'] = (activity['moderate_work_time'] + activity['moderate_recreational_time'] + activity['walk_or_bicycle_time']) * 4
activity['total_METs'] = activity['total_recreational_METs'] + activity['total_work_METs'] + activity['total_walk_or_bike_METs']
# comparing time doing vigorous vs moderate activity at work
vigorous_at_work = activity[(activity['vigorous_work_activity'] == 1) & (activity['moderate_work_activity'] == 2)]
moderate_at_work = activity[(activity['moderate_work_activity'] == 1) & (activity['vigorous_work_activity'] == 2)]
total_METs = activity['total_METs']

Active vs Sedentary

The bar plot below compares the average time people spend active with the average time they spend sedentary. On average, people spend 819 minutes per week getting exercise and 2373 minutes per week being sedentary.

active_vs_sedentary=activity[['total_active_time','total_sedentary_time']]
# errorbar=None replaces the deprecated ci=None (seaborn 0.12 and later)
active_vs_sedentary_plot = sns.barplot(data=active_vs_sedentary, errorbar=None)
plt.title('Average Time Spent Active vs Sedentary')
plt.ylabel('Minutes per Week')
active_vs_sedentary_plot.set_xticklabels(['Active','Sedentary'])
plt.show()
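Because `sns.barplot` plots the mean of each column by default, averages like the ones quoted above can be cross-checked with `DataFrame.mean()`. The sketch below does this on a made-up three-row frame; the values are illustrative and do not reproduce the 819 and 2373 figures.

```python
import pandas as pd

# Toy weekly totals in minutes, not real survey data.
toy = pd.DataFrame({
    "total_active_time":    [0, 300, 900],
    "total_sedentary_time": [2100, 2520, 2940],
})

# Column means: the same numbers a default bar plot would draw.
averages = toy[["total_active_time", "total_sedentary_time"]].mean()

# How many sedentary minutes there are for each active minute, on average.
ratio = averages["total_sedentary_time"] / averages["total_active_time"]
```

Running the same two lines on the cleaned `activity` frame would give the numbers behind the bars.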

Comparison of Exercise Times

The graph below breaks down how much time people spend on different types of activity in a typical week. Based on the graph, people get more exercise, both moderate and vigorous, at work than recreationally or while commuting. The largest share comes from moderate work activity, at around 355 minutes per week.

import seaborn as sns
import matplotlib.pyplot as plt

activity_comparison = activity[['vigorous_work_time','moderate_work_time', 'vigorous_recreational_time', 'moderate_recreational_time', 'walk_or_bicycle_time']]
# errorbar=None replaces the deprecated ci=None (seaborn 0.12 and later)
time_spent = sns.barplot(data=activity_comparison, orient='h', errorbar=None)
time_spent.set_yticklabels(['Vigorous Work','Moderate Work','Vigorous Recreational','Moderate Recreational','Walk or Bicycle'])
plt.title('Average Time Spent Exercising')
plt.ylabel('Types of Exercise')
plt.xlabel('Minutes Per Week')
plt.show()

Work vs Recreational

The graph below further supports the idea that people get most of their exercise from work. When calculating the total METs people achieve, we can see that the total METs at work is much greater than recreationally, at 3337 and 884 respectively.

# comparing work METs vs recreational METs
METs_comparison = activity[['total_work_METs', 'total_recreational_METs']]
# errorbar=None replaces the deprecated ci=None (seaborn 0.12 and later)
METs_plot = sns.barplot(data=METs_comparison, orient='h', errorbar=None)
METs_plot.set_yticklabels(['Work', 'Recreational'])
plt.title('Average Total METs at Work vs Recreational')
plt.xlabel('METs Per Week')
plt.show()

Vigorous vs Moderate

The two graphs below compare vigorous activity with moderate activity. Interestingly, people spend more time on moderate activity on average, but the total METs from vigorous activity are much greater than those from moderate activity. This supports the idea that time spent doing vigorous activity is more valuable than time spent doing moderate activity.

moderate_vs_vigorous = activity[['total_vigorous_time','total_moderate_time']]
# errorbar=None replaces the deprecated ci=None (seaborn 0.12 and later)
moderate_vs_vigorous_plot = sns.barplot(data=moderate_vs_vigorous, orient='h', errorbar=None)
moderate_vs_vigorous_plot.set_yticklabels(['Vigorous', 'Moderate'])
plt.title('Average Vigorous vs Moderate Time Spent Exercising')
plt.xlabel('Minutes Per Week')
plt.show()
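To see how fewer vigorous minutes can still produce more METs than a larger number of moderate minutes, consider the small sketch below. The minute values and helper name are hypothetical; the MET scores of 4 and 8 are the ones used throughout this project.

```python
MODERATE_MET = 4
VIGOROUS_MET = 8

def met_total(minutes, met_score):
    """MET-weighted total for a given number of minutes at one intensity."""
    return minutes * met_score

# Hypothetical week: less time on vigorous activity than on moderate activity.
vigorous_minutes = 100
moderate_minutes = 150

vigorous_total = met_total(vigorous_minutes, VIGOROUS_MET)
moderate_total = met_total(moderate_minutes, MODERATE_MET)

# Vigorous minutes needed to match the moderate total exactly.
break_even_minutes = moderate_total / VIGOROUS_MET
```

With these weights, vigorous activity overtakes moderate activity as soon as vigorous time exceeds half of moderate time, here 75 minutes against 150.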