Project: What is Your Heart Rate Telling You?

Millions of people develop some sort of heart disease every year, and heart disease is the biggest killer of both men and women in the United States and around the world. Statistical analysis has identified many risk factors associated with heart disease, such as age, blood pressure, total cholesterol, diabetes, hypertension, family history of heart disease, obesity, lack of physical exercise, and more.

In this project, you will run statistical tests and models on the Cleveland heart disease dataset to assess one particular factor: the maximum heart rate a person can achieve during exercise, and how it relates to the likelihood of having heart disease.

Examining how heart rate responds to exercise, together with other factors such as age and sex, may reveal abnormalities that could indicate heart disease. Let's find out more!

The Data

The data are available in `Cleveland_hd.csv`.

Column	Type	Description
`age`	continuous	age in years
`sex`	discrete	0=female 1=male
`cp`	discrete	chest pain type: 1=typical angina, 2=atypical angina, 3=non-anginal pain, 4=asymptomatic
`trestbps`	continuous	resting blood pressure (mm Hg)
`chol`	continuous	serum cholesterol (mg/dl)
`fbs`	discrete	fasting blood sugar > 120 mg/dl: 1=true 0=false
`restecg`	discrete	resting electrocardiogram result: 0=normal 1=ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV) 2=probable or definite left ventricular hypertrophy by Estes' criteria
`thalach`	continuous	maximum heart rate achieved
`exang`	discrete	exercise-induced angina: 1=yes 0=no
`oldpeak`	continuous	ST depression induced by exercise relative to rest
`slope`	discrete	slope of the peak exercise ST segment: 1=upsloping 2=flat 3=downsloping
`ca`	continuous	number of major vessels (0 to 3) colored by fluoroscopy
`thal`	discrete	3=normal 6=fixed defect 7=reversible defect
`class`	discrete	diagnosis: 0=no presence of heart disease; 1 to 4=increasing indicators of heart disease (1=minor indicators, 4=major indicators)
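The numeric codes in this dictionary are easier to read in tables and plots once they are turned into labelled factors. The sketch below does this in base R for `sex` and `cp`. It also collapses `class` into absence (0) versus presence (1 to 4) of disease. The data frame `toy` and the helper `decode_codebook()` are hypothetical. They stand in for a few rows of `Cleveland_hd.csv` so that the example runs on its own.

```r
# Hypothetical stand-in for a few rows of Cleveland_hd.csv
toy <- data.frame(
  sex   = c(0, 1, 1, 0),
  cp    = c(1, 4, 3, 2),
  class = c(0, 2, 0, 4)
)

# Hypothetical helper: attach readable labels following the data dictionary
decode_codebook <- function(df) {
  transform(
    df,
    # 0 = female, 1 = male
    sex_label = factor(sex, levels = c(0, 1), labels = c("female", "male")),
    # chest pain type codes 1 to 4
    cp_label = factor(cp, levels = 1:4,
                      labels = c("typical angina", "atypical angina",
                                 "non-anginal pain", "asymptomatic")),
    # any diagnosis code above 0 counts as presence of heart disease
    binary_class = as.integer(class != 0)
  )
}

decoded <- decode_codebook(toy)
decoded
```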

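# Load the necessary packages (install Metrics only if it is missing)
if (!requireNamespace("Metrics", quietly = TRUE)) install.packages("Metrics")
library(tidyverse)  # attaches dplyr and ggplot2, among others
library(yardstick)
library(Metrics)    # note: masks yardstick::accuracy() and other same-named metrics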

# Load the data
hd_data <- read.csv("Cleveland_hd.csv")

# Inspect the first five rows
head(hd_data, 5)

# Collapse the 0-4 diagnosis into a binary outcome: 0 = no disease, 1 = disease
hd_data <- hd_data %>%
  mutate(binary_class = ifelse(class == 0, 0, 1))

# Chi-square test of independence between sex and disease status
contingency_table <- table(hd_data$sex, hd_data$binary_class)

chi_test <- chisq.test(contingency_table)
chi_test
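For a 2x2 table, `chisq.test()` compares the observed counts with the counts expected if sex and disease status were independent. Each expected count is the row total times the column total, divided by the grand total. The sketch below reproduces the Pearson statistic by hand on made-up counts, which are not the Cleveland results. `chisq.test()` applies Yates' continuity correction to 2x2 tables by default, so the comparison uses `correct = FALSE`.

```r
# Made-up 2x2 counts: rows = sex (0/1), columns = binary_class (0/1)
obs <- matrix(c(70, 25,
                90, 115),
              nrow = 2, byrow = TRUE,
              dimnames = list(sex = c("0", "1"), binary_class = c("0", "1")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson chi-square statistic, no continuity correction
x2_manual <- sum((obs - expected)^2 / expected)
dof <- (nrow(obs) - 1) * (ncol(obs) - 1)
p_manual <- pchisq(x2_manual, df = dof, lower.tail = FALSE)

# Built-in version with Yates' correction switched off for a like-for-like check
builtin <- chisq.test(obs, correct = FALSE)
c(manual = x2_manual, builtin = unname(builtin$statistic))
```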

# Two-sample t-tests comparing group means (no disease vs. disease)
t_test_age <- t.test(age ~ binary_class, data = hd_data)
t_test_age

t_test_chol <- t.test(chol ~ binary_class, data = hd_data)
t_test_chol

t_test_thalach <- t.test(thalach ~ binary_class, data = hd_data)
t_test_thalach
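With the formula interface, `t.test()` runs Welch's two-sample t-test by default (`var.equal = FALSE`), so it does not assume the two diagnosis groups share a variance. The difference is taken as group 0 minus group 1. As a sketch, the statistic, the Welch-Satterthwaite degrees of freedom and the two-sided p-value can be computed by hand. The two heart-rate vectors and the helper `welch_t()` are hypothetical and not taken from the dataset.

```r
# Hypothetical maximum heart rates for two small groups (illustrative only)
no_disease <- c(172, 160, 178, 165, 170, 158)
disease    <- c(140, 150, 132, 145, 138)

# Hypothetical helper: Welch two-sample t-test computed by hand
welch_t <- function(x, y) {
  vx <- var(x) / length(x)   # squared standard error of mean(x)
  vy <- var(y) / length(y)   # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  # Welch-Satterthwaite approximation to the degrees of freedom
  dof <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  p_value <- 2 * pt(abs(t_stat), df = dof, lower.tail = FALSE)
  c(t = t_stat, df = dof, p.value = p_value)
}

manual  <- welch_t(no_disease, disease)
builtin <- t.test(no_disease, disease)   # var.equal = FALSE by default
rbind(manual = manual,
      builtin = c(builtin$statistic, builtin$parameter, builtin$p.value))
```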

# Collect the variables whose tests are significant at the 5% level (p < 0.05)
p_values <- c(
  "Sex"            = chi_test$p.value,
  "Age"            = t_test_age$p.value,
  "Cholesterol"    = t_test_chol$p.value,
  "Max Heart Rate" = t_test_thalach$p.value
)

significant <- names(p_values)[p_values < 0.05]
significant

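# Visualizations
hd_data <- hd_data %>%
  mutate(hd_labelled = ifelse(binary_class == 0, "No disease", "Disease"),
         # sex is stored as 0/1; a factor gives a discrete fill for the bar chart
         sex_labelled = factor(sex, levels = c(0, 1), labels = c("Female", "Male")))

# sex: share of females and males within each diagnosis group
ggplot(data = hd_data, aes(x = hd_labelled, fill = sex_labelled)) +
  geom_bar(position = "fill") +
  labs(x = NULL, y = "Proportion", fill = "Sex")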

# age
ggplot(data = hd_data, aes(x = hd_labelled, y = age)) + geom_boxplot()

# thalach
ggplot(data = hd_data, aes(x = hd_labelled, y = thalach)) + geom_boxplot()

# Logistic regression of disease status on sex, age and maximum heart rate
heart_disease_model <- glm(binary_class ~ sex + age + thalach, data = hd_data, family = "binomial")

summary(heart_disease_model)
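The logistic model's coefficients are on the log-odds scale, so `exp(coef(...))` gives odds ratios. An odds ratio is the factor by which the odds of disease are multiplied for a one-unit increase in a predictor, with the other predictors held fixed. The sketch below fits the same formula to simulated data. The data frame `sim` and its generating coefficients are made up for illustration. The sketch then checks the odds-ratio identity numerically for `thalach`. `confint.default()` gives Wald-type intervals on the log-odds scale, which are exponentiated here.

```r
# Minimal sketch on simulated data; `sim` and its coefficients are hypothetical
set.seed(42)
n <- 500
sim <- data.frame(
  sex     = rbinom(n, 1, 0.6),
  age     = round(runif(n, 30, 75)),
  thalach = round(rnorm(n, 150, 20))
)
# Made-up "true" log-odds, used only to generate a 0/1 outcome
true_logit <- -3 + 1.2 * sim$sex + 0.04 * sim$age - 0.03 * (sim$thalach - 150)
sim$binary_class <- rbinom(n, 1, plogis(true_logit))

fit <- glm(binary_class ~ sex + age + thalach, data = sim, family = "binomial")

# Odds ratios: exp() of the log-odds coefficients
odds_ratios <- exp(coef(fit))
round(odds_ratios, 3)

# Wald-type 95% intervals, exponentiated to the odds-ratio scale
round(exp(confint.default(fit)), 3)

# Numerical check: raise thalach by 1 bpm with sex and age fixed
base_row <- data.frame(sex = 1, age = 55, thalach = 150)
p0 <- predict(fit, base_row, type = "response")
p1 <- predict(fit, transform(base_row, thalach = thalach + 1), type = "response")
odds <- function(p) p / (1 - p)
ratio <- unname(odds(p1) / odds(p0))
ratio   # equals odds_ratios["thalach"] up to floating-point error
```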
predicted_probs <- predict(heart_disease_model, hd_data, type = "response")
binary_predictions <- ifelse(predicted_probs >= 0.5, 1, 0)

head(predicted_probs)
head(binary_predictions)
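`predict(..., type = "response")` returns the inverse logit of the linear predictor. A 0.5 cutoff on the probability therefore amounts to classifying by the sign of the log-odds. The 0.5 value is a convention rather than a requirement. In the sketch below, the hypothetical helper `classify()` makes the cutoff an explicit argument, and the `toy` data are simulated only for illustration.

```r
# Minimal sketch on simulated data (hypothetical `toy` data frame)
set.seed(1)
toy <- data.frame(thalach = round(rnorm(200, 150, 20)))
toy$binary_class <- rbinom(200, 1, plogis(0.04 * (150 - toy$thalach)))
fit <- glm(binary_class ~ thalach, data = toy, family = "binomial")

eta   <- predict(fit, toy, type = "link")      # log-odds
probs <- predict(fit, toy, type = "response")  # probabilities

# type = "response" is the inverse logit of the linear predictor
all.equal(unname(probs), unname(plogis(eta)))

# Hypothetical helper: classify with an explicit probability cutoff
classify <- function(p, cutoff = 0.5) as.integer(p >= cutoff)

pred_05 <- classify(probs)        # the 0.5 rule used in the project
pred_04 <- classify(probs, 0.4)   # a lower cutoff flags at least as many cases
table(cutoff_0.5 = pred_05, cutoff_0.4 = pred_04)
```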

hd_data <- hd_data %>%
  mutate(predicted_class = binary_predictions)

head(hd_data)

# Accuracy: share of observations whose predicted class matches the actual class
model_accuracy <- Metrics::accuracy(actual = hd_data$binary_class, predicted = hd_data$predicted_class)
print(paste("Accuracy =", round(model_accuracy, 3)))

table(actual=hd_data$binary_class, prediction=hd_data$predicted_class)
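Accuracy is the share of the confusion table that lies on its diagonal. For the 2x2 case, with `1` as the disease class and the same layout as the table above (rows = actual, columns = prediction), the table also gives sensitivity and specificity. The counts below are hypothetical and only illustrate the arithmetic.

```r
# Hypothetical counts; rows = actual, columns = prediction
cm <- as.table(matrix(c(100, 30,
                         20, 80),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(actual = c("0", "1"),
                                      prediction = c("0", "1"))))

tn <- cm["0", "0"]; fp <- cm["0", "1"]
fn <- cm["1", "0"]; tp <- cm["1", "1"]

metrics <- c(
  accuracy    = (tp + tn) / sum(cm),   # all cases classified correctly
  sensitivity = tp / (tp + fn),        # disease cases correctly flagged
  specificity = tn / (tn + fp)         # disease-free cases correctly cleared
)
round(metrics, 3)
```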

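# Convert binary_class and predicted_class to factors with identical levels
hd_data <- hd_data %>%
  mutate(binary_class = factor(binary_class, levels = c(0, 1)),
         predicted_class = factor(predicted_class, levels = c(0, 1)))

# yardstick confusion matrix from the truth and estimate columns
confusion <- conf_mat(hd_data, truth = binary_class, estimate = predicted_class)
confusion
summary(confusion, event_level = "second")  # treat "1" (disease) as the event class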
