Abstract
Diabetes is a chronic condition that affects how the body turns food into energy, primarily through impaired insulin production or use. Left unmanaged, it can lead to serious health complications, but with appropriate care people with diabetes can lead healthy lives.
Several factors increase the risk of developing diabetes, including genetics, age, obesity, physical inactivity, and an unhealthy diet (1,2). Investigating the processes underlying these risk factors helps identify the biological mechanisms and pathways involved, which can lead to more effective prevention and treatment strategies (5,6). Understanding these processes can also support personalized interventions that reduce the risk and help manage the condition more effectively.
This analysis explores these risk factors by combining Principal Component Analysis (PCA) with generalised linear models (logistic regression) to identify the most important ones. It also provides a model for predicting an individual's risk of developing diabetes.
(1) Diabetes - World Health Organization (WHO). https://www.who.int/news-room/fact-sheets/detail/diabetes.
(2) Was erhöht das Risiko für Diabetes Typ 2? - diabinfo. https://www.diabinfo.de/vorbeugen/diabetes/bin-ich-gefaehrdet/was-erhoeht-das-risiko-fuer-diabetes-typ-2.html.
(3) Causal factors underlying diabetes risk informed by Mendelian .... https://link.springer.com/article/10.1007/s00125-023-05879-7.
(4) Risk factors for type 2 diabetes mellitus: An exposure-wide ... - PLOS. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0194127.
(5) Diabetes Risk Factors | Diabetes | CDC - Centers for Disease Control .... https://www.cdc.gov/diabetes/risk-factors/index.html.
(6) Assess your risk of developing diabetes - Diabetes Canada. https://www.diabetes.ca/type-2-risks/risk-factors---assessments.
(7) Diabetes: Australian facts, Risk factors for diabetes. https://www.aihw.gov.au/reports/diabetes/diabetes/contents/diabetes-risk-factors.
(8) Diabetes Mellitus: Insights from Epidemiology, Biochemistry, Risk .... https://mdpi-res.com/d_attachment/diabetology/diabetology-02-00004/article_deploy/diabetology-02-00004.pdf?version=1618555452.
💪 Competition challenge
In this challenge, you will focus on the following key tasks:
- Determine the most important factors affecting the diabetes outcome.
- Create interactive plots to visualize the relationship between diabetes and the determined factors from the previous step.
- What is the risk of diabetes for a person aged 54, with a height of 178 cm, a weight of 96 kg, and a glucose level of 125 mg/dL?
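The last question gives height and weight, whereas the models later in this notebook use a `bmi` predictor, so the person's measurements first have to be converted to a BMI value. A minimal sketch of that arithmetic follows; the variable names are illustrative and not part of the original notebook.

```r
# Hypothetical variable names; the values come from the challenge question
height_m  <- 178 / 100   # height in metres
weight_kg <- 96          # weight in kilograms

# BMI = weight (kg) / height (m)^2
bmi <- weight_kg / height_m^2
round(bmi, 1)
```

This gives a BMI of about 30.3, which is the value that would be passed to a fitted model together with the age and glucose level.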
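# Suppress warnings for cleaner notebook output
# (`message` is not a recognised R option, so it has been dropped)
options(warn = -1)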
# Install required packages if they are not already installed
# (janitor, GGally, plotly and datawizard are called via `::` further down)
required_packages <- c(
  "sjPlot", "performance", "ggbiplot", "ggrepel", "see", "ggplot2",
  "readr", "dplyr", "janitor", "GGally", "plotly", "datawizard"
)
for (pkg in required_packages) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
}
# Load the libraries
suppressMessages(library(sjPlot))
suppressMessages(library(performance))
suppressMessages(library(ggbiplot))
suppressMessages(library(ggrepel))
suppressMessages(library(see))
suppressMessages(library(ggplot2))
suppressMessages(library(readr))
suppressMessages(library(dplyr))
# Load the data from the CSV file
data <- read_csv("data/diabetes.csv", show_col_types = FALSE)
# Display the first few rows of the data
head(data)
1. Descriptive statistics
diabetes_data <- data |>
  janitor::clean_names() |>
  mutate(outcome = factor(outcome))
options(repr.plot.width = 18, repr.plot.height = 18)
plot <- suppressWarnings(suppressMessages({
  GGally::ggpairs(diabetes_data, ggplot2::aes(color = outcome, fill = outcome), binwidth = 10) +
    scale_color_manual(values = c("darkorchid", "turquoise")) +
    scale_fill_manual(values = c("darkorchid", "turquoise")) +
    theme_minimal()
}))
plot
2. Machine learning techniques to identify significant predictors of Diabetes outcome
2.1 Reducing data dimensions through PCA
# Standardise the numeric predictors and run the PCA
# (prcomp centres and scales internally, replacing the superseded mutate_if/select_if calls)
pca_result <- diabetes_data |>
  dplyr::select(where(is.numeric)) |>
  prcomp(center = TRUE, scale. = TRUE)
# Create a data frame for the PCA results
pca_data <- as.data.frame(pca_result$x)
pca_data$outcome <- diabetes_data$outcome
# Create the biplot with ggbiplot, coloured by outcome
p <- ggbiplot::ggbiplot(pca_result, groups = diabetes_data$outcome) +
  ggplot2::scale_color_manual(values = c("darkorchid", "turquoise")) +
  ggplot2::theme_minimal()
p
# Extract the variable loadings (ggbiplot already draws them as arrows)
loadings <- as.data.frame(pca_result$rotation)
loadings$variable <- rownames(loadings)
# Make the plot interactive using plotly
plotly::ggplotly(p)
print(pca_result)
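The printed `prcomp` object lists the standard deviation of each component together with the loadings. One common way to judge how many components matter is the proportion of variance each one explains. The sketch below shows that calculation on the built-in `mtcars` data, used here purely as a stand-in so the example runs on its own; in this notebook, `pca_result` would take the place of the hypothetical `toy_pca`.

```r
# Stand-in data (an assumption): four numeric columns from the built-in mtcars dataset
toy_pca <- prcomp(mtcars[, c("mpg", "disp", "hp", "wt")], center = TRUE, scale. = TRUE)

# The variance of each component is the square of its standard deviation
variance  <- toy_pca$sdev^2
explained <- variance / sum(variance)

# Proportion and cumulative proportion of variance per component
data.frame(
  component  = colnames(toy_pca$x),
  explained  = round(explained, 3),
  cumulative = round(cumsum(explained), 3)
)

# Loadings: weight of each original variable in each component
round(toy_pca$rotation, 2)
```

Variables with large absolute loadings on the first few components are natural candidates to carry forward into a regression model.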
2.2 Using the PCA results to identify predictors in GLMs
# Standardise the numeric predictors so the coefficients are on a comparable scale
model_data <- diabetes_data |>
  dplyr::mutate_if(is.numeric, datawizard::standardize) |>
  dplyr::mutate(outcome = factor(outcome))
# Model 1: all predictors as main effects
model <- stats::glm(outcome ~ pregnancies + age + blood_pressure + glucose + bmi + diabetes_pedigree_function + insulin + skin_thickness, data = model_data, family = "binomial")
# Note: inside a formula, `x^2` denotes term crossing and reduces to `x`;
# a quadratic term would have to be written as I(x^2) or poly(x, 2)
model2 <- stats::glm(outcome ~ age * pregnancies + blood_pressure * glucose + bmi + diabetes_pedigree_function^2, data = model_data, family = "binomial")
model3 <- stats::glm(outcome ~ age * glucose * bmi^2, data = model_data, family = "binomial")
model4 <- stats::glm(outcome ~ age + glucose * bmi, data = model_data, family = "binomial")
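Inside an R model formula, `^` denotes crossing of terms rather than arithmetic, so a term such as `bmi^2` expands to just `bmi`, and a quadratic term has to be wrapped in `I()`. The small sketch below inspects how R expands each formula; `y` and `x` are placeholder names, not columns of the dataset.

```r
# Term labels that R derives from each formula; y and x are placeholder names
labels_caret <- attr(terms(y ~ x^2), "term.labels")
labels_plain <- attr(terms(y ~ x), "term.labels")
labels_quad  <- attr(terms(y ~ x + I(x^2)), "term.labels")

labels_caret   # the same single term as the plain formula
labels_quad    # linear term plus an explicit quadratic term
identical(labels_caret, labels_plain)
```

Whether a quadratic term is appropriate for these models is a separate modelling decision; the sketch only shows how the formula syntax is interpreted.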
performance::performance(model)
performance::performance(model2)
performance::performance(model3)
performance::performance(model4)
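Because the models are fitted on standardised predictors, a new person's raw values have to be standardised with the same means and standard deviations before calling `predict()`. The sketch below illustrates this for a model with the same structure as `model4`. It runs on simulated stand-in data, not the diabetes dataset, and uses base R `scale()` in place of `datawizard::standardize`; all object names (`toy`, `toy_model`, `person`) are hypothetical, and the resulting number says nothing about real diabetes risk.

```r
set.seed(42)

# Simulated stand-in data (NOT the diabetes dataset): three raw predictors
n <- 300
toy <- data.frame(
  age     = runif(n, 21, 80),
  glucose = rnorm(n, mean = 120, sd = 30),
  bmi     = rnorm(n, mean = 32, sd = 7)
)
# Simulated binary outcome from an arbitrary logistic relationship
toy$outcome <- rbinom(n, 1, plogis(-8 + 0.03 * toy$age + 0.04 * toy$glucose + 0.05 * toy$bmi))

# Store the training means and standard deviations before standardising
predictors <- c("age", "glucose", "bmi")
centers <- colMeans(toy[predictors])
scales  <- sapply(toy[predictors], sd)
toy_std <- toy
toy_std[predictors] <- scale(toy[predictors], center = centers, scale = scales)

# Same formula structure as model4 in the notebook
toy_model <- glm(outcome ~ age + glucose * bmi, data = toy_std, family = "binomial")

# Person from the challenge question; BMI derived from weight and height
person <- data.frame(age = 54, glucose = 125, bmi = 96 / 1.78^2)
# Apply the training centring and scaling to the new observation
person_std <- as.data.frame(scale(person[predictors], center = centers, scale = scales))

# Predicted probability of a positive outcome
risk <- predict(toy_model, newdata = person_std, type = "response")
risk
```

With the real data, the means and standard deviations would come from the numeric columns of `diabetes_data`, and `model4` (or whichever model performs best) would replace `toy_model`.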