The Anatomy of Baldness: Statistical & Machine Learning Perspectives on Key Factors of Hair Loss
0. Background
0.1. Introduction
Hair loss is a growing health concern that affects both appearance and overall well-being for many individuals. Gaining insights into the factors that contribute to its occurrence is essential for the development of personal health management, medical interventions, and related industries.
This report analyzes survey data on potential contributors to hair loss, with a focus on three specific objectives:
| Explore the characteristics of surveyed individuals. |
| Examine the relationships between hair loss and its factors. |
| Determine key predictors of hair loss through machine learning models. |
0.2. Data Used
The dataset contains a total of 999 rows of observations, each of which represents a single respondent. The table below provides a list of variables and their corresponding descriptions:
| Variable | Description |
|---|---|
| Id | A unique identifier for each person. |
| Genetics | Whether the person has a family history of baldness. |
| Hormonal Changes | Indicates whether the individual has experienced hormonal changes (Yes/No). |
| Medical Conditions | Medical history that may lead to baldness; alopecia areata, thyroid problems, scalp infections, psoriasis, dermatitis, etc. |
| Medications & Treatments | History of medications that may cause hair loss; chemotherapy, heart medications, antidepressants, steroids, etc. |
| Nutritional Deficiencies | Lists nutritional deficiencies that may contribute to hair loss, such as iron deficiency, vitamin D deficiency, biotin deficiency, omega-3 fatty acid deficiency, etc. |
| Stress | Indicates the stress level of the individual (Low/Moderate/High). |
| Age | Represents the age of the individual. |
| Poor Hair Care Habits | Indicates whether the individual practices poor hair care habits (Yes/No). |
| Environmental Factors | Indicates whether the individual is exposed to environmental factors that may contribute to hair loss (Yes/No). |
| Smoking | Indicates whether the individual smokes (Yes/No). |
| Weight Loss | Indicates whether the individual has experienced significant weight loss (Yes/No). |
| Hair Loss | Binary variable indicating the presence (1) or absence (0) of baldness in the individual. |
# Load pre-installed, required packages
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(plotly))
suppressPackageStartupMessages(library(scales))
# suppressPackageStartupMessages(library(readr))
suppressPackageStartupMessages(library(DT))
# Install and load the 'ggthemes' package
suppressWarnings(suppressMessages(install.packages("ggthemes", quite=TRUE)))
suppressPackageStartupMessages(library(ggthemes))
# Read the 'Predict Hair Fall' dataset
data <- read_csv("data/Predict Hair Fall.csv", show_col_types=FALSE)
# Encode data types and factors
data <- data %>%
mutate(Id = as.integer(Id),
Age = as.integer(Age),
across(where(is.character), as.factor),
Stress = factor(Stress, levels=c("Low", "Moderate", "High"), ordered=TRUE),
`Hair Loss` = as.factor(`Hair Loss`)
)
# Backup original dataset
data1 <- read_csv("data/Predict Hair Fall.csv", show_col_types=FALSE)
# Function to install missing packages
install_if_missing <- function(p) {
if (!require(p, character.only = TRUE)) {
suppressWarnings(suppressMessages(install.packages(p, dependencies = TRUE)))
suppressPackageStartupMessages(library(p, character.only = TRUE))
}
}
# Function to tabularize summary statistics of a given variable in the dataset
summarize_variable <- function(data, variable=NULL, caption=NULL, dom="t", searching=FALSE, rownames=FALSE, pageLength=10) {
if (is.null(variable)) {
datatable(data,
class="cell-border stripe",
options = list(dom=dom, searching=searching, pageLength=pageLength),
rownames=rownames,
caption=if (!is.null(caption)) htmltools::tags$caption(
style = 'caption-side: top; text-align: center;',
caption
) else NULL)
} else {
summarized_data <- data %>%
filter(!!sym(variable) != "No Data") %>%
group_by(!!sym(variable)) %>%
summarise(Count = n(), .groups="drop") %>%
{
if (is.ordered(data[[variable]])) {
arrange(., desc(!!sym(variable)))
} else {
arrange(., desc(Count))
}
} %>%
mutate(Percentage = label_percent(accuracy=0.01)(Count/sum(Count)))
datatable(summarized_data %>% rename(!!variable := !!sym(variable)),
class="cell-border stripe",
options = list(dom=dom, searching=searching, pageLength=pageLength),
rownames=rownames,
caption=if (!is.null(caption)) htmltools::tags$caption(
style = 'caption-side: top; text-align: center;',
caption
) else NULL)
}
}
# Function to check if a package is installed, and install it if not
install_if_missing <- function(package) {
if (!require(package, character.only = TRUE)) {
install.packages(package, dependencies = TRUE)
library(package, character.only = TRUE)
}
}
# Function to create a propotional stack bar plot (used for subplots of multiple variables)
create_plotly_stackbar <- function(data = data, var, showlegend=FALSE, yaxis_text=NULL, yaxis_titleSize=11.5, yaxis_tickSize=10, tickangle=0) {
yaxis_title <- ifelse(is.null(yaxis_text), var, yaxis_text)
plot_data <- as.data.frame(table(data %>% filter(!!sym(var) != "No Data") %>%
select(`Hair Loss`, all_of(var)))) %>%
rename_with(~ gsub("\\.", " ", .)) %>%
group_by(!!sym(var)) %>%
mutate(Percentage = 100*Freq/sum(Freq),
`Hair Loss` = ifelse(`Hair Loss` == 1, "With baldness", "Without baldness")
) %>%
rename(Count = Freq)
plot <- plot_ly(
data = plot_data,
x = ~Percentage,
y = as.formula(paste0("~`", var, "`")),
type = "bar",
color = ~`Hair Loss`,
colors = c("#FF6347", "#8c8c8c"),
showlegend = showlegend,
text = ~paste0("<b>Percentage:</b> ", label_percent(accuracy=0.01)(Percentage/100), "\n<b>Count:</b> ", Count),
hoverinfo = "text"
) %>%
layout(
xaxis = list(title = list(text="Percentage (%)", font = list(family="ITC Officina Sans", size=18))),
yaxis = list(title = list(text = paste("\n<b>", yaxis_title, "</b>"),
font = list(family="ITC Officina Sans", size = yaxis_titleSize),
standoff=43),
tickfont = list(family="ITC Officina Sans", size = yaxis_tickSize), tickangle=tickangle
),
barmode="stack",
legend = list(orientation="h", xanchor="center", x=0.438, y=1.11, traceorder="normal",
font = list(family="ITC Officina Sans", size=15)),
font = list(size = 14),
hoverlabel = list(font = list(family="ITC Officina Sans", size=15))
)
return(plot)
}
# Function to create a data frame for contingency table of a variable
create_contingency_data <- function(data, var) {
# Select relevant columns and filter out "No Data" values
selected_data <- data %>%
select(`Hair Loss`, all_of(var)) %>%
filter(!!sym(var) != "No Data")
# Create contingency table
contingency_data <- data.frame(matrix(
with(selected_data, table(selected_data[[var]], `Hair Loss`)),
nrow = dim(with(selected_data, table(selected_data[[var]], `Hair Loss`)))[1],
dimnames = dimnames(with(selected_data, table(selected_data[[var]], `Hair Loss`)))
))
return(contingency_data)
}
# Function to create stack bar plot for a variable
create_stackbar_plot1 <- function(data, var, yaxis_title, yaxis_text=NULL, plot_title=NULL, tickangle=45) {
stackbar_data <- data %>% filter(!!sym(var) != "No Data") %>%
mutate(!!sym(var) := factor(!!sym(var),
levels = rownames(create_contingency_data(data, var) %>%
mutate(X01 = X0+X1, X0 = X0/X01, X1 = X1/X01) %>%
arrange(X1, desc(rownames(.)))
)
))
plot <- create_plotly_stackbar(data = stackbar_data,
var = var,
showlegend=TRUE,
yaxis_text = yaxis_text,
yaxis_titleSize=20,
yaxis_tickSize=15,
tickangle = tickangle)
plot %>%
layout(
plot_bgcolor = "#D5E4EB",
paper_bgcolor = "#D5E4EB",
title = list(text = plot_title,
font = list(family = "ITC Officina Sans", size = 18),
y = 0.93),
margin = list(t = 95),
xaxis = list(tickfont = list(family = "ITC Officina Sans", size = 15))
)
}
# Function to create a contingency table of a variable
create_contingency_table <- function(data, var, first_colname=var) {
# Append the proportion to each value in the dataframe
data <- as.data.frame(
apply(data %>% arrange(desc(X1)), c(1, 2), function(x) {
percentage <- label_percent(accuracy=0.01)(x / sum(data))
sprintf("%d (%s)", x, percentage)
})
) %>%
rownames_to_column(var=first_colname)
# Rename rows and columns
colnames(data) <- c(first_colname, "Without baldness", "With baldness")
# Reorder columns
data <- data[,c(first_colname, "With baldness", "Without baldness")]
return(data)
}
# Function to compute Point-Biserial Correlation
compute_point_biserial <- function(data, x, y, conf=0.95) {
# Ensure the x and y are in the data
if (!(x %in% colnames(data)) | !(y %in% colnames(data))) {
stop("The variable x or y not found in the data.")
}
# Ensure the y contains only the specified values
unique_y <- unique(data[[y]])
if (length(unique_y) != 2) {
stop("The variable y is not binary.")
}
# Extract values for individuals with and without the binary outcome
with_group <- data[[x]][data[[y]] == 1]
without_group <- data[[x]][data[[y]] == 0]
# Calculate means
X1 <- mean(with_group)
X0 <- mean(without_group)
# Calculate standard deviation of the entire x data
sn <- sd(data[[x]])
# Calculate sample sizes
n1 <- length(with_group)
n0 <- length(without_group)
n <- n1 + n0
# Compute the point biserial correlation coefficient
rpb <- (X1 - X0) / sn * sqrt((n1 * n0) / (n^2))
# Compute confidence intervals using cor.test
rpb_cortest <- cor.test(data[[x]], data[[y]], conf.level=conf)
lower_ci <- rpb_cortest$conf.int[1]
upper_ci <- rpb_cortest$conf.int[2]
rpb_df <- data.frame(x=x, y=y, X1=X1, X0=X0, rpb=rpb, lower_ci=lower_ci, upper_ci=upper_ci
) %>%
mutate(across(where(is.numeric) & !where(is.integer), ~ ifelse(abs(.) >= 0.01,
label_number(accuracy = 0.01)(.),
label_scientific(digits = 3)(.)))
) %>%
rename(`Correlation coefficient` = rpb) %>%
mutate(!!paste0(conf * 100, "% CI") := paste0("[", lower_ci, ", ", upper_ci, "]"),
Method="Point-biserial") %>%
select(-c("lower_ci", "upper_ci"))
return(rpb_df)
}
# Function to compute Rank-Biserial Correlation
compute_rank_biserial <- function(data, x, y, o=0, l=1, conf=0.95) {
# Install and load the 'rcompanion' package if not already installed
if (!requireNamespace("rcompanion", quietly = TRUE)) {
suppressWarnings(suppressMessages(install.packages("rcompanion", quiet=TRUE)))
}
suppressPackageStartupMessages(library(rcompanion))
# Ensure the x and y are in the data
if (!(x %in% colnames(data)) | !(y %in% colnames(data))) {
stop("The variable x or y not found in the data.")
}
# Ensure the y contains only the specified values
unique_y <- unique(data[[y]])
if (length(unique_y) != 2) {
stop("The variable y is not binary.")
}
# Rank the predictor variable
data$ranked_factor <- rank(data[[x]], ties.method = "average")
with_group <- data$ranked_factor[data[[y]] == 1]
without_group <- data$ranked_factor[data[[y]] == 0]
# Calculate mean ranks for individuals with and without the binary outcome
R1 <- mean(with_group)
R0 <- mean(without_group)
# Compute outputs using the wilcoxonRG function
group <- factor(c(rep("With group", length(with_group)), rep("Without group", length(without_group))))
rrb_wilcoxonRG <- wilcoxonRG(x = c(with_group, without_group), g = group, ci=TRUE, conf=conf)
# Extract coefficient and confidence intervals
rrb <- rrb_wilcoxonRG[[1]]
lower_ci <- rrb_wilcoxonRG[[2]]
upper_ci <- rrb_wilcoxonRG[[3]]
rrb_df <- data.frame(x=x, y=y, R1=R1, R0=R0, rrb=rrb, lower_ci=lower_ci, upper_ci=upper_ci
) %>%
mutate(across(where(is.numeric) & !where(is.integer), ~ ifelse(abs(.) >= 0.01,
label_number(accuracy = 0.01)(.),
label_scientific(digits = 3)(.)))
) %>%
rename(`Correlation coefficient` = rrb) %>%
mutate(!!paste0(conf * 100, "% CI") := paste0("[", lower_ci, ", ", upper_ci, "]"),
Method="Rank-biserial") %>%
select(-c("lower_ci", "upper_ci"))
return(rrb_df)
}
# Function to compute Phi Coefficient
compute_phi <- function(data, x, y, o=0, l=1, u, t, conf=0.95) {
# Install and load the 'statpsych' package if not already installed
if (!requireNamespace("statpsych", quietly = TRUE)) {
suppressWarnings(suppressMessages(install.packages("statpsych", quiet=TRUE)))
}
suppressPackageStartupMessages(library(statpsych))
# Ensure the x and y are in the data
if (!(x %in% colnames(data)) | !(y %in% colnames(data))) {
stop("The variable x or y not found in the data.")
}
# Ensure the x and y contain only binary values
unique_x <- unique(data[[x]])
unique_y <- unique(data[[y]])
if (length(unique_x) != 2 | length(unique_y) != 2) {
stop("The variables x and y must be binary.")
}
# Create contingency table
contingency_table <- table(data[[x]], data[[y]])
# Convert o, l, u, t to character
o <- as.character(o)
l <- as.character(l)
u <- as.character(u)
t <- as.character(t)
# Extract counts from the contingency table using provided levels
a <- contingency_table[t, l]
b <- contingency_table[t, o]
c <- contingency_table[u, l]
d <- contingency_table[u, o]
# Compute outputs using the function ci.phi
ci_phi <- ci.phi(alpha = 1-conf, d, c, b, a)
# Compute confidence interval limits
phi <- ci_phi[[1]]
ci_lower <- ci_phi[[3]]
ci_upper <- ci_phi[[4]]
# Create a dataframe to return the results
phi_df <- data.frame(x = x, y = y, counts=paste(a,b,c,d, sep=", "), phi = phi,
ci = paste0("[", round(ci_lower, 2), ", ", round(ci_upper, 2), "]")) %>%
mutate(across(where(is.numeric) & !where(is.integer), ~ ifelse(abs(.) > 0.01,
label_number(accuracy = 0.01)(.),
label_scientific(digits = 3)(.)))
) %>%
rename(`a, b, c, d` = counts,
`Phi coefficient` = phi,
!!paste0(conf * 100, "% CI") := ci)
return(phi_df)
}
# Function to compute Cramer's V
compute_cramersV <- function(data, x, y, conf=0.95) {
# Ensure the x and y are in the data
if (!(x %in% colnames(data)) | !(y %in% colnames(data))) {
stop("The variable x or y not found in the data.")
}
# Create contingency table
contingency_table <- table(data[[x]], data[[y]])
# Compute chi-square statistic
chi2 <- chisq.test(contingency_table, simulate.p.value = TRUE)$statistic
# Get the number of rows and columns in the contingency table
n <- sum(contingency_table)
r <- nrow(contingency_table)
c <- ncol(contingency_table)
rc <- paste(r, c, sep=", ")
# Compute outputs using the cramerV function
cramersV <- cramerV(contingency_table, ci=TRUE, conf=conf)
# Extract coefficient and confidence intervals
cramers_v <- cramersV[[1]]
lower_ci <- cramersV[[2]]
upper_ci <- cramersV[[3]]
# Create a dataframe to return the results
cramers_v_df <- data.frame(
n = n,
rc = rc,
x = x,
y = y,
chi2 = chi2,
cramers_v = cramers_v,
lower_ci = lower_ci,
upper_ci = upper_ci
) %>%
mutate(across(
where(is.numeric) & !where(is.integer),
~ ifelse(abs(.) > 0.01, label_number(accuracy = 0.01)(.), label_scientific(digits = 3)(.))
)) %>%
rename(
`Chi-square statistic` = chi2,
`r, c` = rc,
`Cramer's V` = cramers_v
) %>%
mutate(!!paste0(conf * 100, "% CI") := paste0("[", lower_ci, ", ", upper_ci, "]")) %>%
select(-c("lower_ci", "upper_ci"))
rownames(cramers_v_df) <- c(1:nrow(cramers_v_df))
return(cramers_v_df)
}
# Function for evaluating model performance
evaluate_model <- function(model, test, y, model_name) {
# Check if pROC package is installed, if not, install it
if (!requireNamespace("pROC", quietly = TRUE)) {
install.packages("pROC")
}
suppressPackageStartupMessages(library(pROC))
# Perform predictions using the model and test data
prob <- predict(model, test, type = "prob")[, 2]
predicted <- predict(model, test)
# Create a confusion matrix to evaluate the model's performance
conf_matrix <- confusionMatrix(predicted, test[[y]])
# Compute model metrics
roc <- roc(test[[y]], prob)
auc <- auc(roc)
accuracy <- conf_matrix$overall['Accuracy']
precision <- conf_matrix$byClass['Pos Pred Value']
recall <- conf_matrix$byClass['Sensitivity']
f1 <- 2 * (precision * recall) / (precision + recall)
# Return performance metrics as a dataframe
results <- data.frame(
AUC = round(auc, 4),
Accuracy = round(accuracy, 4),
Precision = round(precision, 4),
Recall = round(recall, 4),
f1 = round(f1, 4)
) %>%
rename(`F1 Score` = f1)
rownames(results) <- model_name
return(results)
}
# Function for cluster statistics
cstats.table <- function(dist, tree, k) {
clust.assess <- c("cluster.number","n","within.cluster.ss","average.within","average.between",
"wb.ratio","dunn2","avg.silwidth")
clust.size <- c("cluster.size")
stats.names <- c()
row.clust <- c()
output.stats <- matrix(ncol = k, nrow = length(clust.assess))
cluster.sizes <- matrix(ncol = k, nrow = k)
for(i in c(1:k)){
row.clust[i] <- paste("Cluster-", i, " size")
}
for(i in c(2:k)){
stats.names[i] <- paste("Test", i-1)
for(j in seq_along(clust.assess)){
output.stats[j, i] <- unlist(cluster.stats(d = dist, clustering = cutree(tree, k = i))[clust.assess])[j]
}
for(d in 1:k) {
cluster.sizes[d, i] <- unlist(cluster.stats(d = dist, clustering = cutree(tree, k = i))[clust.size])[d]
dim(cluster.sizes[d, i]) <- c(length(cluster.sizes[i]), 1)
cluster.sizes[d, i]
}
}
output.stats.df <- data.frame(output.stats)
cluster.sizes <- data.frame(cluster.sizes)
cluster.sizes[is.na(cluster.sizes)] <- 0
rows.all <- c(clust.assess, row.clust)
output <- rbind(output.stats.df, cluster.sizes)[ ,-1]
colnames(output) <- stats.names[2:k]
rownames(output) <- rows.all
is.num <- sapply(output, is.numeric)
output[is.num] <- lapply(output[is.num], round, 2)
output
}
# Tabularize first five rows of the dataset
summarize_variable(slice(data, 1:5), dom="t") 0.3. Executive Summary
- The surveyed individuals are, on average, 34 years old, with their ages evenly distributed between 18 to 50 years.
- Alopecia areata emerges as the most common medical condition associated with baldness, consistent with its established medical background. Other notable conditions include psoriasis, thyroid problems, androgenetic alopecia, and dermatitis, each reported by approximately one-third of the 889 individuals with documented medical conditions.
- Among the 919 respondents with data on nutritional deficiencies, zinc and vitamin D deficiencies are the most prevalent. Biotin, omega-3 fatty acids, and vitamin A deficiencies also rank in the top five, each reported by at least 10% of these respondents.
- Middle-aged individuals, aged 31–45 years, constitute the largest proportion of those who have experienced baldness.
- Hair loss exhibits near-zero associations with all factors, indicating that it results from a combination of multiple factors rather than a single one.
- Stress levels—categorized as low, moderate, and high—are found to be evenly distributed among individuals with and without baldness.
- Among the three classification models evaluated, a random forest model, which demonstrates an initial AUC score of ~0.58 and an accuracy of ~0.59, is recommended for further tuning and optimization.
- Cluster analysis identifies two groups of individuals with shared key underlying features related to hair loss, including scalp infection, psoriasis, androgenetic alopecia, hormonal changes, and the use of immunomodulators.
- Through the importance analysis of the model, stress, age, and smoking are identified as major predictors of hair loss, while several features related to lifestyle and medical history are found to have considerable influence on the model's predictive performance.
1. Descriptive Analysis
This section provides an overview of the characteristics of the surveyed individuals using summary statistics, focusing on age, medical history, stress, and lifestyle factors.
1.1. Age
- Range: The youngest individual in the survey is 18 years old, and the oldest is 50 years old.
- 1st Quartile (Q1): Twenty-five percent of respondents are younger than 26 years.
- Mean: The average age of the respondents is 34.19 years.
- Median (Q2): Half of the individuals are younger (or older) than 34 years. This value is nearly the same as the mean, indicating a relatively symmetrical distribution of ages among the respondents.
- 3rd Quartile (Q3): Seventy-five percent of the surveyed individuals are younger than 42 years.
- Standard Deviation: 9.38 years, indicating a moderate spread of ages around the mean.
- Skewness: -0.03, indicating a nearly symmetrical age distribution with no significant skew toward younger or older groups.
# Install and load the 'moments' package for calculating skewness
suppressWarnings(suppressMessages(install.packages("moments", quiet=TRUE)))
suppressPackageStartupMessages(library(moments))
# Summary statistics of the age
age_stats <- data.frame(Value = unclass(summary(data$Age))) %>%
bind_rows(data.frame(Value = c(sd(data$Age), skewness(data$Age)), row.names = c("Std. Dev.", "Sk\u209a"))) %>%
mutate_if(is.numeric, round, 2)
# Age distribution plot
age_dist_plot <- ggplot(data, aes(x = Age, text = paste0("<b>Count:</b> ", after_stat(count)))) +
geom_histogram(breaks = hist(data$Age, breaks="FD", plot = FALSE)$breaks) +
geom_vline(aes(xintercept = mean(Age), text = paste0("<b>Mean: </b>", round(mean(Age), 2))), col="#FFA53E", linewidth=0.5) +
geom_vline(aes(xintercept = median(Age), text = paste0("<b>Median: </b>", round(median(Age), 2))), col="#4C6DB4", linewidth=0.5) +
theme_economist() +
scale_color_economist() +
scale_x_continuous(breaks = seq(15, 50, by=5)) +
ggtitle("\nFIG. 1﹕HISTOGRAM OF THE AGE DISTRIBUTION OF SURVEYED INDIVIDUALS") +
theme(
plot.title = element_text(size=13, hjust=0.5),
axis.title.x = element_text(family="ITC Officina Sans", size=13),
axis.title.y = element_text(family="ITC Officina Sans", size=13)
) +
labs(y="Count")
# Convert ggplot to plotly
age_dist_plotly <- ggplotly(age_dist_plot, tooltip=c("text")) %>%
layout(
hoverlabel = list(font = list(family="ITC Officina Sans", size=15)),
xaxis = list(tickfont = list(family="ITC Officina Sans", size=13)),
yaxis = list(tickfont = list(family="ITC Officina Sans", size=13))
) %>%
add_annotations(
text = paste0(
"<b>Std. Deviation (<i>\u03c3</i>): </b>", round(sd(data$Age), 2),
"; <b>Skewness (<i>Sk\u209a</i>): </b>", round(skewness(data$Age), 2)
),
x = 0.037, y = 0.97, xref = "paper", yref = "paper",
showarrow = FALSE, font = list(family="ITC Officina Sans", size=13.5), align="left"
)1.2. Medical History
This report defines an individual’s medical history as their medical background and experiences, particularly those related to hair loss. The following statistics highlight key characteristics of the respondents' medical histories:
- Genetics: A total of 52.25% of respondents reported having a family history of baldness.
- Hormonal Changes: Approximately 51% of respondents have experienced hormonal changes.
- Medical Conditions: Among the 889 individuals with recorded medical conditions potentially leading to baldness, alopecia areata is the most common, affecting 12.04% of the group. It is followed by psoriasis (11.25%), thyroid problems (11.14%), and androgenetic alopecia (11.02%). Dermatitis (10.35%) also ranks highly, rounding out the top five conditions.
# Tabularize summary statistics of Hormonal Changes
summarize_variable(
data %>% rename(`Has experienced hormonal changes` = `Hormonal Changes`),
variable = "Has experienced hormonal changes",
rownames = FALSE
)# Tabularize summary statistics of Medical Conditions
summarize_variable(
data = data %>% rename(`Medical condition that may lead to baldness` = `Medical Conditions`),
variable = "Medical condition that may lead to baldness",
dom = "tip",
pageLength = 5
)- Medications & Treatments: Rogaine is the most commonly used medication or treatment associated with hair loss among the respondents, with 11.63% reporting its use. It is followed by antidepressants (11.03%), steroids (10.73%), heart medication (10.43%), and Accutane (10.23%).
- Nutritional Deficiencies: Most of the documented 919 individuals with nutritional deficiencies that may contribute to hair loss are zinc deficient (11.75%). Vitamin D deficiency follows closely at 11.32%, while both Biotin and vitamin A deficiencies impact 10.77% each. The deficiency of omega-3 fatty acids completes the top five, affecting 10.01% of the group.
1.3. Stress
Stress levels among respondents are found to be almost evenly distributed, with 32.73% reporting low stress, 35.14% experiencing moderate stress, and 32.13% dealing with high stress.
1.4. Lifestyle
In this report, lifestyle factors are defined as various habits and behaviors that may influence hair loss. The statistics below provide an overview of key lifestyle traits among the respondents:
- Poor Hair Care Habits: Nearly half (49.25%) of the respondents reported engaging in poor hair care practices.
- Environmental Factors: A total of 50.85% indicated exposure to environmental factors associated with hair loss.
- Smoking: Approximately 52% of the respondents are identified as smokers.
- Weight Loss: Significant weight loss is reported by 47.25% of the respondents.
2. Measures of Association
2 hidden cells