
You are a data analyst for a United Nations initiative focused on understanding global health trends. Your latest assignment is to explore and visualize life expectancy data from around the world, focusing on gender differences.

Life expectancy can vary significantly over time and across countries due to numerous factors, including advances in medicine, a country's level of development, and the impact of conflicts. The data also consistently show that women tend to live longer than men, which raises intriguing questions. Could this be due to biological factors, or perhaps because women tend to take better care of their health?

Your task is to explore these patterns and disparities.

The Data

The dataset contains information about life expectancy in various countries or areas, broken down by gender and time periods. The data is sourced from the United Nations Population Division, Gender Statistics, Life Expectancy at Birth.

UNdata.csv

| Column | Meaning |
|---|---|
| Country.or.Area | The name of the country or region being described. |
| Subgroup | The specific subgroup within the country or area (e.g., Female, Male). |
| Year | The time period for the data provided (e.g., 2000-2005). |
| Source | The source of the data, specifying the UN publication or report where the data originated. |
| Unit | The unit of measurement for life expectancy. |
| Value | The measured value for the life expectancy in the specified country, subgroup, and time period. |
| Value.Footnotes | Additional notes or comments related to the value, if any. |

library(dplyr)
library(tidyr)
library(ggplot2)

# Load the UN life expectancy data
life_expectancy <- read.csv("datasets/UNdata.csv")
head(life_expectancy)
# Check for missing data 
missing <- any(is.na(life_expectancy$Value))
missing
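
The check above returns a single TRUE or FALSE for the Value column. If it comes back TRUE, counting the missing entries per column and listing the affected rows helps locate them. A minimal sketch on a hypothetical toy data frame (the `toy` object and all of its values are made up for illustration; only the column names follow UNdata.csv):

```r
# Hypothetical toy data: column names mirror UNdata.csv, values are made up
toy <- data.frame(
  Country.or.Area = c("A", "A", "B", "B"),
  Subgroup = c("Female", "Male", "Female", "Male"),
  Year = c("2000-2005", "2000-2005", "2000-2005", "2000-2005"),
  Value = c(80.1, NA, 70.4, 65.2),
  stringsAsFactors = FALSE
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Rows where Value is missing
missing_rows <- toy[is.na(toy$Value), ]
print(missing_rows)
```

The same two calls should work on `life_expectancy` directly, since they rely only on base R.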
# Filter for 2000–2005
life_expectancy_2000_2005 <- life_expectancy %>%
  filter(Year == "2000-2005", Subgroup %in% c("Female", "Male"))

head(life_expectancy_2000_2005)

# Calculate average life expectancy by gender
avg_life_exp_by_gender <- life_expectancy_2000_2005 %>%
  group_by(Subgroup) %>%
  summarise(mean_value = mean(Value, na.rm = TRUE))

# Print the per-gender means (summary() would only describe the two-row table)
print(avg_life_exp_by_gender)

# Determine which gender has higher life expectancy globally
female_mean <- avg_life_exp_by_gender %>% filter(Subgroup == "Female") %>% pull(mean_value)
male_mean <- avg_life_exp_by_gender %>% filter(Subgroup == "Male") %>% pull(mean_value)
# Scalar comparison, so a plain if/else is enough
subgroup <- if (female_mean > male_mean) "Female" else "Male"
subgroup
# Pivot wider to calculate differences
gender_gap <- life_expectancy_2000_2005 %>%
  dplyr::select(Country.or.Area, Subgroup, Value) %>%
  pivot_wider(names_from = Subgroup, values_from = Value) %>%
  mutate(diff = abs(Female - Male)) %>%
  drop_na()

head(gender_gap)

# Top 3 countries with largest differences
disparities <- gender_gap %>%
  arrange(desc(diff)) %>%
  slice_head(n = 3) 

print(disparities)
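
Because `diff` is an absolute value, the ranking does not show which gender lives longer in a given country. One way to check the direction is to keep the signed difference alongside it. A minimal sketch on hypothetical toy data (the `toy` frame, its values, and the column names `signed_diff` and `longer_lived` are assumptions for illustration, not part of the dataset):

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: made-up values, same columns as the pivot step uses
toy <- data.frame(
  Country.or.Area = c("A", "A", "B", "B"),
  Subgroup = c("Female", "Male", "Female", "Male"),
  Value = c(80, 72, 60, 63)
)

signed_gap <- toy %>%
  pivot_wider(names_from = Subgroup, values_from = Value) %>%
  mutate(
    # Positive means women live longer, negative means men live longer
    signed_diff = Female - Male,
    longer_lived = case_when(
      signed_diff > 0 ~ "Female",
      signed_diff < 0 ~ "Male",
      TRUE ~ "Equal"
    )
  )

print(signed_gap)
```

Applied to `life_expectancy_2000_2005`, the same `mutate()` step would show whether any country in the top three has a gap in favor of men.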

# Plot the three largest gender gaps as a horizontal bar chart
# (ggplot2 is already loaded at the top of the script)

ggplot(disparities, aes(x = reorder(Country.or.Area, diff), y = diff, fill = Country.or.Area)) +
  geom_col(width = 0.7, show.legend = FALSE) +
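  geom_text(aes(label = round(diff, 1)), hjust = -0.2, size = 5, fontface = "bold", color = "black") +
  scale_y_continuous(expand = expansion(mult = c(0.05, 0.15))) +  # extra room so the labels are not clipped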
  coord_flip() +
  labs(
    title = "Top 3 Countries with Largest Gender Gaps (2000–2005)",
    x = "Country",
    y = "Absolute Life Expectancy Gap (years)"
  ) +
  scale_fill_manual(values = c("#E63946", "#2A9D8F", "#457B9D")) +  # bold color palette
  theme_classic() +
  theme(
    plot.title = element_text(size = 12, face = "bold", hjust = 0.5),  # centered + smaller
    axis.title = element_text(size = 13, face = "bold"),
    axis.text = element_text(size = 12),
    panel.grid.major.y = element_blank(),
    panel.grid.major.x = element_line(color = "gray90")
  )


# Bar chart of average life expectancy by gender with labels
avg_life_exp_by_gender %>%
  ggplot(aes(Subgroup, mean_value, fill = Subgroup)) +
  geom_col(width = 0.5, show.legend = FALSE) +
  geom_text(aes(label = round(mean_value, 1)), vjust = -0.5, size = 5, fontface = "bold") +
  scale_fill_manual(values = c("Female" = "#F72585", "Male" = "#4361EE")) +
  labs(
    title = "Average Global Life Expectancy by Gender (2000–2005)",
    x = "Gender",
    y = "Average Life Expectancy"
  ) +
  theme_classic() +
  theme(
    plot.title = element_text(size = 12, face = "bold", hjust = 0.5),
    axis.title = element_text(size = 13, face = "bold"),
    axis.text = element_text(size = 12),
    panel.grid.major.y = element_blank(),
    panel.grid.major.x = element_line(color = "gray90")
  )