
You are a data analyst for a United Nations initiative focused on understanding global health trends. Your latest assignment is to explore and visualize life expectancy data from around the world, focusing on gender differences.

Life expectancy can vary significantly over time and across countries due to numerous factors, including advancements in medicine, a country's level of development, and the impact of conflicts. Interestingly, the data consistently shows that women tend to live longer than men, which raises intriguing questions. Could this be due to biological factors, or perhaps because women generally take better care of their health?

Your task is to explore these patterns and disparities.

The Data

The dataset contains information about life expectancy in various countries or areas, broken down by gender and time periods. The data is sourced from the United Nations Population Division, Gender Statistics, Life Expectancy at Birth.

UNdata.csv

| Column | Meaning |
| --- | --- |
| Country.or.Area | The name of the country or region being described. |
| Subgroup | The specific subgroup within the country or area (e.g., Female, Male). |
| Year | The time period for the data provided (e.g., 2000-2005). |
| Source | The source of the data, specifying the UN publication or report where the data originated. |
| Unit | The unit of measurement for life expectancy. |
| Value | The measured value for the life expectancy in the specified country, subgroup, and time period. |
| Value.Footnotes | Additional notes or comments related to the value, if any. |

library(dplyr)
library(tidyr)
library(ggplot2)

# Load the dataset and preview the first rows
life_expectancy <- read.csv("datasets/UNdata.csv")
head(life_expectancy)

# 1. Check for missing values in the Value column
missing <- any(is.na(life_expectancy$Value))

# 2. Compare life expectancy between males and females (2000–2005)
life_2000 <- life_expectancy %>%
  filter(Year == "2000-2005", Subgroup %in% c("Female", "Male"))

avg_life_by_gender <- life_2000 %>%
  group_by(Subgroup) %>%
  summarise(mean_life = mean(Value, na.rm = TRUE))

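# Extract each subgroup's mean, then record which one is higher
female_mean <- avg_life_by_gender$mean_life[avg_life_by_gender$Subgroup == "Female"]
male_mean <- avg_life_by_gender$mean_life[avg_life_by_gender$Subgroup == "Male"]

subgroup <- if (female_mean > male_mean) "Female" else "Male"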

# 3. Identify top 3 countries with the largest gender disparities (2000–2005)
gender_wide <- life_2000 %>%
  select(Country.or.Area, Subgroup, Value) %>%
  pivot_wider(names_from = Subgroup, values_from = Value) %>%
  drop_na(Female, Male) %>%
  mutate(disparity = abs(Female - Male))

disparities <- gender_wide %>%
  arrange(desc(disparity)) %>%
  slice(1:3) %>%
  pull(Country.or.Area)
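
The disparity step above can be checked on a small toy data frame that does not need UNdata.csv. This is only a sketch: the countries "A" to "D" and their values are hypothetical and made up for illustration, and the names `toy`, `toy_wide`, and `toy_top` are not part of the original analysis.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: values are invented, not taken from UNdata.csv
toy <- data.frame(
  Country.or.Area = rep(c("A", "B", "C", "D"), each = 2),
  Subgroup = rep(c("Female", "Male"), times = 4),
  Year = "2000-2005",
  Value = c(80, 75, 70, 69, 65, 55, 78, NA),
  stringsAsFactors = FALSE
)

# Reshape to one row per country with Female and Male columns,
# drop countries missing either value, and compute the absolute gap
toy_wide <- toy %>%
  select(Country.or.Area, Subgroup, Value) %>%
  pivot_wider(names_from = Subgroup, values_from = Value) %>%
  drop_na(Female, Male) %>%
  mutate(disparity = abs(Female - Male))

# Keep the three countries with the largest gap
toy_top <- toy_wide %>%
  arrange(desc(disparity)) %>%
  slice_head(n = 3) %>%
  pull(Country.or.Area)

print(toy_top)
```

With these made-up values the gaps are 10 for C, 5 for A, and 1 for B, while D is dropped because its Male value is missing, so `toy_top` holds C, A, B in that order.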