Crunching Death Rates: From Age-Specific to Standardized
The task is to compare the crude death rate and the age-standardized death rate for chronic obstructive pulmonary disease (COPD), all ages, in the United States and Uganda for 2019. The analysis involves the following steps, assumptions, and findings:
- Data Collection: Obtain age-specific COPD death rates for the populations of interest (given), age-specific population numbers by country and the WHO World Standard Population distribution (for standardization).
- Data Wrangling and Manipulation: Transform data into similar forms for comparison (same age groups and derive proportions) and combine into one table.
- Data Visualization: Visualize all population distributions to get a sense of population structure and differences.
- Crude Death Rate Calculation: Merge death rate and population tables then multiply age-specific death rate by population proportion, summing up all contributions to get overall crude death rate per country.
- Age-Standardization Process: Using the WHO standard proportions, calculate age-standardized death rates for Uganda and USA.
- Assumptions: The age-specific death rates are assumed to be accurate and representative of the population. The standard population used for age-standardization is assumed to be appropriate for comparison.
- Differences in Rates: Crude death rates depend on the age distribution of the population. Where death rates rise steeply with age, as they do for COPD here, populations with more elderly individuals tend to have higher crude death rates, as is evident from the much larger crude death rate in the US compared to Uganda. Uganda's population is skewed towards very young ages, whereas the US population is spread more evenly across all ages. Age-standardized rates account for differences in age distribution, providing a more comparable measure of mortality risk across populations. Here the age-standardized death rates are almost equal in Uganda and the US (28.7 and 28.4, respectively).
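The two calculations described in the steps above can be written compactly. The notation is introduced here for illustration and does not come from the exercise: $r_i$ is the death rate in age group $i$, $p_i$ is that age group's share of the country's own population, and $w_i$ is its share of the WHO World Standard population, with shares expressed as fractions that sum to (approximately) 1.

```latex
\text{Crude rate} = \sum_{i} r_i \, p_i
\qquad\qquad
\text{Age-standardized rate} = \sum_{i} r_i \, w_i
```

Both rates use the same age-specific rates $r_i$ and differ only in the weights, so any gap between a country's crude and age-standardized rate reflects how its age structure differs from the standard.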
# Table of age-specific death rates of COPD copied from exercise
import pandas as pd
data = {
"Age Group (years)": ["0-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "55-59", "60-64", "65-69", "70-74", "75-79", "80-84", "85+"],
"Death Rate, United States, 2019": [0.04, 0.02, 0.02, 0.02, 0.06, 0.11, 0.29, 0.56, 1.42, 4.00, 14.13, 37.22, 66.48, 108.66, 213.10, 333.06, 491.10, 894.45],
"Death Rate, Uganda, 2019": [0.40, 0.17, 0.07, 0.23, 0.38, 0.40, 0.75, 1.11, 2.04, 5.51, 13.26, 33.25, 69.62, 120.78, 229.88, 341.06, 529.31, 710.40]
}
# Create dataframe of age-specific death rates
death_rates = pd.DataFrame(data)
death_rates
# Original data copied from source #2 WHO Standard Population [Ahmad, O. B., Boschi-Pinto, C., Lopez, A. D., Murray, C. J., Lozano, R., & Inoue, M. (2001). Age standardization of rates: a new WHO standard. Geneva: World Health Organization, 9(10), 1-14.]
data = {
"Age group": ["0-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "55-59", "60-64", "65-69", "70-74", "75-79", "80-84", "85+"],
"Segi (\"world\") standard": [12.00, 10.00, 9.00, 9.00, 8.00, 8.00, 6.00, 6.00, 6.00, 6.00, 5.00, 4.00, 4.00, 3.00, 2.00, 1.00, 0.50, 0.50],
"Scandinavian (\"European\") standard": [8.00, 7.00, 7.00, 7.00, 7.00, 7.00, 7.00, 7.00, 7.00, 7.00, 7.00, 6.00, 5.00, 4.00, 3.00, 2.00, 1.00, 1.00],
"WHO World Standard": [8.86, 8.69, 8.60, 8.47, 8.22, 7.93, 7.61, 7.15, 6.59, 6.04, 5.37, 4.55, 3.72, 2.96, 2.21, 1.52, 0.91, 0.63]
}
# Extracting only the first and fourth columns
extracted_data = {
"Age Group (years)": data["Age group"],
"WHO World Standard (%)": data["WHO World Standard"]
}
# Creating a DataFrame from the extracted data
who_df = pd.DataFrame(extracted_data)
who_df
# The UN World Population Prospects (2022) dataset is very large, so I used the data portal to extract population estimates for only Uganda and USA in 2019 and exported them as a small xlsx file [UN World Population Prospects (2022), Population Estimates 1950-2021].
population_data = pd.read_excel('datasets/population_data.xlsx', sheet_name='Data', skiprows=4, usecols=[1,2,3])
population_data.columns = ['Age Group (years)','Population, Uganda, 2019','Population, United States, 2019']
population_data
# Collapse the age groups above 85 into a single "85+" row to match the death-rate table
older_groups = ['85-89', '90-94', '95-99', '100+']
population_data_above_85 = population_data[population_data['Age Group (years)'].isin(older_groups)]
# Sum the populations for Uganda and USA across those age groups
new_row = {'Age Group (years)': '85+',
           'Population, Uganda, 2019': population_data_above_85['Population, Uganda, 2019'].sum(),
           'Population, United States, 2019': population_data_above_85['Population, United States, 2019'].sum()}
# Drop the now redundant rows, then append the new row
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
population_data = population_data[~population_data['Age Group (years)'].isin(older_groups)]
population_data = pd.concat([population_data, pd.DataFrame([new_row])], ignore_index=True)
population_data
# Calculate the total population for each country to calculate proportions of each age group w.r.t. total population
total_population_uganda = population_data['Population, Uganda, 2019'].sum()
total_population_usa = population_data['Population, United States, 2019'].sum()
# Calculate the proportion of each age group to the total population of each country
population_data['Proportion (%), Uganda, 2019'] = round(population_data['Population, Uganda, 2019'] / total_population_uganda * 100, 2)
population_data['Proportion (%), United States, 2019'] = round(population_data['Population, United States, 2019'] / total_population_usa * 100, 2)
# Reorder columns by country blocks
population_data = population_data[['Age Group (years)', 'Population, Uganda, 2019', 'Proportion (%), Uganda, 2019', 'Population, United States, 2019', 'Proportion (%), United States, 2019']]
# Merge WHO World Standardized age groups to Uganda and US data
population_data_who = pd.merge(who_df,population_data,on="Age Group (years)")
population_data_who
# Visualise population distributions of Uganda, United States and the WHO World Standard to see differences
from plotly.subplots import make_subplots
import plotly.graph_objects as go
# Create subplots for each country and world standard: 1 row, 3 cols
fig = make_subplots(rows=1, cols=3, subplot_titles=('Uganda (2019)', 'United States (2019)', 'WHO World Standard'), horizontal_spacing=0.05)
# Uganda plot
fig.add_trace(
go.Bar(y=population_data_who['Age Group (years)'], x=population_data_who['Proportion (%), Uganda, 2019'], name='Uganda', marker_color='darkblue', orientation='h',
hovertemplate='%{y}: %{x}%'),
row=1, col=1
)
# USA plot
fig.add_trace(
go.Bar(y=population_data_who['Age Group (years)'], x=population_data_who['Proportion (%), United States, 2019'], name='United States', marker_color='firebrick', orientation='h',
hovertemplate='%{y}: %{x}%'),
row=1, col=2
)
# Standardized WHO plot
fig.add_trace(
go.Bar(y=population_data_who['Age Group (years)'], x=population_data_who['WHO World Standard (%)'], name='WHO', marker_color='teal', orientation='h',
hovertemplate='%{y}: %{x}%'),
row=1, col=3
)
# Update layout for a cleaner look
fig.update_layout(height=550, width=1030, title_text="Population proportion (% of total) by age group for Uganda, USA and WHO World Standard")
fig.show()
# Uganda population skewed towards very young, USA balanced across most ages, world standard like half a normal distribution with peak at 0
# Merging death rates with population data on "Age Group (years)"
merged_data = pd.merge(death_rates, population_data_who, on="Age Group (years)")
# Weight each age-specific death rate by that age group's share of the country's own population; each product is the age group's contribution to the crude death rate
merged_data['Expected Deaths, Uganda, 2019'] = merged_data['Death Rate, Uganda, 2019'] * merged_data['Proportion (%), Uganda, 2019'] / 100
merged_data['Expected Deaths, United States, 2019'] = merged_data['Death Rate, United States, 2019'] * merged_data['Proportion (%), United States, 2019'] / 100
# Weight each age-specific death rate by the WHO World Standard share of that age group; each product is the age group's contribution to the age-standardized death rate
merged_data['Expected Deaths, Uganda (Standardized), 2019'] = merged_data['Death Rate, Uganda, 2019'] * merged_data['WHO World Standard (%)'] / 100
merged_data['Expected Deaths, United States (Standardized), 2019'] = merged_data['Death Rate, United States, 2019'] * merged_data['WHO World Standard (%)'] / 100
# Calculate total crude death rates for Uganda and United States by summing the Expected Deaths columns, respectively; much higher crude rate in US than Uganda -> aging population
uganda_crude = round(merged_data['Expected Deaths, Uganda, 2019'].sum(),1)
usa_crude = round(merged_data['Expected Deaths, United States, 2019'].sum(),1)
# Calculate age-standardized death rates for Uganda and United States using WHO World Standard population proportions; similar rates obtained for both countries, even slightly higher in Uganda! The power of standardization!
uganda_stdz = round(merged_data['Expected Deaths, Uganda (Standardized), 2019'].sum(),1)
usa_stdz = round(merged_data['Expected Deaths, United States (Standardized), 2019'].sum(),1)
print(f"Crude Death Rate, Uganda, 2019: {uganda_crude} | Age-standardized Death Rate, Uganda, 2019: {uganda_stdz}")
print(f"Crude Death Rate, United States, 2019: {usa_crude} | Age-standardized Death Rate, United States, 2019: {usa_stdz}")
merged_data
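As a cross-check, the age-standardized rates depend only on the two death-rate columns and the WHO weights listed earlier, so they can be recomputed without the population file. The sketch below is a minimal, dependency-free version; the helper name `weighted_rate` is made up for this example. Unlike the code above, it divides by the actual sum of the weights, which for the values listed here is about 100.03 rather than exactly 100 (presumably a rounding effect in the published percentages).

```python
# Age-specific COPD death rates and WHO World Standard weights (%),
# copied from the tables earlier in this post (18 age groups, 0-4 to 85+)
us_rates = [0.04, 0.02, 0.02, 0.02, 0.06, 0.11, 0.29, 0.56, 1.42, 4.00,
            14.13, 37.22, 66.48, 108.66, 213.10, 333.06, 491.10, 894.45]
uganda_rates = [0.40, 0.17, 0.07, 0.23, 0.38, 0.40, 0.75, 1.11, 2.04, 5.51,
                13.26, 33.25, 69.62, 120.78, 229.88, 341.06, 529.31, 710.40]
who_weights = [8.86, 8.69, 8.60, 8.47, 8.22, 7.93, 7.61, 7.15, 6.59, 6.04,
               5.37, 4.55, 3.72, 2.96, 2.21, 1.52, 0.91, 0.63]


def weighted_rate(rates, weights):
    """Weighted average of age-specific rates, normalised by the weight total.

    Hypothetical helper for this cross-check, not part of the original analysis.
    """
    if len(rates) != len(weights):
        raise ValueError("rates and weights need one entry per age group")
    return sum(r * w for r, w in zip(rates, weights)) / sum(weights)


uganda_check = weighted_rate(uganda_rates, who_weights)
us_check = weighted_rate(us_rates, who_weights)

print(f"WHO weights sum to {sum(who_weights):.2f}")
print(f"Age-standardized rate, Uganda: {uganda_check:.1f}")
print(f"Age-standardized rate, United States: {us_check:.1f}")
```

Normalising by the weight total changes the results only in the second decimal place, so to one decimal the rates still come out at 28.7 for Uganda and 28.4 for the United States, matching the figures quoted above.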