Suicide Rate Analysis

Catherine Greene
December 15, 2022

Context

Close to 800,000 people die due to suicide every year, which is one person every 40 seconds. Suicide is a global phenomenon and occurs throughout the lifespan. Effective and evidence-based interventions can be implemented at population, sub-population, and individual levels to prevent suicide and suicide attempts. There are indications that for each adult who died by suicide there may have been more than 20 others attempting suicide.

Data Dictionary

We will be using the dataset about suicide rates from 1985 to 2016. This dataset has the following attributes:

country: Country

year: Year

sex: Sex (male or female)

age: Age range, divided into six categories

suicides_no: number of suicides

population: population of that sex, in that age range, in that country, and in that year

suicides/100k pop: Number of suicides per 100k population

gdp_for_year($): GDP of the country in that year in dollars

gdp_per_capita($): Ratio of the country’s GDP to its population

generation: Generation of the group in question, one of six possible categories

Questions to explore

Is the suicide rate more prominent in some age categories than others?

Which countries have the most and the least number of suicides?

What is the effect of the population on suicide rates?

What is the effect of the GDP of a country on suicide rates?

What is the trend of suicide rates across all the years?

Is there a difference between the suicide rates of men and women?

# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Read in the csv file
suicides = pd.read_csv('master.csv')

# Quick look
suicides.head()
# Structure

suicides.info()
# Basic Statistics

suicides.describe()
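# Total suicides by age category (uses suicides_no, the raw count, not suicides/100k pop)
age_cat = suicides.groupby('age')['suicides_no'].sum().to_frame()

age_cat.style.highlight_max(color = 'lightsalmon')
# Average of suicides_no across all rows, used as the reference line in the plot
suicides['suicides_no'].mean()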
# Plot Suicide Rate and Age Categories
ax = sns.barplot(data = suicides, x = 'age', y = 'suicides_no', palette = 'Blues')

# Dashed reference line at the overall mean of suicides_no
ax.axhline(suicides['suicides_no'].mean(), color = 'tomato',
           linestyle = '--', alpha = .6, linewidth = 2,
           label = 'Average')

# title
plt.title('Suicide Rate By Age Category', size = 18)

# x and y labels 
plt.xlabel('Age Group', size = 16)
plt.ylabel('Suicide Rate', size = 16)

# tick size
plt.xticks(size = 14, rotation = 45)
plt.yticks(size = 14)

# show legend
plt.legend(prop = {'size':13})

Observation: Suicides are most prominent in the 35-54 years age group. The 15-24 years, 75+ years, and 5-14 years groups all fall below the average. Note that these figures are based on suicides_no (raw counts), not on suicides/100k pop.
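The comparison above is built from suicides_no, so larger age groups can dominate simply because they contain more people. A minimal sketch of a population-adjusted comparison is shown below. It assumes the suicides_no and population columns from the data dictionary; the toy rows and the helper name rate_per_100k are made up for illustration and are not part of the original analysis.

```python
import pandas as pd

# Toy rows standing in for master.csv; the numbers are invented for illustration.
toy = pd.DataFrame({
    "age": ["35-54 years", "35-54 years", "5-14 years", "5-14 years"],
    "suicides_no": [30, 50, 1, 3],
    "population": [200_000, 200_000, 400_000, 400_000],
})

def rate_per_100k(df, by):
    """Pooled suicides per 100k population for each group in `by` (hypothetical helper)."""
    totals = df.groupby(by)[["suicides_no", "population"]].sum()
    return totals["suicides_no"] / totals["population"] * 100_000

age_rates = rate_per_100k(toy, "age")
print(age_rates)
```

Summing suicides and population before dividing weights each row by its population, which is usually preferable to averaging the per-row suicides/100k pop values.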

# Country Suicide Rates
country_rates = suicides.groupby('country')['suicides_no'].sum().sort_values(ascending = False).to_frame()

country_rates.style.highlight_max(color = 'lightsalmon').highlight_min(color = 'steelblue')

Observation: The Russian Federation and the United States have the highest total numbers of suicides, with 1,209,742 and 1,034,013 respectively. Saint Kitts and Nevis and Dominica have the lowest, with totals of 0 each. These are raw totals over the whole period and are not adjusted for population.
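Totals per country also depend on how many years each country appears in the data, so a very low total may reflect sparse reporting as much as few suicides. One way to check this, sketched here on invented toy rows, is to put the total next to the number of distinct years reported. The column names follow the data dictionary; the frame name coverage and the output column names are assumptions.

```python
import pandas as pd

# Toy rows standing in for master.csv; countries "A" and "B" are placeholders.
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B"],
    "year": [2000, 2001, 2002, 2000],
    "suicides_no": [10, 12, 11, 0],
})

# Total suicides alongside the number of distinct years each country reports.
coverage = (
    toy.groupby("country")
       .agg(total_suicides=("suicides_no", "sum"),
            years_reported=("year", "nunique"))
       .sort_values("total_suicides", ascending=False)
)
print(coverage)
```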

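# Effect of population and GDP on suicide rates
# numeric_only = True restricts the correlation to numeric columns;
# recent pandas versions raise an error on text columns such as country and sex
matrix = suicides.corr(method = 'pearson', numeric_only = True)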

# Heatmap
sns.heatmap(matrix, annot = True)

# Plot Title
plt.title('Correlation Heatmap', size = 18)

# Ticks
plt.xticks(size = 14)
plt.yticks(size = 14)
# Pairplot of relationships
sns.pairplot(data = suicides)

plt.show()

Observation: Higher suicide rates appear to be slightly more prevalent in countries with higher GDP. However, there does not appear to be any significant correlation between the two.
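To read the GDP relationship directly rather than from the heatmap, the relevant coefficients can be pulled out of the correlation matrix. The sketch below uses invented toy rows with column names taken from the data dictionary (the names in the actual file may differ slightly). It also assumes the yearly GDP column is stored as text with thousands separators, which would otherwise keep it out of a numeric correlation.

```python
import pandas as pd

# Toy rows; values are invented and column names follow the data dictionary.
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "suicides/100k pop": [5.0, 10.0, 15.0, 20.0],
    "gdp_per_capita($)": [1000, 2000, 3000, 4000],
    "gdp_for_year($)": ["1,000,000", "4,000,000", "9,000,000", "16,000,000"],
})

# Assumption: GDP is stored as a string with commas, so convert it to integers.
toy["gdp_for_year($)"] = (
    toy["gdp_for_year($)"].str.replace(",", "", regex=False).astype("int64")
)

# Keep numeric columns only, then compare linear and rank correlation.
numeric = toy.select_dtypes(include="number")
pearson = numeric.corr(method="pearson")
spearman = numeric.corr(method="spearman")

print(pearson.loc["suicides/100k pop", "gdp_per_capita($)"])
print(spearman.loc["suicides/100k pop", "gdp_for_year($)"])
```

Spearman correlation is included because it only assumes a monotonic relationship, which can be a useful cross-check when GDP values are heavily skewed.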

# Suicide rate trends across the years
suicides_mean = suicides['suicides_no'].mean()

# Function to highlight cells
def highlight_cells(val):
    color = 'lightsalmon' if val > suicides_mean else 'lightsteelblue'
    return 'background-color: {}'.format(color)

suicides.groupby('year')['suicides_no'].mean().to_frame().style.applymap(highlight_cells)
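The styled table above colors each year by whether its mean is above the overall mean. The same information can be kept as plain columns, which makes it easier to filter or plot later. This is a minimal sketch on invented toy rows; the column names mean_suicides, above_average, and pct_change are assumptions introduced for illustration.

```python
import pandas as pd

# Toy rows standing in for master.csv; the numbers are invented.
toy = pd.DataFrame({
    "year": [2000, 2000, 2001, 2001, 2002, 2002],
    "suicides_no": [10, 20, 30, 50, 20, 40],
})

# Overall mean across all rows, mirroring suicides_mean in the cell above.
overall_mean = toy["suicides_no"].mean()

yearly = toy.groupby("year")["suicides_no"].mean().to_frame("mean_suicides")
# Boolean flag equivalent to the salmon/blue highlighting.
yearly["above_average"] = yearly["mean_suicides"] > overall_mean
# Year-over-year change in percent; the first year has no predecessor, so it is NaN.
yearly["pct_change"] = yearly["mean_suicides"].pct_change() * 100

print(yearly)
```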