Suicide Rate Analysis
Catherine Greene
December 15, 2022
Context
Close to 800 000 people die due to suicide every year, which is one person every 40 seconds. Suicide is a global phenomenon and occurs throughout the lifespan. Effective and evidence-based interventions can be implemented at population, sub-population, and individual levels to prevent suicide and suicide attempts. There are indications that for each adult who died by suicide there may have been more than 20 others attempting suicide.
Data Dictionary
We will be using the dataset about suicide rates from 1985 to 2016. This dataset has the following attributes:
country: Country
year: Year
sex: Sex (male or female)
age: Age range, divided into six categories
suicides_no: number of suicides
population: population of that sex, in that age range, in that country, and in that year
suicides/100k pop: Number of suicides per 100k population
gdp_for_year($): GDP of the country in that year in dollars
gdp_per_capita($): Ratio of the country’s GDP and its population
generation: Generation that the age group belongs to, one of six possible categories
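The relationship between suicides_no, population, and suicides/100k pop can be sketched as below. This is a minimal illustration that assumes the rate column is simply the count divided by the population and scaled to 100,000 people, as the dictionary describes. The rows are made up and do not come from master.csv, and rate_per_100k is a hypothetical column name.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from master.csv.
toy = pd.DataFrame({
    "suicides_no": [21, 16, 0],
    "population": [312900, 308000, 150000],
})

# Rate per 100k = suicides / population * 100,000
toy["rate_per_100k"] = (toy["suicides_no"] / toy["population"] * 100_000).round(2)

print(toy)
```

Because the rate is normalized by population, it allows groups of very different sizes to be compared, which the raw count does not.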
Questions to explore
Is the suicide rate more prominent in some age categories than others?
Which countries have the most and the least number of suicides?
What is the effect of the population on suicide rates?
What is the effect of the GDP of a country on suicide rates?
What is the trend of suicide rates across all the years?
Is there a difference between the suicide rates of men and women?
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Read in the csv file
suicides = pd.read_csv('master.csv')
# Quick look
suicides.head()
# Structure
suicides.info()
# Basic Statistics
suicides.describe()
# Suicide Rate and Age Categories (total suicides_no per age group)
age_cat = suicides.groupby('age')['suicides_no'].sum().to_frame()
age_cat.style.highlight_max(color = 'lightsalmon')
# Overall mean of suicides_no, used as the reference line in the plot
suicides['suicides_no'].mean()
# Plot Suicide Rate and Age Categories
ax = sns.barplot(data = suicides, x = 'age', y = 'suicides_no', palette = 'Blues')
# Dashed line at the overall mean of suicides_no
ax.axhline(suicides['suicides_no'].mean(), color = 'tomato',
           linestyle = '--', alpha = .6, linewidth = 2,
           label = 'Average')
# title
plt.title('Suicide Rate By Age Category', size = 18)
# x and y labels
plt.xlabel('Age Group', size = 16)
plt.ylabel('Suicide Rate', size = 16)
# tick size
plt.xticks(size = 14, rotation = 45)
plt.yticks(size = 14)
# show legend
plt.legend(prop = {'size':13})
Observation: Measured by suicides_no, suicides are most prominent in the 35-54 years age group. The 15-24 years, 75+ years, and 5-14 years groups all fall below the overall average.
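The chart above is built from suicides_no, which is a count. If a population-adjusted comparison between age groups is wanted, one possible approach is to total the suicides and the population per group and then scale to 100,000 people. The sketch below assumes that approach; the rows are made up for illustration and rate_per_100k is a hypothetical column name.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from master.csv.
toy = pd.DataFrame({
    "age": ["15-24 years", "15-24 years", "35-54 years", "35-54 years"],
    "suicides_no": [10, 30, 50, 150],
    "population": [100_000, 300_000, 200_000, 300_000],
})

# Sum counts and population per age group first, then divide.
# This weights each row by its population instead of averaging row-level rates.
totals = toy.groupby("age")[["suicides_no", "population"]].sum()
totals["rate_per_100k"] = totals["suicides_no"] / totals["population"] * 100_000

print(totals)
```

On the real data the same pattern would be applied to the suicides DataFrame; whether the ranking of age groups changes is something to check rather than assume.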
# Country Suicide Rates
country_rates = suicides.groupby('country')['suicides_no'].sum().sort_values(ascending = False).to_frame()
country_rates.style.highlight_max(color = 'lightsalmon').highlight_min(color = 'steelblue')
Observation: The Russian Federation and the United States have the highest suicide totals, at 1209742 and 1034013 respectively. The lowest totals belong to Saint Kitts and Nevis and Dominica, at 0 each. These are raw counts and are not adjusted for population.
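Ranking countries by total count tends to favor large populations. The sketch below shows, on made-up figures for two hypothetical countries, how the country with the highest total and the country with the highest rate per 100k can differ. The country names, the numbers, and the rate_per_100k column are all assumptions for illustration.

```python
import pandas as pd

# Made-up figures for two hypothetical countries; not taken from master.csv.
toy = pd.DataFrame({
    "country": ["Largeland", "Largeland", "Smallland", "Smallland"],
    "suicides_no": [600, 400, 60, 40],
    "population": [50_000_000, 50_000_000, 500_000, 500_000],
})

per_country = toy.groupby("country")[["suicides_no", "population"]].sum()
per_country["rate_per_100k"] = (
    per_country["suicides_no"] / per_country["population"] * 100_000
)

# idxmax returns the index label (here, the country) of the largest value.
highest_total = per_country["suicides_no"].idxmax()
highest_rate = per_country["rate_per_100k"].idxmax()

print("Highest total:", highest_total)
print("Highest rate per 100k:", highest_rate)
```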
# Effect of population on suicide rates & GDP on suicide rates
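# numeric_only = True skips text columns such as country and sex (needed in pandas 2.0+)
matrix = suicides.corr(method = 'pearson', numeric_only = True)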
# Heatmap
sns.heatmap(matrix, annot = True)
# Plot Title
plt.title('Correlation Heatmap', size = 18)
# Ticks
plt.xticks(size = 14)
plt.yticks(size = 14)
# Pairplot of relationships
sns.pairplot(data = suicides)
plt.show()
Observation: Higher suicide rates appear slightly more prevalent in countries with higher GDP. However, the correlation between the two is too weak to be considered significant.
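To read a single relationship off the heatmap as a number, the correlation between two columns can be computed directly. The sketch below assumes the column names from the data dictionary and uses made-up values. It also computes a rank-based (Spearman) coefficient as the Pearson correlation of the ranks, which is less sensitive to skewed GDP values; that second measure is an addition for comparison, not part of the analysis above.

```python
import pandas as pd

# Made-up values for illustration only; not taken from master.csv.
toy = pd.DataFrame({
    "gdp_per_capita($)": [1000, 5000, 20000, 40000, 60000],
    "suicides/100k pop": [12.0, 8.0, 15.0, 9.0, 11.0],
})

gdp = toy["gdp_per_capita($)"]
rate = toy["suicides/100k pop"]

# Pearson: strength of the linear relationship.
pearson_r = gdp.corr(rate)

# Spearman: Pearson correlation computed on the ranks of each column.
spearman_r = gdp.rank().corr(rate.rank())

print("Pearson r:", round(pearson_r, 3))
print("Spearman rho:", round(spearman_r, 3))
```

A coefficient near 0 indicates little monotonic or linear association; it does not by itself say anything about statistical significance.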
# Suicide rate trends across the years
suicides_mean = suicides['suicides_no'].mean()
# Function to highlight cells
def highlight_cells(val):
    color = 'lightsalmon' if val > suicides_mean else 'lightsteelblue'
    return 'background-color: {}'.format(color)
suicides.groupby('year')['suicides_no'].mean().to_frame().style.applymap(highlight_cells)
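The yearly table can be extended to cover the remaining question about men and women. The sketch below is one possible way to do it, assuming a pivot of suicides_no by year and sex; the rows are made up, and the total, male_to_female, and yoy_change_pct columns are hypothetical names introduced for illustration.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from master.csv.
toy = pd.DataFrame({
    "year": [1985, 1985, 1986, 1986, 1987, 1987],
    "sex": ["male", "female", "male", "female", "male", "female"],
    "suicides_no": [300, 100, 330, 110, 297, 99],
})

# One row per year, one column per sex, summing the counts.
by_year_sex = toy.pivot_table(index="year", columns="sex",
                              values="suicides_no", aggfunc="sum")

by_year_sex["total"] = by_year_sex["male"] + by_year_sex["female"]
# Ratio of male to female suicides in each year.
by_year_sex["male_to_female"] = by_year_sex["male"] / by_year_sex["female"]
# Year-over-year change in the total, in percent (first year is NaN).
by_year_sex["yoy_change_pct"] = by_year_sex["total"].pct_change() * 100

print(by_year_sex)
```

On the real data, the same pivot applied to the suicides DataFrame would show whether the gap between men and women is stable across years.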