The Nobel Prize has been among the most prestigious international awards since 1901. Each year, awards are bestowed in chemistry, literature, physics, physiology or medicine, economics, and peace. In addition to the honor, prestige, and substantial prize money, the recipient also gets a gold medal with an image of Alfred Nobel (1833 - 1896), who established the prize.
The Nobel Foundation has made available a dataset of all prize winners, from the first awards in 1901 through 2023. The dataset used in this project comes from the Nobel Prize API and is available in the nobel.csv file in the data folder.
In this project, you'll explore and answer several questions about this prizewinning data. We then encourage you to explore any further questions that interest you!
# Import the required libraries
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
# Load the data
data = pd.read_csv('data/nobel.csv')
# Check first 5 rows
data.head()
# Check info
data.info()
What are the most commonly awarded gender and birth country?
# Change the scale
sns.set_context('notebook')
# Plot counts of awardees per gender
gender_plot = sns.countplot(data=data, x='sex', palette='colorblind')
# Add a title
gender_plot.set_title('Most Nobel Prize Winners are Male', y=1.05, fontweight='bold')
# Add axis labels
gender_plot.set(xlabel='Gender', ylabel='Number of Nobel Prize winners')
# Show the plot
plt.show()
# Determine the most commonly awarded gender and show the result
top_gender = data['sex'].mode()[0]
print(f'Most commonly awarded gender: {top_gender}')
# Get the top 5 birth countries
top5_countries = data['birth_country'].value_counts(ascending=False).head(5).index
# Filter the data to include only the top 5 birth countries
top5_countries_df = data[data['birth_country'].isin(top5_countries)]
# Plot a countplot of the top 5 birth countries
countries_plot = sns.countplot(data=top5_countries_df, x='birth_country', order=top5_countries, width=0.7, color='darkgrey')
# Change the color of the first bar
bars = countries_plot.patches
bars[0].set_facecolor('royalblue')
# Rotate the x labels for better readability
plt.xticks(rotation=45, ha='right')
# Add a title
countries_plot.set_title('The USA Produces the Most Nobel Prize Winners', y=1.05, fontweight='bold')
# Add axis labels
countries_plot.set(xlabel='Birth country', ylabel='Number of Nobel Prize winners')
# Show the plot
plt.show()
# Determine the most commonly awarded birth country
top_country = data['birth_country'].mode()[0]
print(f'Most commonly awarded birth country: {top_country}')
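As a cross-check on mode(), the share of each category can be computed with value_counts(normalize=True). The sketch below uses a tiny made-up Series (the name toy and its values are hypothetical, not taken from nobel.csv) to illustrate that both calls skip missing values by default and so should agree on the top category.

```python
import pandas as pd

# Hypothetical toy sample standing in for the 'sex' column of nobel.csv
toy = pd.Series(['Male', 'Male', 'Female', 'Male', None], name='sex')

# Share of each category among non-missing values (missing entries are dropped by default)
shares = toy.value_counts(normalize=True)

# mode() also ignores missing values, so it matches the category with the largest share
top = toy.mode()[0]

print(shares)
print(f'Most common value: {top}')
```

On the real data, the same pattern would apply to data['sex'] or data['birth_country'], assuming those columns exist as in the cells above.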
Which decade had the highest ratio of US-born Nobel Prize winners to total winners in all categories?
# Create a new column that checks if the laureate was born in the US
data['us_born'] = data['birth_country'] == 'United States of America'
# Create a decade column based on the year column
data['decade'] = (data['year'] // 10) * 10
# Group data by decade, calculate the proportion of US-born winners per decade, and sort in descending order
grouped_data_us = data.groupby('decade', as_index=False)['us_born'].mean().sort_values(by='us_born', ascending=False)
# Show the result
print(grouped_data_us)
# Identify the decade with the highest ratio of US-born winners and show the result
max_decade_usa = int(grouped_data_us['decade'].iloc[0])
print(f'Decade with the highest ratio of US-born winners: {max_decade_usa}')
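The decade calculation above can also be sketched without sorting, by asking for the index label of the maximum directly. This is a minimal illustration on hypothetical toy data (the years and countries below are invented for the example); it assumes the same column names as the project and uses idxmax as an alternative to sort_values followed by iloc[0].

```python
import pandas as pd

# Hypothetical toy data; the real project reads data/nobel.csv instead
toy = pd.DataFrame({
    'year': [1901, 1905, 1995, 1998, 1999, 2003, 2007],
    'birth_country': [
        'Germany', 'France',
        'United States of America', 'United States of America', 'Japan',
        'United States of America', 'United Kingdom',
    ],
})

# Flag US-born laureates and bucket years into decades via floor division
toy['us_born'] = toy['birth_country'] == 'United States of America'
toy['decade'] = (toy['year'] // 10) * 10

# The mean of a boolean column is the proportion of True values per group
ratio_by_decade = toy.groupby('decade')['us_born'].mean()

# idxmax returns the index label (here, the decade) with the largest ratio
best_decade = int(ratio_by_decade.idxmax())

print(ratio_by_decade)
print(f'Decade with the highest ratio of US-born winners: {best_decade}')
```

Both approaches should give the same decade when there are no ties; with ties, idxmax returns the first occurrence in index order, which may differ from the order produced by sorting.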