The Nobel Prize has been among the most prestigious international awards since 1901. Each year, awards are bestowed in chemistry, literature, physics, physiology or medicine, economics, and peace. In addition to the honor, prestige, and substantial prize money, each recipient also receives a gold medal bearing an image of Alfred Nobel (1833 - 1896), who established the prize.
The Nobel Foundation has made available a dataset of all prize winners from the first awards in 1901 through 2023. The dataset used in this project comes from the Nobel Prize API and is available in the nobel.csv file in the data folder.
In this project, you'll get a chance to explore and answer several questions related to this prizewinning data. We also encourage you to explore any further questions that interest you!
# Loading in required libraries
import pandas as pd
import seaborn as sns
import numpy as np
# Load the dataset
nobel_df = pd.read_csv('data/nobel.csv')
# Exploring the data to understand its structure
nobel_df.head()
# Question 1: Most commonly awarded gender and birth country
# Find the mode (most frequent value) of the 'sex' and 'birth_country' columns
top_gender = nobel_df['sex'].mode()[0]
top_country = nobel_df['birth_country'].mode()[0]
# Question 2: Decade with the highest ratio of US-born Nobel Prize winners to total winners
# Extract the decade from the year
nobel_df['Decade'] = (nobel_df['year'] // 10) * 10
# Filter for US-born winners
us_winners = nobel_df[nobel_df['birth_country'] == 'United States of America']
# Group by decade and calculate the ratio of US-born winners to total winners in each decade
us_ratio_by_decade = us_winners.groupby('Decade').size() / nobel_df.groupby('Decade').size()
# Find the decade with the highest ratio
max_decade_usa = us_ratio_by_decade.idxmax()
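The ratio above divides two grouped counts and relies on pandas aligning them by decade, so a decade with no US-born winners comes out as NaN rather than 0. As a minimal sketch of an alternative, assuming a small invented table in place of nobel.csv and a hypothetical helper column named us_born, the same share can be computed as the mean of a boolean flag per decade:

```python
import pandas as pd

# Toy rows standing in for nobel.csv; the values are invented for illustration.
toy_df = pd.DataFrame({
    "year": [1901, 1905, 1932, 1935, 1938, 1951],
    "birth_country": [
        "Germany", "France",
        "United States of America", "United States of America", "Germany",
        "United States of America",
    ],
})

toy_df["decade"] = (toy_df["year"] // 10) * 10

# Hypothetical helper column: True where the laureate was born in the USA.
toy_df["us_born"] = toy_df["birth_country"] == "United States of America"

# The mean of a boolean column is the share of True values in each group,
# so decades without any US-born winner get 0.0 instead of NaN.
toy_ratio = toy_df.groupby("decade")["us_born"].mean()
toy_max_decade = toy_ratio.idxmax()

print(toy_ratio)
print(toy_max_decade)
```

Both approaches pick the same decade, because idxmax skips NaN values by default; the boolean-mean version simply keeps every decade in the result.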
# Question 3: Decade and Nobel Prize category combination with the highest proportion of female laureates
# Group by both decade and category, filtering for female laureates
female_winners = nobel_df[nobel_df['sex'] == 'Female']
female_ratio_by_decade_category = (
    female_winners.groupby(['Decade', 'category']).size()
    / nobel_df.groupby(['Decade', 'category']).size()
)
# Find the (decade, category) combination with the highest proportion of female laureates
highest_female_combo = female_ratio_by_decade_category.idxmax()
max_female_dict = {highest_female_combo[0]: highest_female_combo[1]}
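When a Series is indexed by both decade and category, idxmax returns a (decade, category) tuple, which is why the result above is unpacked by position into a dictionary. The following is a rough sketch of that step on invented rows (the data and the is_female column name are assumptions for illustration, not taken from nobel.csv):

```python
import pandas as pd

# Toy rows; years, categories, and sexes are invented for illustration.
toy_df = pd.DataFrame({
    "year": [1903, 1903, 1911, 1912, 1913, 2004, 2005, 2009],
    "category": [
        "Physics", "Physics",
        "Chemistry", "Chemistry", "Chemistry",
        "Literature", "Literature", "Literature",
    ],
    "sex": ["Male", "Female", "Female", "Male", "Male", "Female", "Male", "Female"],
})

toy_df["decade"] = (toy_df["year"] // 10) * 10

# Hypothetical helper column: True for female laureates.
toy_df["is_female"] = toy_df["sex"] == "Female"

# Share of female laureates for every (decade, category) pair.
toy_share = toy_df.groupby(["decade", "category"])["is_female"].mean()

# idxmax on a two-level index returns a (decade, category) tuple.
toy_best = toy_share.idxmax()
toy_dict = {int(toy_best[0]): toy_best[1]}

print(toy_share)
print(toy_dict)
```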
# Question 4: First woman to receive a Nobel Prize and her category
# Filter female laureates, and sort by the year to find the first female winner
first_female_laureate = female_winners.sort_values(by='year').iloc[0]
first_woman_name = first_female_laureate['full_name']
first_woman_category = first_female_laureate['category']
# Question 5: Individuals or organizations with multiple Nobel Prizes
# Count occurrences of each laureate and filter for those with more than one win
repeat_winners = nobel_df['full_name'].value_counts()
repeat_list = repeat_winners[repeat_winners > 1].index.tolist()
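The repeat-winner lookup above counts exact matches on the name string, so it treats two rows as the same laureate only when the names are spelled identically. A small sketch of the counting and filtering step, using placeholder names that are purely hypothetical:

```python
import pandas as pd

# Placeholder names; these are not real entries from nobel.csv.
toy_names = pd.Series(
    ["Laureate A", "Laureate B", "Laureate A", "Laureate C", "Laureate B"],
    name="full_name",
)

# Count how often each name appears, then keep names seen more than once.
toy_counts = toy_names.value_counts()
toy_repeats = toy_counts[toy_counts > 1].index.tolist()

print(sorted(toy_repeats))
```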
# Output the results
top_gender, top_country, max_decade_usa, max_female_dict, first_woman_name, first_woman_category, repeat_list