
The Nobel Prize has been among the most prestigious international awards since 1901. Each year, awards are bestowed in chemistry, literature, physics, physiology or medicine, economics, and peace. In addition to the honor, prestige, and substantial prize money, the recipient also gets a gold medal with an image of Alfred Nobel (1833 - 1896), who established the prize.

The Nobel Foundation has made available a dataset of all prize winners from the first awards in 1901 through 2023. The dataset used in this project comes from the Nobel Prize API and is stored in the nobel.csv file in the data folder.

In this project, we'll get a chance to explore and answer the following questions related to this prizewinning data:

  1. What is the most common gender among the Nobel Prize winners?
  2. What is the most common country of birth among the Nobel Prize winners?
  3. What is the decade with the highest ratio of winners born in the USA?
  4. Which decade and Nobel Prize category combination had the highest proportion of female laureates?
  5. Who was the first woman to receive a Nobel Prize, and in what category?
  6. Which individuals or organizations have won more than one Nobel Prize throughout the years?
# Load required libraries
import pandas as pd
import seaborn as sns
import numpy as np
# read dataset from csv file

data = pd.read_csv("data/nobel.csv")
print(data)

Question 1: What is the most common gender among the Nobel Prize winners?

# identify most common gender value

top_gender = data["sex"].value_counts().index[0]
print(top_gender)

Question 2: What is the most common country of birth among the Nobel Prize winners?

# identify most common birth country value

top_country = data["birth_country"].value_counts().index[0]
print(top_country)

Question 3: What is the decade with the highest ratio of winners born in the USA?

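# create decade column by flooring each year to the start of its decade
# (rounding to the nearest ten would move e.g. 1996 into the 2000s)

data["decade"] = (data["year"] // 10) * 10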

#create dataframe with ratios by country per decade

ratios_per_decade = data.groupby("decade")["birth_country"].value_counts(normalize=True).reset_index(name="ratio")

#create subset for ratios of USA

ratios_per_decade_usa = ratios_per_decade[ratios_per_decade["birth_country"]=="United States of America"]

#reorder resulting ratios dataframe and select decade value with the highest ratio

max_decade_usa = ratios_per_decade_usa.sort_values("ratio", ascending=False)[:1]["decade"].values[0]
print(max_decade_usa)
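
The result above depends on how each year is assigned to a decade. The following is a minimal sketch of the difference between flooring a year to its decade and rounding it to the nearest ten. The `toy` dataframe and its values are hypothetical and are not taken from nobel.csv.

```python
import pandas as pd

# Hypothetical years for illustration only (not from nobel.csv)
toy = pd.DataFrame({"year": [1901, 1909, 1996, 2000, 2023]})

# Floor division keeps every year inside its own decade
toy["decade_floor"] = (toy["year"] // 10) * 10

# Rounding to the nearest ten moves late-decade years into the next decade
toy["decade_round"] = toy["year"].round(-1)

print(toy)
```

In this sketch, 1909 and 1996 land in 1900 and 1990 with floor division, but in 1910 and 2000 with rounding, which would shift per-decade ratios.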

Question 4: Which decade and Nobel Prize category combination had the highest proportion of female laureates?

#create dataframe with gender ratios for each decade and category combination
#(rows with a missing "sex" value, such as organizations, are not counted)

ratios_per_decade_gender = data.groupby(["decade","category"])["sex"].value_counts(normalize=True).reset_index(name="ratio")

#create subset for female ratios only

ratios_per_decade_female = ratios_per_decade_gender[ratios_per_decade_gender["sex"]=="Female"]

#identify decade and category with the highest female ratio (sort once, take the top row)

top_female_row = ratios_per_decade_female.sort_values("ratio", ascending=False).iloc[0]
max_decade_female = top_female_row["decade"]
max_category_female = top_female_row["category"]

#create dictionary with identified max ratio of females by decade and category combination

max_female_dict = {max_decade_female:max_category_female}
print(max_female_dict)
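
As a cross-check, the same question can be approached by averaging a boolean flag per group. This is only a sketch on hypothetical toy data, and the `is_female` column is an illustrative name that does not exist in nobel.csv. Note one difference: this variant keeps rows with a missing `sex` value (for example organizations) in the denominator, so its proportions can be lower than those from `value_counts(normalize=True)`.

```python
import pandas as pd

# Hypothetical toy data for illustration only (not from nobel.csv)
toy = pd.DataFrame({
    "decade":   [1900, 1900, 1900, 1900, 2020, 2020, 2020],
    "category": ["Physics", "Physics", "Peace", "Peace",
                 "Literature", "Literature", "Peace"],
    "sex":      ["Male", "Female", "Male", "Male",
                 "Female", "Female", "Male"],
})

# Flag female laureates; the mean of a boolean column is the share of True values
toy["is_female"] = toy["sex"] == "Female"
female_share = toy.groupby(["decade", "category"])["is_female"].mean()

# idxmax on a MultiIndex returns the (decade, category) pair with the largest share
top_decade, top_category = female_share.idxmax()
print({top_decade: top_category})
```

Groups with no female laureates appear here with a share of 0 instead of being absent, which can make the per-group table easier to inspect.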

Question 5: Who was the first woman to receive a Nobel Prize, and in what category?

#the dataset is assumed to be ordered chronologically, so the first female row is the earliest female laureate

first_woman = data[data["sex"]=="Female"].iloc[0]
first_woman_name = first_woman["full_name"]
first_woman_category = first_woman["category"]
first_women_dict={first_woman_name:first_woman_category}
print(first_women_dict)

Question 6: Which individuals or organizations have won more than one Nobel Prize throughout the years?

#count number of full name repetitions in the dataset
number_name = data["full_name"].value_counts()

#create list for repeated values
repeat_list = list(number_name[number_name>1].index)

print(repeat_list)
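
To see how many prizes each repeat winner holds, and not only their names, the filtered counts can be kept as a mapping. A minimal sketch on hypothetical toy names (not from nobel.csv):

```python
import pandas as pd

# Hypothetical names for illustration only (not from nobel.csv)
toy = pd.DataFrame({"full_name": ["A", "B", "A", "C", "B", "A"]})

# Count appearances per name, then keep only names that appear more than once
counts = toy["full_name"].value_counts()
repeat_counts = counts[counts > 1].to_dict()

print(repeat_counts)
```

This relies on each laureate's name being spelled identically across rows; differently spelled entries for the same winner would be counted separately.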