
The Nobel Prize has been among the most prestigious international awards since 1901. Each year, awards are bestowed in chemistry, literature, physics, physiology or medicine, economics, and peace. In addition to the honor, prestige, and substantial prize money, the recipient also gets a gold medal with an image of Alfred Nobel (1833 - 1896), who established the prize.

The Nobel Foundation has made available a dataset of all prize winners from the first awards in 1901 through 2023. The dataset used in this project comes from the Nobel Prize API and is available in the nobel.csv file in the data folder.

In this project, you'll get a chance to explore and answer several questions about this prizewinning data. We then encourage you to explore any further questions that interest you!

# Loading in required libraries
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
# Importing the dataset
nobel = pd.read_csv("data/nobel.csv")
display(nobel)

Let's check how many prizes have been awarded per year. According to the dataset, the number of prizes awarded has increased over time, from 6 in 1901 to more than 10 in 2023.

print(nobel["birth_country"].value_counts())
# summarizing count of prizes per year
year_prizes = nobel.groupby("year")[["prize"]].count()
# plotting the evolution of the number of prizes awarded over time
g = sns.relplot(x=year_prizes.index, y="prize", data=year_prizes, kind='line')
g.set(ylabel="number of prizes awarded")
plt.show()
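To read the first-year and last-year counts directly rather than off the plot, a minimal sketch along these lines could be used. The helper name prizes_per_year and the toy rows are made up for illustration; on the real data you would pass nobel instead.

```python
import pandas as pd

def prizes_per_year(df: pd.DataFrame) -> pd.Series:
    """Count prize rows per award year, sorted chronologically."""
    return df.groupby("year")["prize"].count().sort_index()

# Toy stand-in for nobel.csv (made-up rows, NOT real data)
toy = pd.DataFrame({
    "year": [1901, 1901, 1902, 2023, 2023, 2023],
    "prize": ["p1", "p2", "p3", "p4", "p5", "p6"],
})

counts = prizes_per_year(toy)
# Show only the first and last year present in the data
print(counts.iloc[[0, -1]])
```

With the project data, prizes_per_year(nobel).iloc[[0, -1]] should give the 1901 and 2023 counts quoted above.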

Let's see whether any laureate has been awarded more than once, and who those laureates are.

# Getting the count of prizes per awardee
scientist_tab = nobel["full_name"].value_counts()

# Getting the awardees with more than one prize
scientist_tab_top = (scientist_tab[scientist_tab >= 2]).index

# saving the full name list
repeat_list = list(scientist_tab_top)

print(scientist_tab_top)

We can see that the organization Comité international de la Croix Rouge (International Committee of the Red Cross) has received the most Nobel Prizes, all during the 1900s and all in the Peace category. Among individuals, Marie Curie (née Sklodowska), John Bardeen, Linus Carl Pauling, and Frederick Sanger have each won twice.
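To check the years and categories behind each repeat winner, the rows for those names can be pulled out of the table. The following is a sketch only: repeat_winner_prizes is a hypothetical helper, and the toy rows use placeholder names rather than real laureates.

```python
import pandas as pd

def repeat_winner_prizes(df: pd.DataFrame, min_prizes: int = 2) -> pd.DataFrame:
    """Return year and category for every laureate with at least min_prizes rows."""
    counts = df["full_name"].value_counts()
    repeat_names = counts[counts >= min_prizes].index
    return (
        df[df["full_name"].isin(repeat_names)][["full_name", "year", "category"]]
        .sort_values(["full_name", "year"])
        .reset_index(drop=True)
    )

# Toy stand-in for nobel.csv (placeholder names, NOT real data)
toy = pd.DataFrame({
    "full_name": ["Laureate A", "Laureate B", "Laureate A", "Laureate C"],
    "year": [1950, 1960, 1940, 1970],
    "category": ["Peace", "Physics", "Peace", "Chemistry"],
})

repeats = repeat_winner_prizes(toy)
print(repeats)
```

On the project data, repeat_winner_prizes(nobel) would list one row per prize for each name in repeat_list.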

Let's explore the distribution of the categories awarded.

# Grouping by category
category_tab = nobel.groupby("category")[["category"]].count().rename(columns={"category": "num_prizes"}).sort_values("num_prizes", ascending = False)
# Bar plot
g = sns.catplot(y = "category", data = nobel, kind = "count", order = category_tab.index)
g.set(xlabel = "Number of prizes")
g.fig.suptitle("Number of prizes per category awarded", y = 1.03)
plt.show()

Medicine and Physics have been the most awarded categories since 1901, with more than 200 prizes granted in each.
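The bar plot shows the ranking, but the exact counts are easier to read from a table. A small sketch, assuming a hypothetical helper category_summary and made-up toy rows in place of nobel.csv:

```python
import pandas as pd

def category_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count prizes per category (descending) and each category's share of the total."""
    counts = df["category"].value_counts()
    return pd.DataFrame({"num_prizes": counts, "share": counts / counts.sum()})

# Toy stand-in for nobel.csv (made-up rows, NOT real data)
toy = pd.DataFrame({
    "category": ["Medicine", "Physics", "Medicine", "Peace", "Medicine", "Physics"],
})

summary = category_summary(toy)
print(summary)
```

Calling category_summary(nobel) on the project data would give the per-category totals that the statement above refers to.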

Now, let's explore which gender and which birth country have received the prize most frequently.

# Counting prizes per gender (value_counts sorts in descending order)
gender_tab = nobel["sex"].value_counts()
# Getting the top gender
top_gender = gender_tab.index[0]
print("The most commonly awarded gender was " + str(top_gender) + ", with " + str(gender_tab.iloc[0]) + " prizes between " + str(nobel["year"].min()) + " and " + str(nobel["year"].max()) + ".")
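The same approach can answer the birth-country half of the question. The sketch below uses a hypothetical helper top_value and toy rows with placeholder country names; value_counts drops missing values by default, so rows without a recorded birth country (organizations, for example) are ignored.

```python
import pandas as pd

def top_value(df: pd.DataFrame, column: str):
    """Return the most frequent non-missing value in a column and its count."""
    counts = df[column].value_counts()  # NaN/None excluded by default
    return counts.index[0], int(counts.iloc[0])

# Toy stand-in for nobel.csv (placeholder values, NOT real data)
toy = pd.DataFrame({
    "birth_country": ["Country X", "Country Y", "Country X", None],
})

top_country, n_prizes = top_value(toy, "birth_country")
print("The most commonly awarded birth country was " + str(top_country)
      + ", with " + str(n_prizes) + " prizes.")
```

On the project data this would be called as top_value(nobel, "birth_country").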