
The Nobel Prize has been among the most prestigious international awards since 1901. Each year, awards are bestowed in chemistry, literature, physics, physiology or medicine, economics, and peace. In addition to the honor, prestige, and substantial prize money, the recipient also gets a gold medal with an image of Alfred Nobel (1833 - 1896), who established the prize.

The Nobel Foundation has made available a dataset of all prize winners, from the first awards in 1901 through 2023. The dataset used in this project comes from the Nobel Prize API and is stored in the nobel.csv file in the data folder.

In this project, you'll explore and answer several questions about this prizewinning data. We also encourage you to explore any further questions that interest you!

# Loading in required libraries
import pandas as pd
import seaborn as sns
import numpy as np

# Load the data and check the columns
df = pd.read_csv('data/nobel.csv')
print(df.columns)


# The most commonly awarded gender is male; the bar chart below shows the gap.
findtop_gender = df.groupby('sex')['prize'].agg('count').reset_index()
print(findtop_gender)

sns.catplot(data=findtop_gender, x="sex", y="prize", hue="sex", kind="bar")
# The most commonly awarded birth country is the United States of America.
findtop_country = df.groupby('birth_country')['prize'].agg('count').reset_index()
findtop_country = findtop_country[findtop_country['prize'] == findtop_country['prize'].max()]
print(findtop_country)
top_country = "United States of America"

# For US-born laureates, compute the share of their prizes that falls in each decade.
# Based on my findings, the 2000s hold the largest share, at about 18%.
# .copy() avoids pandas' SettingWithCopyWarning when adding the new column.
decade = df[df['birth_country'] == 'United States of America'].copy()
decade['decade'] = (decade['year'] // 10 * 10).astype(str) + 's'
decade_grouped = decade.groupby('decade')['prize'].agg('count').reset_index()
# Named total_usa and top_decade so the built-ins sum() and max() are not shadowed.
total_usa = decade_grouped['prize'].sum()
decade_grouped['ratio'] = (decade_grouped['prize'] / total_usa) * 100
top_decade = decade_grouped[decade_grouped['ratio'] == decade_grouped['ratio'].max()]
print(top_decade)
max_decade_usa = 2000
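
The cell above divides each decade's US-born count by the total number of US-born laureates. Another reading of "ratio" is the share of each decade's laureates who were US-born. Here is a minimal sketch of that alternative, assuming the same `year` and `birth_country` columns. The function name `us_born_ratio_by_decade` and the sample rows are made up for illustration and are not part of the dataset.

```python
import pandas as pd


def us_born_ratio_by_decade(laureates: pd.DataFrame) -> pd.DataFrame:
    """Return, for each decade, the fraction of laureates born in the USA."""
    flagged = laureates.assign(
        usa_born=laureates["birth_country"] == "United States of America",
        decade=(laureates["year"] // 10) * 10,
    )
    # The mean of a boolean column is the share of True values.
    return flagged.groupby("decade", as_index=False)["usa_born"].mean()


# Hypothetical sample rows, only to show the shape of the result.
sample = pd.DataFrame({
    "year": [1901, 1905, 1911, 1912, 1913, 1919],
    "birth_country": [
        "Germany",
        "United States of America",
        "United States of America",
        "France",
        "United States of America",
        "United States of America",
    ],
})

ratios = us_born_ratio_by_decade(sample)
best = ratios.loc[ratios["usa_born"].idxmax()]
print(ratios)
print(int(best["decade"]))
```

On the real data, the same function could be called as `us_born_ratio_by_decade(df)`. The decade it returns may differ from the one found by the within-USA share above, since the two calculations use different denominators.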
# Find the decade and Nobel Prize category combination with the highest
# proportion of female laureates. The proportion is the number of female
# laureates divided by all laureates in that decade and category.
female_lau = df.copy()
female_lau['female'] = female_lau['sex'] == 'Female'
female_lau['decade'] = female_lau['year'] // 10 * 10
# The mean of the boolean 'female' column is the female share of each group.
y = female_lau.groupby(['decade', 'category'])['female'].mean().reset_index(name='proportion')
# Print the row(s) with the highest proportion, not the column-wise maxima.
print(y[y['proportion'] == y['proportion'].max()])




# The first woman to receive a Nobel Prize was Marie Curie, née Sklodowska, in the Physics category.

female_lau2 = df[df['sex']=='Female']
k = female_lau2[female_lau2['year'] == female_lau2['year'].min()][['year','category','full_name']]
k

first_woman_name = "Marie Curie, née Sklodowska"
first_woman_category = "Physics"


# Selecting the laureates that have received 2 or more prizes
counts = df['full_name'].value_counts()
repeats = counts[counts >= 2].index
repeat_list = list(repeats)

print(repeat_list)
    