Project: Visualizing the History of Nobel Prize Winners

The Nobel Prize has been among the most prestigious international awards since 1901. Each year, awards are bestowed in chemistry, literature, physics, physiology or medicine, economics, and peace. In addition to the honor, prestige, and substantial prize money, each recipient also receives a gold medal bearing an image of Alfred Nobel (1833-1896), who established the prize.

The Nobel Foundation has made available a dataset of all prize winners, from the first awards in 1901 through 2023. The dataset used in this project comes from the Nobel Prize API and is stored in the nobel.csv file in the data folder.

In this project, you'll get a chance to explore and answer several questions about this prize-winning data. We then encourage you to explore further questions that interest you!

# Loading in required libraries
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

# Set Seaborn's plotting context to a scale suited to a notebook
sns.set_context('notebook')


# First, import the data and look at the first few rows
winners = pd.read_csv('data/nobel.csv')
winners.head()

What is the most commonly awarded gender and birth country?

Store your answers as string variables top_gender and top_country.


# Gets the most frequent gender (mode) from the 'sex' column.
top_gender = winners['sex'].value_counts().index[0]
print(top_gender)

# Gets the most frequent birth country (mode) from the 'birth_country' column.
top_country = winners['birth_country'].value_counts().index[0]
print(top_country)
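The `value_counts().index[0]` idiom above returns the most frequent label. The sketch below uses a small hypothetical DataFrame, not the real `nobel.csv` data. On that data, `Series.mode()` gives the same answer, and both approaches ignore missing values by default.

```python
import pandas as pd

# Hypothetical toy data standing in for the real laureate table
toy = pd.DataFrame({
    'sex': ['Male', 'Female', 'Male', None],
    'birth_country': ['United States of America', 'France',
                      'United States of America', None],
})

# value_counts() sorts by frequency (descending) and drops missing values by default
top_gender_vc = toy['sex'].value_counts().index[0]

# mode() also drops missing values; it returns a Series because ties are possible
top_gender_mode = toy['sex'].mode()[0]

toy_top_country = toy['birth_country'].value_counts().index[0]
print(top_gender_vc, top_gender_mode, toy_top_country)
```

When several labels tie for the top count, `mode()` returns all of them. `value_counts().index[0]` silently picks one.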

Which decade had the highest ratio of US-born Nobel Prize winners to total winners in all categories?

Store this as an integer called max_decade_usa.

# Flag laureates born in the United States (True) versus elsewhere (False);
# missing birth countries, such as those of organizations, count as False
winners['US'] = winners['birth_country'] == 'United States of America'

# Group award years into decades, e.g. 1987 -> 1980
winners['decade'] = (winners['year'] // 10) * 10

# Compute the share of US-born laureates per decade, then keep the decade with the highest share
usa_ratio = winners.groupby('decade', as_index=False)['US'].agg('mean')
max_decade_usa = usa_ratio[usa_ratio['US']==usa_ratio['US'].max()]['decade'].values[0]
print(max_decade_usa)

# Plot the ratio across the decades to validate the finding
# (errorbar=None requires seaborn 0.12 or later; older versions use ci=None)
g = sns.relplot(x='decade', y='US', data=winners, kind='line', errorbar=None, marker='D')
g.set(xlabel='Decade', ylabel='Share of US-born laureates')
g.fig.suptitle('US Share of Nobel Laureates by Decade', y=1.03)
plt.show()
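`True` counts as 1 and `False` as 0, so the mean of the boolean `US` column within each decade is the fraction of US-born laureates. The sketch below applies this to hypothetical toy rows. It uses `idxmax()` on the grouped Series instead of filtering for the maximum.

```python
import pandas as pd

# Hypothetical toy rows; not taken from nobel.csv
toy = pd.DataFrame({
    'year': [1905, 1908, 1951, 1955, 1958, 2001],
    'birth_country': ['Germany', 'United States of America',
                      'United States of America', 'United States of America',
                      'France', 'Japan'],
})

toy['US'] = toy['birth_country'] == 'United States of America'
toy['decade'] = (toy['year'] // 10) * 10

# Mean of a boolean column = share of True values in each group
us_share = toy.groupby('decade')['US'].mean()

# idxmax() returns the index label (here, the decade) of the largest share;
# int() converts the NumPy integer to a plain Python int
toy_max_decade = int(us_share.idxmax())
print(us_share.to_dict(), toy_max_decade)
```

On ties, `idxmax()` returns the first decade with the maximum share.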

Which decade and Nobel Prize category combination had the highest proportion of female laureates?

Store this as a dictionary called max_female_dict where the decade is the key and the category is the value. There should only be one key:value pair.

# Convert the 'sex' column into a boolean flag for female laureates to simplify later calculations
winners['is_female'] = winners['sex'] == 'Female'

# Compute the share of female laureates for each decade and category combination
female_ratio = winners.groupby(['decade', 'category'], as_index=False)['is_female'].mean()

# Find the row index of the highest female ratio
highest_female_ratio_index = female_ratio['is_female'].idxmax()

# Build the dictionary with the decade as the key and the category as the value
max_female_dict = {
    female_ratio.loc[highest_female_ratio_index, 'decade']:
        female_ratio.loc[highest_female_ratio_index, 'category']
}

# Display the result
print(max_female_dict)
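Grouping on two keys produces one row per decade/category pair, and `idxmax()` then points at the pair with the highest female share. The minimal sketch below uses hypothetical data and builds the one-entry dictionary from that row. Casting the key and value to plain `int` and `str` is an assumption about the preferred output type.

```python
import pandas as pd

# Hypothetical toy data; not taken from nobel.csv
toy = pd.DataFrame({
    'decade': [2000, 2000, 2000, 2010, 2010],
    'category': ['Literature', 'Literature', 'Physics', 'Literature', 'Peace'],
    'sex': ['Female', 'Male', 'Male', 'Female', 'Female'],
})

toy['is_female'] = toy['sex'] == 'Female'

# One row per (decade, category) with the share of female laureates
shares = toy.groupby(['decade', 'category'], as_index=False)['is_female'].mean()

# Row holding the highest share; idxmax() keeps the first row on ties
best = shares.loc[shares['is_female'].idxmax()]
toy_female_dict = {int(best['decade']): str(best['category'])}
print(toy_female_dict)
```

In this toy data, (2010, Literature) and (2010, Peace) both have a share of 1.0. `groupby` sorts the keys, so Literature comes first and is the pair that is kept.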

Who was the first woman to receive a Nobel Prize, and in what category?

Save your string answers as first_woman_name and first_woman_category.

# Create a DataFrame containing only female laureates
female_winners = winners[winners['sex'] == 'Female']

# Find the row index of the earliest year with a female laureate
first_female_laureate_index = female_winners['year'].idxmin()

# Find the name of the first woman to receive a Nobel Prize
first_woman_name = female_winners.loc[first_female_laureate_index, 'full_name']

# Find the category in which she was awarded
first_woman_category = female_winners.loc[first_female_laureate_index, 'category']
print(first_woman_name, '-', first_woman_category)
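`idxmin()` returns the label of the first row holding the earliest year. If several women shared that year, only one would be reported. The sketch below uses hypothetical rows (the names are placeholders) and lists every woman in the earliest year, which makes a tie visible. It also sorts by year as an alternative way to pick a single row.

```python
import pandas as pd

# Hypothetical toy rows; names are placeholders, not real laureates
toy = pd.DataFrame({
    'full_name': ['Person A', 'Person B', 'Person C', 'Person D'],
    'year': [1920, 1910, 1910, 1930],
    'category': ['Physics', 'Chemistry', 'Peace', 'Literature'],
    'sex': ['Male', 'Female', 'Female', 'Female'],
})

women = toy[toy['sex'] == 'Female']

# Every woman awarded in the earliest year (more than one row means a tie)
earliest = women[women['year'] == women['year'].min()]

# mergesort is a stable sort, so equal years keep their original row order
first = women.sort_values('year', kind='mergesort').iloc[0]
print(earliest[['full_name', 'category']])
print(first['full_name'], first['category'])
```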

Which individuals or organizations have won more than one Nobel Prize throughout the years?

Store the full names in a list named repeat_list.

# Count how many times each name appears in the 'full_name' column
winners_counted = winners['full_name'].value_counts()

# Keep only names that appear more than once and save them as a list
repeat_list = list(winners_counted[winners_counted > 1].index)

# Display the results
print(repeat_list)
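`value_counts()` followed by a `> 1` filter keeps every name that appears on more than one row. The sketch below shows an equivalent approach on hypothetical names, using `Series.duplicated(keep=False)` to flag all rows whose name repeats. Like the code above, it assumes the same laureate is always spelled identically in `full_name`.

```python
import pandas as pd

# Hypothetical names; not taken from nobel.csv
names = pd.Series(['Org X', 'Person Y', 'Org X', 'Person Z', 'Person Y', 'Org X'])

# Approach used above: count each name, keep counts above one
counts = names.value_counts()
via_counts = sorted(counts[counts > 1].index)

# Alternative: keep=False marks every occurrence of a repeated value as True
via_duplicated = sorted(names[names.duplicated(keep=False)].unique())

print(via_counts, via_duplicated)
```

`value_counts()` also reports how many times each name won. `duplicated()` is handy when you want to keep the full rows for the repeat winners.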