The Nobel Prize has been among the most prestigious international awards since 1901. Each year, awards are bestowed in chemistry, literature, physics, physiology or medicine, economics, and peace. In addition to the honor, prestige, and substantial prize money, each recipient also receives a gold medal bearing the image of Alfred Nobel (1833-1896), who established the prize.
The Nobel Foundation has made available a dataset of all prize winners, from the first awards in 1901 through 2023. The dataset used in this project comes from the Nobel Prize API and is stored in the nobel.csv file in the data folder.
In this project, you'll explore and answer several questions about this prizewinning data. We encourage you to then explore any further questions that interest you!
# Loading in required libraries
import pandas as pd
import seaborn as sns
import numpy as np
# Start coding here!
# 1. load dataset nobel.csv and view first few rows
nobel_df = pd.read_csv("data/nobel.csv")
nobel_df.head()
# Exploring dataset info
nobel_df.info()
# Count laureates for each birth country and gender combination
country_gender_counts = nobel_df[['birth_country', 'sex']].value_counts()
country_gender_counts
# 1. What are the most commonly awarded gender and birth country?
top_gender = nobel_df['sex'].value_counts().idxmax()
top_country = nobel_df['birth_country'].value_counts().idxmax()
print(f'The most commonly awarded gender is "{top_gender}" and the most common birth country is "{top_country}".')
# 2. Create a US-born winners indicator column (1 = born in the USA, 0 = otherwise)
nobel_df['US_winner'] = (nobel_df['birth_country'] == 'United States of America').astype(int)
nobel_df['US_winner'].value_counts()
# 2. Create a decade column whose values end in 0: divide the year by 10, round down with np.floor(),
# multiply by 10 to get back a four-digit year, and cast to int with .astype()
nobel_df['decade'] = (np.floor(nobel_df['year'] / 10) * 10).astype(int)
nobel_df.head()
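The rounding step above can also be written with integer floor division, which avoids the float round trip. The following is a minimal sketch on a handful of hypothetical years (they are not read from nobel.csv), only to show how each year maps to its decade.

```python
import pandas as pd

# Hypothetical years, chosen only to illustrate the rounding behaviour.
years = pd.Series([1901, 1909, 1910, 1999, 2023], name="year")

# Floor-divide by 10 to drop the last digit, then multiply back by 10.
decades = years // 10 * 10

print(decades.tolist())  # [1900, 1900, 1910, 1990, 2020]
```

Because the values stay integers throughout, no .astype(int) call is needed in this variant.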
# 2. Compute the proportion of US-born winners in each decade
ratio_decade_usa = nobel_df.groupby('decade', as_index=False)['US_winner'].mean()
# Identify the decade with the highest proportion of US-born winners
max_decade_usa = ratio_decade_usa.loc[ratio_decade_usa['US_winner'].idxmax(), 'decade']
max_decade_usa
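Taking the mean of a 0/1 indicator column works here because the mean of such a column equals the share of rows that are 1. A small sketch with made-up rows (the `toy` frame is hypothetical and not taken from the dataset) shows the idea:

```python
import pandas as pd

# Toy data (hypothetical, not taken from nobel.csv).
toy = pd.DataFrame({
    "decade":    [1900, 1900, 1900, 1900, 1910, 1910],
    "US_winner": [1,    0,    0,    0,    1,    1],
})

# The mean of a 0/1 column is the share of rows equal to 1.
share = toy.groupby("decade")["US_winner"].mean()
print(share.to_dict())  # {1900: 0.25, 1910: 1.0}

# idxmax returns the index label (here, the decade) with the largest share.
print(share.idxmax())  # 1910
```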
# 2. Plot the proportion of US-born winners per decade as a relational line plot
import matplotlib.pyplot as plt
plot = sns.relplot(x='decade', y='US_winner', data=ratio_decade_usa, kind='line')
plot.fig.suptitle("Proportion of US-Born Nobel Laureates by Decade")
plt.show()
# 3. Find the decade and category with the highest proportion of female laureates.
# 3.a Create a female winners indicator column (1 = female, 0 = otherwise)
nobel_df['female_winners'] = (nobel_df['sex'] == 'Female').astype(int)
nobel_df['female_winners'].value_counts()
# 3.b Groupby two columns decade and category
female_decade_winner = nobel_df.groupby(['decade','category'], as_index=False)['female_winners'].mean()
# identify highest decade and category of female winners
highest_female_decade_winner= female_decade_winner[female_decade_winner['female_winners'] == female_decade_winner['female_winners'].max()][['decade','category']]
highest_female_decade_winner
# 3.c create a dictionary
max_female_dict = {highest_female_decade_winner['decade'].values[0]: highest_female_decade_winner['category'].values[0]}
print('Max Female Dictionary:',max_female_dict)
print(f'The decade "{list(max_female_dict.keys())[0]}" and category "{list(max_female_dict.values())[0]}" had the highest proportion of female laureates.')
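As an alternative to filtering on the maximum value, grouping without as_index=False yields a Series indexed by (decade, category), and idxmax then returns that pair directly. This is a sketch under assumed toy data (the `toy` frame and its values are hypothetical); note that idxmax returns only the first maximum if there are ties.

```python
import pandas as pd

# Toy data (hypothetical, not taken from nobel.csv).
toy = pd.DataFrame({
    "decade":   [2000, 2000, 2000, 2010, 2010, 2010],
    "category": ["Peace", "Peace", "Physics", "Peace", "Physics", "Physics"],
    "female_winners": [1, 0, 0, 1, 0, 0],
})

# Grouping by two keys gives a Series with a (decade, category) MultiIndex.
share = toy.groupby(["decade", "category"])["female_winners"].mean()

# idxmax returns the index label of the largest value, here a tuple.
best_decade, best_category = share.idxmax()

# Cast the key to a plain int so the dictionary prints cleanly.
result = {int(best_decade): best_category}
print(result)  # {2010: 'Peace'}
```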
# 3.d Create a relational line plot with multiple categories.
sns.relplot(x='decade', y='female_winners',data=female_decade_winner, kind='line', hue= 'category')
plt.title('Proportion of Female Nobel Laureates by Decade and Category')
plt.xlabel('Decade')
plt.ylabel('Proportion of Female Winners')
plt.show()
# 4. Who was the first woman to receive a Nobel Prize, and in what category?
nobel_women = nobel_df[nobel_df['sex'] == 'Female']
# Select the row(s) with the earliest award year among female laureates
min_row = nobel_women[nobel_women['year'] == nobel_women['year'].min()]
first_woman_name = min_row['full_name'].values[0]
first_woman_category = min_row['category'].values[0]
print(f'\nThe first woman to win a Nobel Prize was {first_woman_name} in the category {first_woman_category}.')
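The same lookup can be sketched with idxmin, which returns the index label of the earliest year and lets .loc fetch that single row. The rows below are hypothetical placeholders rather than real laureates, and the column names simply mirror those used above.

```python
import pandas as pd

# Toy data with placeholder names (hypothetical, not real laureates).
toy = pd.DataFrame({
    "year": [1950, 1920, 1935],
    "category": ["Chemistry", "Peace", "Literature"],
    "sex": ["Female", "Male", "Female"],
    "full_name": ["Laureate A", "Laureate B", "Laureate C"],
})

# Keep only the female rows, then find the row label of the earliest year.
women = toy[toy["sex"] == "Female"]
first = women.loc[women["year"].idxmin()]

print(first["full_name"], first["category"])  # Laureate C Literature
```

Unlike the boolean filter on the minimum year, idxmin keeps only the first matching row when several laureates share the earliest year.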