
The Nobel Prize has been among the most prestigious international awards since 1901. Each year, awards are bestowed in chemistry, literature, physics, physiology or medicine, economics, and peace. In addition to the honor, prestige, and substantial prize money, the recipient also gets a gold medal with an image of Alfred Nobel (1833 - 1896), who established the prize.

The Nobel Foundation has made available a dataset of all prize winners from the first awards in 1901 through 2023. The dataset used in this project comes from the Nobel Prize API and is available in the nobel.csv file in the data folder.

In this project, you'll explore and answer several questions about this prize-winning data, and we encourage you to go on to explore further questions that interest you!

import pandas as pd

# Load the dataset
nobel_data = pd.read_csv('data/nobel.csv')

# Identify the most commonly awarded gender
top_gender = nobel_data['sex'].mode()[0]

# Identify the most commonly awarded birth country
top_country = nobel_data['birth_country'].mode()[0]
print([top_gender, top_country])



# Load the dataset
nobel_data = pd.read_csv('data/nobel.csv')

# Create the 'decade' column
nobel_data['decade'] = (nobel_data['year'] // 10) * 10

# Calculate the total number of winners per decade
total_winners_decade = nobel_data.groupby('decade').size()

# Calculate the number of US-born winners per decade
us_winners_decade = nobel_data[nobel_data['birth_country'] == 'United States of America'].groupby('decade').size()

# Calculate the ratio of US-born winners to total winners per decade
us_ratio_decade = (us_winners_decade / total_winners_decade).fillna(0)

# Identify the decade with the highest ratio
max_decade_usa = us_ratio_decade.idxmax()
print(max_decade_usa)






# Calculate the total number of winners per decade and category
total_winners_decade_category = nobel_data.groupby(['decade', 'category']).size()

# Calculate the number of female winners per decade and category
female_winners_decade_category = nobel_data[nobel_data['sex'] == 'Female'].groupby(['decade', 'category']).size()

# Calculate the proportion of female winners per decade and category
female_ratio_decade_category = (female_winners_decade_category / total_winners_decade_category).fillna(0)

# Identify the decade and category with the highest proportion of female laureates
max_female_decade_category = female_ratio_decade_category.idxmax()

# Store the result in a dictionary
max_female_dict = {max_female_decade_category[0]: max_female_decade_category[1]}
print(max_female_dict)







# Filter the dataset for female laureates
female_laureates = nobel_data[nobel_data['sex'] == 'Female']

# Find the first woman to receive a Nobel Prize
first_female = female_laureates.loc[female_laureates['year'].idxmin()]

# Extract the name and category
first_woman_name = first_female['full_name']
first_woman_category = first_female['category']
print([first_woman_category, first_woman_name])


# Identify individuals or organizations with more than one Nobel Prize
repeat_winners = nobel_data['full_name'].value_counts()
repeat_list = repeat_winners[repeat_winners > 1].index.tolist()

# Print the list of repeat winners
print(repeat_list)


Here's a breakdown of each section of the code and what it does:

1. Identifying the Most Commonly Awarded Gender and Birth Country

import pandas as pd

# Load the dataset
nobel_data = pd.read_csv('data/nobel.csv')

# Identify the most commonly awarded gender
top_gender = nobel_data['sex'].mode()[0]

# Identify the most commonly awarded birth country
top_country = nobel_data['birth_country'].mode()[0]

# Print the results
print([top_gender, top_country])

  • import pandas as pd: Imports the pandas library, commonly used for data manipulation and analysis.
  • pd.read_csv('data/nobel.csv'): Reads the CSV file containing the Nobel Prize data into a pandas DataFrame called nobel_data.
  • nobel_data['sex'].mode()[0]: Finds the mode (most frequent value) of the 'sex' column, which indicates the most commonly awarded gender. This value is stored in the variable top_gender.
  • nobel_data['birth_country'].mode()[0]: Finds the mode of the 'birth_country' column, indicating the most common birth country of Nobel Prize winners. This value is stored in the variable top_country.
  • print([top_gender, top_country]): Prints the most common gender and birth country.
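One detail is worth checking here. `mode()` skips missing values by default and returns every tied value in sorted order, so `mode()[0]` quietly picks the first of them when there is a tie. The sketch below shows both behaviors on a small hypothetical DataFrame, not the real nobel.csv. Comparing against `value_counts()` is a simple way to spot a tie.

```python
import pandas as pd

# Hypothetical toy data (NOT the real nobel.csv): two rows have no 'sex' value
# (as an organization might), and two birth countries are tied.
toy = pd.DataFrame({
    "sex": ["Male", "Female", "Male", None, None],
    "birth_country": ["Germany", "France", "France", "Germany", None],
})

# mode() ignores missing values by default (dropna=True)
sex_mode = toy["sex"].mode()

# With a tie, mode() returns all tied values in sorted order
country_mode = toy["birth_country"].mode()

# value_counts() shows the full ranking, which makes ties visible
country_counts = toy["birth_country"].value_counts()

print(sex_mode.tolist())      # ['Male']
print(country_mode.tolist())  # ['France', 'Germany']
print(country_mode[0])        # France (the other tied value is dropped)
```

If `mode()` returns more than one value on the real data, it is worth reporting all of them rather than only the first.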

2. Finding the Decade with the Highest Ratio of US-born Winners

# Load the dataset
nobel_data = pd.read_csv('data/nobel.csv')

# Create the 'decade' column
nobel_data['decade'] = (nobel_data['year'] // 10) * 10

# Calculate the total number of winners per decade
total_winners_decade = nobel_data.groupby('decade').size()

# Calculate the number of US-born winners per decade
us_winners_decade = nobel_data[nobel_data['birth_country'] == 'United States of America'].groupby('decade').size()

# Calculate the ratio of US-born winners to total winners per decade
us_ratio_decade = (us_winners_decade / total_winners_decade).fillna(0)

# Identify the decade with the highest ratio
max_decade_usa = us_ratio_decade.idxmax()

# Print the result
print(max_decade_usa)

  • nobel_data['decade'] = (nobel_data['year'] // 10) * 10: Creates a new 'decade' column by floor-dividing 'year' by 10 (which discards the last digit) and multiplying by 10. This gives the starting year of the decade; for example, 1987 becomes 1980.
  • nobel_data.groupby('decade').size(): Groups the data by 'decade' and counts the number of winners in each decade, storing the result in total_winners_decade.
  • nobel_data[nobel_data['birth_country'] == 'United States of America'].groupby('decade').size(): Filters the DataFrame for US-born winners, groups by 'decade', and counts the number of winners, storing the result in us_winners_decade.
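  • (us_winners_decade / total_winners_decade).fillna(0): Divides the two Series, which pandas aligns on their 'decade' index, to get the ratio of US-born winners to total winners per decade. Decades with no US-born winners are absent from us_winners_decade, so their ratio comes out as NaN. fillna(0) replaces those values with 0.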
  • us_ratio_decade.idxmax(): Finds the index (decade) with the highest ratio of US-born winners.
  • print(max_decade_usa): Prints the decade with the highest ratio of US-born winners.
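The two key steps above are the floor-division trick that builds `decade` and the index alignment that produces NaN before `fillna(0)`. The sketch below shows both on a few hypothetical rows (not the real dataset).

```python
import pandas as pd

# Hypothetical toy rows (NOT the real nobel.csv)
toy = pd.DataFrame({
    "year": [1901, 1909, 1915, 1987, 1989],
    "birth_country": ["Germany", "United States of America", "France",
                      "United States of America", "United States of America"],
})

# Floor division drops the last digit: 1987 // 10 * 10 == 1980
toy["decade"] = (toy["year"] // 10) * 10

total = toy.groupby("decade").size()
us = toy[toy["birth_country"] == "United States of America"].groupby("decade").size()

# Division aligns on the 'decade' index; 1910 has no US-born rows, so it is NaN
raw_ratio = us / total
ratio = raw_ratio.fillna(0)

print(ratio.to_dict())  # {1900: 0.5, 1910: 0.0, 1980: 1.0}
print(ratio.idxmax())   # 1980
```

Without `fillna(0)`, `idxmax()` would still skip the NaN. Filling with 0 simply makes the "no US-born winners" decades explicit in the ratio table.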

3. Highest Proportion of Female Laureates in a Decade and Category

# Calculate the total number of winners per decade and category
total_winners_decade_category = nobel_data.groupby(['decade', 'category']).size()

# Calculate the number of female winners per decade and category
female_winners_decade_category = nobel_data[nobel_data['sex'] == 'Female'].groupby(['decade', 'category']).size()

# Calculate the proportion of female winners per decade and category
female_ratio_decade_category = (female_winners_decade_category / total_winners_decade_category).fillna(0)

# Identify the decade and category with the highest proportion of female laureates
max_female_decade_category = female_ratio_decade_category.idxmax()

# Store the result in a dictionary
max_female_dict = {max_female_decade_category[0]: max_female_decade_category[1]}

# Print the result
print(max_female_dict)

  • nobel_data.groupby(['decade', 'category']).size(): Groups the data by 'decade' and 'category' and counts the winners in each group, storing the result in total_winners_decade_category.
  • nobel_data[nobel_data['sex'] == 'Female'].groupby(['decade', 'category']).size(): Filters the DataFrame for female winners, groups by 'decade' and 'category', and counts the number of female winners, stored in female_winners_decade_category.
  • (female_winners_decade_category / total_winners_decade_category).fillna(0): Calculates the proportion of female winners per decade and category. The division aligns on the (decade, category) index, so combinations with no female winners produce NaN, which fillna(0) replaces with 0.
  • female_ratio_decade_category.idxmax(): Finds the index (decade and category) with the highest proportion of female laureates.
  • max_female_dict = {max_female_decade_category[0]: max_female_decade_category[1]}: Stores the result as a dictionary with the decade as the key and the category as the value.
  • print(max_female_dict): Prints the dictionary containing the decade and category with the highest proportion of female laureates.
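When a Series has a two-level index, `idxmax()` returns a tuple of labels. That is why the code above can index the result with `[0]` and `[1]`. The sketch below uses hypothetical rows (not the real dataset) and unpacks the tuple directly, which does the same thing as the indexing.

```python
import pandas as pd

# Hypothetical toy rows (NOT the real nobel.csv)
toy = pd.DataFrame({
    "decade": [1900, 1900, 1900, 2020, 2020, 2020],
    "category": ["Physics", "Physics", "Chemistry", "Literature", "Literature", "Physics"],
    "sex": ["Male", "Female", "Male", "Female", "Female", "Male"],
})

total = toy.groupby(["decade", "category"]).size()
female = toy[toy["sex"] == "Female"].groupby(["decade", "category"]).size()

# Aligned on the (decade, category) MultiIndex; groups with no women become 0
share = (female / total).fillna(0)

# On a MultiIndex Series, idxmax() returns a (decade, category) tuple
best = share.idxmax()          # (2020, 'Literature') for this toy data
decade, category = best        # tuple unpacking instead of best[0], best[1]
result = {decade: category}    # same shape as max_female_dict
```

If several groups reach the same maximum share (for example, 1.0), `idxmax()` returns only the first one in index order.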

4. First Woman to Receive a Nobel Prize

# Filter the dataset for female laureates
female_laureates = nobel_data[nobel_data['sex'] == 'Female']

# Find the first woman to receive a Nobel Prize
first_female = female_laureates.loc[female_laureates['year'].idxmin()]

# Extract the name and category
first_woman_name = first_female['full_name']
first_woman_category = first_female['category']

# Print the result
print([first_woman_category, first_woman_name])

  • nobel_data[nobel_data['sex'] == 'Female']: Filters the DataFrame for female laureates, storing the result in female_laureates.
  • female_laureates.loc[female_laureates['year'].idxmin()]: idxmin() returns the index label of the row with the earliest year among female laureates (the first such row if several share that year), and .loc retrieves that row. This row is the first woman to receive a Nobel Prize, stored in first_female.
  • first_female['full_name']: Extracts the full name of the first woman laureate.
  • first_female['category']: Extracts the category in which the first woman won the prize.
  • print([first_woman_category, first_woman_name]): Prints the category and name of the first woman to win a Nobel Prize.
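Two details of this step are easy to miss. First, `idxmin()` returns an index label, not a position, so it pairs with `.loc` rather than `.iloc`. Second, it returns only one row even if several laureates share the earliest year. The sketch below shows both using hypothetical rows with a deliberately non-sequential index. The names "Person A" to "Person D" are placeholders, not real laureates.

```python
import pandas as pd

# Hypothetical toy rows (NOT the real nobel.csv); the index labels are not 0..n-1
toy = pd.DataFrame(
    {
        "year": [1911, 1905, 1905, 1920],
        "full_name": ["Person A", "Person B", "Person C", "Person D"],
        "category": ["Chemistry", "Peace", "Physics", "Literature"],
        "sex": ["Female", "Female", "Female", "Male"],
    },
    index=[10, 20, 30, 40],
)

women = toy[toy["sex"] == "Female"]

# idxmin() gives the label of the first minimum, so .loc (label-based) is correct here
first_label = women["year"].idxmin()   # 20
first_row = women.loc[first_label]     # Person B

# Two women share the earliest year in this toy data; this keeps both of them
earliest = women[women["year"] == women["year"].min()]
```

On data with a tie, checking `earliest` shows whether the single-row answer is hiding another laureate from the same year.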

5. Repeat Nobel Prize Winners

# Identify individuals or organizations with more than one Nobel Prize
repeat_winners = nobel_data['full_name'].value_counts()
repeat_list = repeat_winners[repeat_winners > 1].index.tolist()

# Print the list of repeat winners
print(repeat_list)

  • nobel_data['full_name'].value_counts(): Counts the occurrences of each name in the 'full_name' column, storing the result in repeat_winners.
  • repeat_winners[repeat_winners > 1].index.tolist(): Filters the names that have won more than once and converts them to a list, stored in repeat_list.
  • print(repeat_list): Prints the list of individuals or organizations that have won more than one Nobel Prize.
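The `value_counts()` approach groups laureates by their exact `full_name` string. Any variation in how the same name is recorded would therefore split one laureate's count. The sketch below runs the same logic on hypothetical names (not the real dataset). It also shows `duplicated(keep=False)`, which keeps every row belonging to a repeat winner, so the years and categories of each prize can be inspected.

```python
import pandas as pd

# Hypothetical toy names (NOT the real nobel.csv)
toy = pd.DataFrame({
    "full_name": ["Laureate X", "Laureate Y", "Laureate X", "Org Z", "Org Z", "Org Z"],
})

# Same logic as above: count each name, keep those appearing more than once
counts = toy["full_name"].value_counts()
repeat_list = counts[counts > 1].index.tolist()

# duplicated(keep=False) marks every occurrence of a repeated name, not just later ones
repeat_rows = toy[toy["full_name"].duplicated(keep=False)]

print(sorted(repeat_list))  # ['Laureate X', 'Org Z']
print(len(repeat_rows))     # 5
```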