The Nobel Prize has been among the most prestigious international awards since 1901. Each year, prizes are awarded in chemistry, literature, physics, physiology or medicine, economics, and peace. In addition to the honor, prestige, and substantial prize money, each recipient receives a gold medal bearing an image of Alfred Nobel (1833–1896), who established the prize.

The Nobel Foundation has made available a dataset of all prize winners, from the first awards in 1901 through 2023. The dataset used in this project comes from the Nobel Prize API and is stored in the nobel.csv file in the data folder.

In this project, you'll explore and answer several questions about this prizewinning data. Afterwards, we encourage you to investigate any further questions that interest you!

Project Instructions: Visualizing the History of Nobel Prize Winners

Analyze Nobel Prize winner data (1901–2023) to identify patterns by completing the following tasks:

Task 1: Most Commonly Awarded Gender and Birth Country

  • Objective: Determine the most frequently awarded gender and birth country among Nobel Prize winners.
  • Deliverable: Store the results as string variables:
    • top_gender: The most common gender (e.g., "Male").
    • top_country: The most common birth country (e.g., "United States of America").

Task 2: Decade with the Highest Ratio of US-born Winners

  • Objective: Identify the decade with the highest proportion of US-born Nobel Prize winners relative to total winners across all categories.
  • Deliverable: Store the decade as an integer:
    • max_decade_usa: The decade (e.g., 2000).

Task 3: Decade and Category with the Highest Proportion of Female Laureates

  • Objective: Find the decade and Nobel Prize category combination with the highest proportion of female laureates.
  • Deliverable: Store the result as a dictionary with a single key-value pair:
    • max_female_dict: Dictionary where the key is the decade (integer, e.g., 1990) and the value is the category (string, e.g., "Peace").

Task 4: First Female Nobel Prize Winner

  • Objective: Identify the first woman to receive a Nobel Prize and the category of her award.
  • Deliverable: Store the results as string variables:
    • first_woman_name: The full name of the first female winner (e.g., "Marie Curie").
    • first_woman_category: The category of the prize (e.g., "Physics").

Task 5: Repeat Nobel Prize Winners

  • Objective: Determine which individuals or organizations have won more than one Nobel Prize.
  • Deliverable: Store the full names in a list:
    • repeat_list: A list of names (e.g., ["Marie Curie", "International Committee of the Red Cross"]).
# Loading in required libraries
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
# Load the dataset, then inspect the first rows, column names, and dtypes
df = pd.read_csv("data/nobel.csv")
df.head()
df.info()

Task 1: Most Commonly Awarded Gender and Birth Country

# Find the most common gender and birth country
top_gender = df['sex'].value_counts().index[0]
top_country = df['birth_country'].value_counts().index[0]

# Print the results
print("Most common gender:", top_gender)
print("Most common birth country:", top_country)
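
To see why `.value_counts().index[0]` returns the most common value, here is a minimal sketch on a small made-up Series (the values are hypothetical and not taken from nobel.csv). It relies on `value_counts()` sorting counts in descending order and dropping missing values by default, so the first index label is the most frequent entry. `mode()` is shown as an alternative that should agree whenever there is no tie.

```python
import pandas as pd

# Hypothetical toy data; None stands in for rows with no recorded value
toy_sex = pd.Series(["Male", "Female", "Male", None, "Male", "Female"])

counts = toy_sex.value_counts()          # sorted descending, missing values dropped
most_common = counts.index[0]            # first label = most frequent value
same_via_mode = toy_sex.mode().iloc[0]   # alternative; agrees when there is no tie

print(counts.to_dict())
print(most_common, same_via_mode)
```

If two values were tied for first place, the two approaches could pick different labels, so it may be worth printing the full counts before trusting a single winner.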

Task 2: Decade with the Highest Ratio of US-born Winners

# Step 1: Create a column to flag US-born winners
df['us_born'] = df['birth_country'] == 'United States of America'

# Step 2: Create a decade column
df['decade'] = (df['year'] // 10) * 10

# Step 3: Calculate the ratio of US-born winners per decade
# Group by decade, calculate total winners and US-born winners
decade_stats = df.groupby('decade').agg(
    total_winners=('laureate_id', 'count'),
    us_born_winners=('us_born', 'sum')
).reset_index()

# Calculate the ratio
decade_stats['us_ratio'] = decade_stats['us_born_winners'] / decade_stats['total_winners']

# Step 4: Identify the decade with the highest ratio
max_decade_usa = int(decade_stats.loc[decade_stats['us_ratio'].idxmax(), 'decade'])

# Step 5: Create a relational line plot
sns.lineplot(x='decade', y='us_ratio', data=decade_stats, marker='o')
plt.title('Ratio of US-born Nobel Prize Winners by Decade')
plt.xlabel('Decade')
plt.ylabel('Ratio of US-born Winners')
plt.grid(True)
plt.show()

# Print the result
print("Decade with the highest ratio of US-born winners:", max_decade_usa)
# Alternative, more compact version of the Task 2 solution
# Create the US-born winners column
df['us_born'] = df['birth_country'] == 'United States of America'

# Create the decade column
df['decade'] = (np.floor(df['year'] / 10) * 10).astype(int)

# Calculate the ratio of US-born winners per decade
decade_stats = df.groupby('decade', as_index=False)['us_born'].mean().rename(columns={'us_born': 'us_ratio'})

# Identify the decade with the highest ratio
max_decade_usa = int(decade_stats.loc[decade_stats['us_ratio'].idxmax(), 'decade'])

# Create a relational line plot
sns.lineplot(x='decade', y='us_ratio', data=decade_stats, marker='o')
plt.title('Ratio of US-born Nobel Prize Winners by Decade')
plt.xlabel('Decade')
plt.ylabel('Ratio of US-born Winners')
plt.grid(True)
plt.show()

# Print result for Task 2
print("Decade with the highest ratio of US-born winners:", max_decade_usa)
df.head()
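
The two Task 2 cells should give the same answer, because the mean of a True/False column equals the number of True values divided by the number of rows. The sketch below checks this on a small hypothetical table (the years and countries are made up for illustration) and also shows how integer floor division maps a year to its decade.

```python
import pandas as pd

# Hypothetical toy rows; not taken from nobel.csv
toy = pd.DataFrame({
    "year": [1901, 1909, 1910, 1915, 1919, 1923],
    "birth_country": ["Germany", "United States of America", "France",
                      "United States of America", "United States of America", None],
})

# 1915 // 10 == 191, and 191 * 10 == 1910
toy["decade"] = (toy["year"] // 10) * 10
# Missing birth countries compare as False, so they count as non-US-born
toy["us_born"] = toy["birth_country"] == "United States of America"

# Long form: explicit counts per decade, then divide
# (us_born has no missing values, so "count" equals the group size)
long_form = toy.groupby("decade")["us_born"].agg(["sum", "count"])
ratio_long = long_form["sum"] / long_form["count"]

# Short form: the mean of a boolean column is the share of True values
ratio_short = toy.groupby("decade")["us_born"].mean()

print(ratio_short.to_dict())
print("Decade with the highest ratio:", ratio_short.idxmax())
```

The short form is less code, while the long form keeps the raw counts around, which can help when judging how much data sits behind each ratio.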

Task 3: Decade and Category with the Highest Proportion of Female Laureates

# Step 1: Create a column to flag female winners
df['female_winner'] = df['sex'] == 'Female'

# Step 2: Create the decade column (already created in Task 2; repeated so this cell runs on its own)
df['decade'] = (np.floor(df['year'] / 10) * 10).astype(int)

# Step 3: Calculate the proportion of female winners by decade and category
female_stats = df.groupby(['decade', 'category'], as_index=False)['female_winner'].mean().rename(columns={'female_winner': 'female_ratio'})

# Step 4: Find the decade and category with the highest proportion of female winners
max_row = female_stats.loc[female_stats['female_ratio'].idxmax()]
max_female_dict = {int(max_row['decade']): max_row['category']}

# Step 5: Create a relational line plot (optional)
sns.lineplot(x='decade', y='female_ratio', hue='category', data=female_stats, marker='o')
plt.title('Proportion of Female Nobel Prize Winners by Decade and Category')
plt.xlabel('Decade')
plt.ylabel('Proportion of Female Winners')
plt.grid(True)
plt.legend(title='Category', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

# Print the result
print("Decade and category with the highest proportion of female winners:", max_female_dict)
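
A proportion computed over a decade and category pair can rest on very few laureates, so it may help to keep the group size next to the ratio. The sketch below repeats the same grouping on a small hypothetical table (rows are made up, and the `n_winners` column name is an assumption introduced here) and builds the single-entry dictionary from the top row.

```python
import pandas as pd

# Hypothetical toy rows; not taken from nobel.csv
toy = pd.DataFrame({
    "decade":   [1990, 1990, 1990, 2000, 2000, 2000, 2000],
    "category": ["Peace", "Peace", "Physics", "Peace",
                 "Literature", "Literature", "Physics"],
    "sex":      ["Female", "Male", "Male", "Male",
                 "Female", "Female", "Male"],
})
toy["female_winner"] = toy["sex"] == "Female"

# One row per (decade, category) with the female share and the group size
stats = (
    toy.groupby(["decade", "category"], as_index=False)
       .agg(female_ratio=("female_winner", "mean"),
            n_winners=("female_winner", "count"))
)

# idxmax returns the row label of the first maximum; ties are not reported
top = stats.loc[stats["female_ratio"].idxmax()]
result = {int(top["decade"]): top["category"]}

print(stats)
print(result, "based on", int(top["n_winners"]), "winners")
```

Here the top ratio comes from a group of only two laureates, which illustrates why the winning combination in the real data deserves a look at its underlying counts.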

Task 4: First Female Nobel Prize Winner
