Visualizing the History of Nobel Prize Winners
The Nobel Prize has been among the most prestigious international awards since 1901. Each year, awards are bestowed in chemistry, literature, physics, physiology or medicine, economics, and peace. In addition to the honor, prestige, and substantial prize money, the recipient also gets a gold medal with an image of Alfred Nobel (1833 - 1896), who established the prize.
The Nobel Foundation has made available a dataset of all prize winners from the first awards in 1901 through 2023. The dataset used in this project comes from the Nobel Prize API and is available in the nobel.csv file in the data folder.
In this project, we will explore and answer several questions related to this prizewinning data...
# Loading in required libraries
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
# Setting seaborn style and context
sns.set_style('darkgrid')
sns.set_context('notebook')
# Importing data from csv
nobel = pd.read_csv('data/nobel.csv')
# Exploring data
# info() prints its summary directly; wrapping it in print() would also print "None"
nobel.info()
Most commonly awarded gender
The plot below visualizes the distribution of Nobel Prize winners by gender. Here are some key observations:
1. Significant Gender Disparity:
- The number of male Nobel Prize winners is overwhelmingly higher than the number of female winners.
- The male bar is far taller than the female bar. Organization laureates, which have no gender, do not appear in either bar.
2. Imbalance in Recognition:
- This suggests that historically, men have been awarded Nobel Prizes at a far greater rate than women.
- This could be due to various factors, including historical gender biases, limited access to educational and research opportunities for women, and systemic barriers in many fields.
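The chart shows raw counts. To put a number on "a far greater rate", one option is to compute each gender's share of laureates. The following is a minimal sketch on a hypothetical toy DataFrame that has only a `sex` column, with `None` standing in for an organization. It assumes that organizations carry a missing `sex` value in the real data. `dropna=False` keeps those rows visible instead of silently dropping them.

```python
import pandas as pd

# Hypothetical toy data standing in for the real nobel DataFrame;
# None marks an organization laureate, which has no gender
toy = pd.DataFrame({"sex": ["Male", "Male", "Male", "Female", None]})

# Share of each value among all rows, keeping missing values as their own entry
share_all = toy["sex"].value_counts(normalize=True, dropna=False)

# Share among individual laureates only (missing values excluded by default)
share_people = toy["sex"].value_counts(normalize=True)

print(share_all)
print(share_people)
```

On the real data, running the same two calls on `nobel["sex"]` would show how the female share changes depending on whether organizations are included in the denominator.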
# Most commonly awarded gender
g = sns.catplot(x='sex', data=nobel, kind='count', hue='sex', legend=False)
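g.fig.suptitle('Most commonly awarded gender', y=1.02)
g.set(xlabel='Gender', ylabel='Number of Nobel Prize winners')
# Most frequent gender (value_counts skips missing values, i.e. organizations)
top_gender = nobel['sex'].value_counts().idxmax()
Most commonly awarded birth country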
The following bar chart presents the top five birth countries of Nobel Prize winners. Here are some key observations:
1. Dominance of the United States:
- The USA has the highest number of Nobel Prize winners by birth country, significantly outpacing the other four nations.
- This likely reflects the country's strong academic institutions, research funding, and historical leadership in science, economics, and other fields recognized by the Nobel Prize.
2. European Presence:
- The remaining top birth countries—United Kingdom, Germany, France, and Sweden—are all European nations.
- This suggests that Nobel laureates have historically come from regions with well-established academic and scientific traditions.
3. Sweden's Presence:
- Sweden, being the home country of Alfred Nobel and the Nobel Prize institution, appears in the top five, which is notable given its relatively small population compared to the other nations on the list.
4. Historical Context:
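- Germany and the United Kingdom have long histories of scientific and intellectual contributions, but their totals are notably lower than that of the USA.
- The post-World War II brain drain, when many European scientists moved to the United States, may have contributed to this gap. Because the chart counts birth country, however, émigrés still count toward their country of origin, so any effect would be indirect, for example through stronger American institutions training more US-born researchers.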
Overall Interpretation
- This chart suggests that geopolitical and economic factors shape where Nobel-level scientific and academic achievements originate.
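One way to probe the migration explanation is to compare each laureate's birth country with the country of the organization they were affiliated with when awarded. The following is a minimal sketch on hypothetical toy rows with placeholder names. It assumes that the CSV has an `organization_country` column, so confirm the actual column names with `nobel.info()` before applying it to the real data.

```python
import pandas as pd

# Hypothetical toy rows; 'organization_country' is an assumed column name
toy = pd.DataFrame({
    "full_name": ["Laureate A", "Laureate B", "Laureate C", "Laureate D", "Laureate E"],
    "birth_country": ["Germany", "Germany", "United States of America",
                      "United Kingdom", "France"],
    "organization_country": ["United States of America", "Germany",
                             "United States of America", "United States of America", None],
})

# Drop rows where either country is unknown (NaN != x would otherwise count as a move)
known = toy.dropna(subset=["birth_country", "organization_country"])

# Laureates whose award-time affiliation differs from their birth country
moved = known[known["birth_country"] != known["organization_country"]]

# Number of such laureates per (birth country, affiliation country) pair
flows = moved.groupby(["birth_country", "organization_country"]).size()
print(flows)
```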
# Most commonly awarded birth country
top_5_countries = nobel['birth_country'].value_counts().nlargest(5).index
nobel_top_countries = nobel[nobel['birth_country'].isin(top_5_countries)]
g = sns.catplot(y='birth_country', data=nobel_top_countries, kind='count', order=top_5_countries, hue='birth_country', legend=False)
g.fig.suptitle('Most commonly awarded birth country (top 5)', y=1.02)
g.set(xlabel='', ylabel='')
# Most frequent birth country among laureates
top_country = nobel['birth_country'].value_counts().idxmax()
Decade with the highest ratio of US-born Nobel Prize winners
This line plot shows the proportion of Nobel laureates born in the United States by decade. Here are some key observations:
1. Early Growth (1900–1940s):
- The percentage of US-born laureates was very low in the early 1900s but started increasing significantly in the 1920s and 1930s.
- This growth could be attributed to the expansion of American universities, research institutions, and increased investment in science and innovation.
2. Post-WWII Rise (1940s–1960s):
- The proportion of US-born laureates saw a sharp increase after World War II, reaching around 30% in the 1950s.
- This period marked the United States’ rise as a global scientific and technological leader. The influx of European scientists fleeing the war (the so-called "brain drain") strengthened American research, although the émigrés themselves count as foreign-born in this measure.
3. Stabilization and Growth (1970s–2000s):
- The proportion remained steady before rising sharply again in the 1980s and 1990s. It peaked at over 40% in the 2000s, the decade with the highest ratio.
- This likely reflects the continued dominance of American research institutions, strong federal funding (e.g., NASA, NIH, NSF), and breakthroughs in various fields.
4. Recent Decline (2010s–2020s):
- The proportion of US-born laureates has declined from its peak, although it remains high.
- This could be due to the increasing global competition in science, the rise of research powerhouses like China and the European Union, and a more diverse distribution of Nobel winners.
Overall Interpretation
- The data suggests that the US became a dominant force in Nobel Prize achievements in the mid-20th century, largely due to investment in research and institutional development, which the wartime influx of scientists helped strengthen.
- However, recent decades have seen a more globalized landscape of scientific excellence.
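The code that follows computes the ratio by dividing two `groupby(...).size()` counts. An equivalent, more compact formulation takes the mean of the boolean `us_born_winner` column per decade, because `True` counts as 1 and `False` as 0. The following is a minimal sketch on hypothetical toy years, not real data.

```python
import pandas as pd

# Hypothetical toy data: award year and whether the laureate was US-born
toy = pd.DataFrame({
    "year": [1901, 1905, 1952, 1958, 1959, 2003],
    "us_born_winner": [False, False, True, True, False, True],
})

# Floor each year to its decade, e.g. 1958 -> 1950
toy["decade"] = (toy["year"] // 10) * 10

# Mean of a boolean column = fraction of True values in each decade
us_ratio = toy.groupby("decade")["us_born_winner"].mean()

# Decade with the highest ratio
max_decade = us_ratio.idxmax()
print(us_ratio)
print(max_decade)
```

The two formulations differ in one detail. With the division approach, a decade that has no US-born winner is absent from the numerator and becomes NaN. With the mean approach, that decade shows 0.0.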
# Decade with the highest ratio of US-born Nobel Prize winners to total winners in all categories
nobel['us_born_winner'] = nobel['birth_country']=='United States of America'
nobel['decade'] = (nobel['year'] // 10) * 10
us_winners_by_decade = nobel[nobel['us_born_winner']].groupby('decade').size()
total_winners_by_decade = nobel.groupby('decade').size()
us_ratio = us_winners_by_decade / total_winners_by_decade
us_ratio_p = us_ratio*100
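# Turn the Series into a DataFrame so x and y can be named explicitly
us_ratio_df = us_ratio_p.rename('us_ratio_%').reset_index()
g = sns.relplot(x='decade', y='us_ratio_%', data=us_ratio_df, kind='line', marker='o')
g.fig.suptitle('Proportion of US laureates by decade', y=1.02)
g.set(xlabel='Decade', ylabel='Ratio of US-born to total winners (%)')
# Decade with the highest US-born ratio
max_decade_usa = us_ratio.idxmax()
Proportion of female Nobel Prize laureates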
This plot presents the proportion of female Nobel laureates by decade across different prize categories. Here are the key observations:
1. Increasing Female Representation
- The percentage of female laureates has generally increased over time, with a significant rise in recent decades (2000s and 2010s).
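- The most notable spike occurs in the 2020s, where multiple categories show their highest proportions of female winners. Because this decade only covers 2020 to 2023 in the dataset, these ratios rest on relatively few laureates and should be read with caution.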
2. Differences Across Nobel Categories
- Literature and Peace prizes have consistently had a higher proportion of female winners compared to scientific fields.
- This could be attributed to fewer historical barriers for women in these domains.
- Medicine and Chemistry have shown gradual increases, particularly after the 1980s.
- Physics and Economics remain the least gender-inclusive categories, with very few female laureates throughout history.
3. Sharp Increase in Recent Years (2020s)
- The spike in Chemistry, Peace, and Literature in the 2020s suggests increased recognition of female contributions.
- The rise might be attributed to growing gender inclusivity efforts and policy changes in academia and research institutions.
Overall Interpretation
- While historical barriers led to the underrepresentation of women in Nobel Prizes, there has been a noticeable shift, especially in recent decades.
- However, scientific fields like Physics and Economics still have much lower female representation compared to others.
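Ratios computed from small groups swing sharply, and some decade/category cells contain only a handful of laureates. One way to avoid over-reading such spikes is to keep the laureate count next to each ratio and require a minimum count before picking the maximum. The following is a minimal sketch on hypothetical toy rows. The threshold `MIN_LAUREATES` is an arbitrary illustrative choice, not part of the original analysis.

```python
import pandas as pd

# Hypothetical toy data, not the real dataset
toy = pd.DataFrame({
    "decade":   [2010, 2010, 2010, 2010, 2020, 2020],
    "category": ["Chemistry"] * 4 + ["Peace"] * 2,
    "sex":      ["Female", "Male", "Male", "Male", "Female", "Male"],
})

toy["is_female"] = toy["sex"] == "Female"

# Female ratio and number of laureates, side by side, for every decade/category cell
summary = (
    toy.groupby(["decade", "category"])["is_female"]
    .agg(ratio="mean", n="count")
    .reset_index()
)

MIN_LAUREATES = 3  # illustrative threshold
reliable = summary[summary["n"] >= MIN_LAUREATES]

# Highest ratio among cells with enough laureates
best = reliable.loc[reliable["ratio"].idxmax()]
print(summary)
print(best["decade"], best["category"], best["ratio"])
```

Without the threshold, the two-laureate Peace cell (50%) would win. With the threshold, the larger Chemistry cell is selected.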
# Decade and Nobel Prize category combination with the highest proportion of female laureates
fem_win_by_decade_and_cat = nobel[nobel['sex']=='Female'].groupby(['decade','category']).size()
tot_win_by_decade_and_cat = nobel.groupby(['decade','category']).size()
# Combinations with no female laureate are NaN after division; treat them as 0%
fem_ratio_df = (fem_win_by_decade_and_cat / tot_win_by_decade_and_cat).fillna(0).to_frame(name="ratio").reset_index()
# Find the decade and category combination with the highest proportion of female laureates
max_female_row = fem_ratio_df.loc[fem_ratio_df['ratio'].idxmax()]
max_female_dict = {max_female_row['decade']: max_female_row['category']}
# Data visualization
fem_ratio_df['ratio_%'] = fem_ratio_df['ratio']*100
g = sns.relplot(x='decade', y='ratio_%', data = fem_ratio_df, kind = 'line', hue = 'category', marker = 'o', alpha=0.8)
g.fig.suptitle('Proportion of female laureates by decade and category', y=1.02)
g.set(ylabel = '% female laureates', xlabel = 'Decade')
g._legend.set_title('Nobel category')
# First woman to receive a Nobel Prize
nobel_fem_sorted = nobel[nobel['sex'] == 'Female'].sort_values('year').reset_index(drop=True)
first_woman_name = nobel_fem_sorted['full_name'].iloc[0]
first_woman_category = nobel_fem_sorted['category'].iloc[0]
print(f'Full Name: {first_woman_name}\nCategory: {first_woman_category}')
Multiple Nobel Prize winners
This horizontal bar chart illustrates individuals and organizations that have won multiple Nobel Prizes. Here are the key insights:
1. Most Decorated Nobel Laureates
- The International Committee of the Red Cross holds the highest number of Nobel Prizes, having won the Peace Prize three times (1917, 1944, and 1963) for its humanitarian work.
- The Office of the United Nations High Commissioner for Refugees (UNHCR) has won twice (1954 and 1981), recognizing its work in aiding displaced populations.
2. Notable Individual Winners
- Several individuals have received two Nobel Prizes, either in the same field or in two different ones:
- Marie Curie: The only person to win Nobel Prizes in two different scientific fields—Physics (1903) and Chemistry (1911).
- Linus Pauling: A rare case of winning in different categories—Chemistry (1954) and Peace (1962).
- John Bardeen: The only person to win the Nobel Prize in Physics twice (1956 and 1972) for groundbreaking work in semiconductors and superconductivity.
- Frederick Sanger: Recognized twice in Chemistry (1958 and 1980) for his work in protein and DNA sequencing.
Overall Interpretation
- Winning a Nobel Prize is an extraordinary achievement, but these individuals and organizations stand out by earning the recognition multiple times.
- The list includes both scientific pioneers and humanitarian organizations, reflecting the diversity of the Nobel Prize’s impact.
- Individual repeat winners come mostly from the sciences (Physics and Chemistry). The repeat organizations, by contrast, won only the Peace Prize, the one category that can be awarded to organizations.
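To check claims such as "different categories" or "the same field twice", one option is to list the number of prizes and the distinct categories for each repeat winner. The following is a minimal sketch on hypothetical toy rows whose names are placeholders, not real laureates.

```python
import pandas as pd

# Hypothetical toy data with placeholder names
toy = pd.DataFrame({
    "full_name": ["Person A", "Person A", "Person B", "Person B",
                  "Org C", "Org C", "Org C", "Person D"],
    "category":  ["Physics", "Chemistry", "Chemistry", "Chemistry",
                  "Peace", "Peace", "Peace", "Physics"],
    "year":      [1910, 1920, 1960, 1980, 1920, 1950, 1970, 1930],
})

# Prizes per name; keep names that appear more than once
counts = toy["full_name"].value_counts()
repeat_names = counts[counts > 1].index

# For each repeat winner: prize count, number of distinct categories, and their names
repeat_summary = (
    toy[toy["full_name"].isin(repeat_names)]
    .groupby("full_name")
    .agg(n_prizes=("year", "count"),
         n_categories=("category", "nunique"),
         categories=("category", lambda s: ", ".join(sorted(s.unique()))))
    .sort_values("n_prizes", ascending=False)
)
print(repeat_summary)
```

On the real data, a row with `n_categories == 2` marks a winner across fields. A row with `n_categories == 1` and `n_prizes == 2` marks a repeat win in the same field.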
# Individuals or organizations having won more than one Nobel Prize throughout the years
nb_win_by_name = nobel.groupby('full_name').size().reset_index(name='nb')
repeat = nb_win_by_name[nb_win_by_name['nb']>1]
repeat_list = repeat['full_name'].to_list()
# Data visualization
multi_nobel = nobel[nobel['full_name'].isin(repeat['full_name'])]
order = multi_nobel['full_name'].value_counts().index
g = sns.catplot(y='full_name', data=multi_nobel, order=order, kind='count')
g.fig.suptitle('Number of Nobel Prizes by person/organization', y=1.02)
g.set(xlabel='', ylabel='')
# Ensure x-axis only displays integers
g.ax.xaxis.get_major_locator().set_params(integer=True)