A Visual History of Nobel Prize Winners
    1. The most Nobel of Prizes

    The Nobel Foundation has made available a dataset of all prize winners from the start of the prize in 1901 through 2016. Using data exploration, manipulation, and visualization, I aim to answer questions such as "How do men and women compare in winning the Nobel Prize?" and "What is the average age of Nobel Prize awardees, and is it trending upwards or downwards?". Some columns of interest for answering these questions are the 'year', 'sex', and 'birth_date' columns!

    # Loading in required libraries
    import pandas as pd
    import seaborn as sns
    import numpy as np
    
    # Reading in the Nobel Prize data
    nobel = pd.read_csv('datasets/nobel.csv')
    
    # Taking a look at the first several winners
    nobel.head(n=6)

    2. Which sex and which country are most represented?

    Just looking at the first couple of prize winners, or Nobel laureates as they are also called, we already see a celebrity: Wilhelm Conrad Röntgen, the guy who discovered X-rays. In fact, all of the winners in 1901 were guys who came from Europe. But that was back in 1901. Looking at all winners in the dataset, from 1901 to 2016, which sex and which country are the most commonly represented?

    (For country, we will use the birth_country of the winner, as the organization_country is NaN for all shared Nobel Prizes.)

    # Display the number of (possibly shared) Nobel Prizes handed
    # out between 1901 and 2016
    display(len(nobel))
    
    # Display the number of prizes won by male and female recipients.
    # value_counts() takes a column of a df object and counts each distinct value.
    display(nobel['sex'].value_counts())
    
    # Display the number of prizes won by the top 10 birth countries.
    # value_counts() sorts by count, so head(n=10) keeps the 10 most common values.
    display((nobel['birth_country'].value_counts()).head(n=10))
    
    # To show only 10 countries in the countplot, build a list of the top 10 birth countries
    top_10 = nobel['birth_country'].value_counts().head(10).index.tolist()
    
    # Plot, using order=top_10 to plot ONLY the top 10 nobel prize producing countries.
    sns.countplot(data = nobel, y = 'birth_country', order=top_10)
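
    The counting step above can be tried on a small table first. The following is a minimal sketch on a made-up DataFrame (the rows are hypothetical, not taken from the Nobel dataset); it also shows normalize=True, which returns shares instead of raw counts.

```python
import pandas as pd

# Toy stand-in for the nobel DataFrame (hypothetical rows, not real data)
toy = pd.DataFrame({
    'sex': ['Male', 'Male', 'Female', 'Male', 'Male', 'Female'],
    'birth_country': ['Germany', 'France', 'Poland', 'France', 'Germany', 'France'],
})

# Absolute counts per distinct value, most frequent first
sex_counts = toy['sex'].value_counts()

# normalize=True returns shares that sum to 1 instead of counts
sex_shares = toy['sex'].value_counts(normalize=True)

# The index of value_counts() holds the labels, so the top-N labels
# can be turned into a plain list (e.g. for countplot's order= argument)
top_2 = toy['birth_country'].value_counts().head(2).index.tolist()

print(sex_counts)
print(sex_shares)
print(top_2)
```

    Shares are often easier to compare than raw counts when the groups differ a lot in size.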

    3. Has the USA always produced Nobel Prize Winners?

    We can see that the USA is by far the most common birth country of Nobel Prize winners. But has this always been the case? We can calculate the proportion of USA-born winners by decade to find out!

    After adding a new 'usa_born_winner' column, we can see that the USA started dominating the Nobel Prize around the 1930s, when 25% of Nobel Prize winners were American born.

    # Calculating the proportion of USA born winners per decade
    nobel['usa_born_winner'] = (nobel['birth_country'] == 'United States of America')
    nobel['decade'] = (np.floor(nobel['year'] / 10) * 10).astype(int)
    prop_usa_winners = nobel.groupby('decade', as_index=False)['usa_born_winner'].mean()
    
    # Display the proportions of USA born winners per decade
    display(prop_usa_winners)
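
    Two details of the calculation above are worth seeing in isolation: the mean of a True/False column is the proportion of True values, and integer floor division gives the same decade as the np.floor expression. This is a small sketch on made-up rows (hypothetical years and countries), not a replacement for the code above.

```python
import numpy as np
import pandas as pd

# Toy rows (hypothetical, not real laureates)
toy = pd.DataFrame({
    'year': [1901, 1909, 1930, 1935, 1938, 1939],
    'birth_country': ['Germany', 'United States of America',
                      'United States of America', 'France',
                      'Sweden', 'United Kingdom'],
})

# Boolean flag: True where the birth country is the USA
toy['usa_born_winner'] = toy['birth_country'] == 'United States of America'

# Decade via np.floor, as in the notebook
toy['decade'] = (np.floor(toy['year'] / 10) * 10).astype(int)

# Equivalent decade via integer floor division
decade_alt = toy['year'] // 10 * 10

# The mean of a boolean column is the share of True values per group
prop = toy.groupby('decade', as_index=False)['usa_born_winner'].mean()
print(prop)
```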

    4. Has the USA always produced Nobel Prize Winners? (Visualized)

    Plotting the proportion of American-born Nobel Prize winners, we can see a definite upward trend, with the general dominance starting in the 1930s. We can also see that the 2010s were a down decade in terms of the proportion of winners who were American born.

    # Setting the plotting theme
    sns.set()
    # and setting the size of all plots.
    import matplotlib.pyplot as plt
    plt.rcParams['figure.figsize'] = [11, 7]
    
    # Plotting USA born winners
    ax = sns.lineplot(x='decade', y='usa_born_winner', data=prop_usa_winners)
    
    # Adding %-formatting to the y-axis
    from matplotlib.ticker import PercentFormatter
    ax.yaxis.set_major_formatter(PercentFormatter(1.0))

    5. How have women performed in achieving the Nobel Prize over the decades?

    Another interesting column we can explore is the 'sex' column. We can visualize the difference in Nobel Prize achievements between men and women using this column and the groupby() function. We can also check which fields women perform better in by splitting the results by Nobel Prize category.

    After plotting the proportion of female winners by decade and category, we can see that women seem to be winning Nobel Prizes at an increasing rate over the decades, but in no category have women won more than 50% of the prizes in a single decade. We can also see that women are best represented in two Nobel Prize categories: Literature and Peace.

    # Calculating the proportion of female laureates per decade
    nobel['female_winner'] = nobel['sex'] == 'Female'
    
    prop_female_winners = nobel.groupby(['decade', 'category'], as_index=False)['female_winner'].mean()
    
    # Plotting the proportion of female winners per decade, with one bar per category
    ax = sns.barplot(x='decade', y='female_winner', data=prop_female_winners, hue = 'category')
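
    A claim such as "no category exceeds 50% in any decade" can also be checked on the numbers directly rather than read off the bars. The sketch below assumes a long-format table shaped like prop_female_winners; the values are made up for illustration.

```python
import pandas as pd

# Toy long-format table shaped like prop_female_winners (made-up values)
toy_long = pd.DataFrame({
    'decade': [1900, 1900, 1910, 1910],
    'category': ['Literature', 'Peace', 'Literature', 'Peace'],
    'female_winner': [0.10, 0.07, 0.00, 0.00],
})

# Reshape to one row per decade and one column per category
toy_wide = toy_long.pivot(index='decade', columns='category', values='female_winner')

# True if any decade/category cell is above one half
over_half = (toy_wide > 0.5).any().any()

print(toy_wide)
print(over_half)
```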
    

    6. The first woman to win the Nobel Prize

    We can see that women have won the Nobel Prize since its first decade, the 1900s. We can find the first woman ever to win a Nobel Prize by using the nsmallest() method.

    # Picking out the first woman to win a Nobel Prize
    # Leave the result as the last expression (no print) so the notebook renders it as a DataFrame
    nobel[nobel.sex == 'Female'].nsmallest(1, 'year')
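
    One caveat with nsmallest(1, ...) is that it returns a single row even when several rows share the smallest year. A minimal sketch on made-up rows (hypothetical names) shows the keep='all' option, which returns every tied row.

```python
import pandas as pd

# Toy rows (hypothetical names and years)
toy = pd.DataFrame({
    'full_name': ['A. Example', 'B. Example', 'C. Example', 'D. Example'],
    'sex': ['Male', 'Female', 'Female', 'Female'],
    'year': [1901, 1905, 1905, 1911],
})

women = toy[toy['sex'] == 'Female']

# Default keep='first': exactly one row, the first with the smallest year
first_one = women.nsmallest(1, 'year')

# keep='all': every row tied for the smallest year
first_all = women.nsmallest(1, 'year', keep='all')

print(first_one)
print(first_all)
```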
    

    7. Repeat laureates

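    Another interesting question we can explore is how many laureates have won more than one Nobel Prize. We can group by each distinct laureate and keep the groups that contain more than one row.

    With this, we can see that as of 2016 the filter returns 13 prize rows belonging to repeat laureates. Every repeat laureate contributes at least two rows, so the number of distinct repeat laureates is smaller than 13.

    Some notable names are Marie Curie and, among organizations, the Office of the United Nations High Commissioner for Refugees.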

    # Selecting the laureates that have received 2 or more prizes.
    # We can group by distinct laureate, and then keep the groups with more than 1 prize.
    nobel.groupby('full_name').filter(lambda group: len(group) > 1)
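
    Note that filter() returns one row per prize, so counting the returned rows is not the same as counting laureates. A minimal sketch on made-up names (hypothetical, not real laureates) shows both counts side by side.

```python
import pandas as pd

# Toy rows: one row per prize (hypothetical names)
toy = pd.DataFrame({
    'full_name': ['A. Example', 'B. Example', 'A. Example',
                  'Example Org', 'Example Org', 'Example Org', 'C. Example'],
    'year': [1903, 1905, 1911, 1917, 1944, 1963, 1950],
})

# All prize rows that belong to a laureate with 2 or more prizes
repeat_rows = toy.groupby('full_name').filter(lambda group: len(group) > 1)

# Number of prizes per laureate, then keep those with 2 or more
prizes_per_laureate = toy['full_name'].value_counts()
repeat_laureates = prizes_per_laureate[prizes_per_laureate > 1]

print(len(repeat_rows))       # number of prize rows
print(len(repeat_laureates))  # number of distinct repeat laureates
```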

    8. How old are you when you get the prize?

    Another interesting question we can answer is how old Nobel Prize winners are at the time of receiving the award. We can also plot age by year to see whether the average age of awardees is trending upwards or downwards.

    According to our plot, the age of Nobel Prize awardees is trending upwards: the average age in the 1900s was about 52, while the average age nowadays is about 65.

    We also see that the density of points is much higher nowadays than in the early 1900s. Nowadays many more of the prizes are shared, and so there are many more winners. We also see that there was a disruption in awarded prizes around the Second World War (1939 - 1945).
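
    The age calculation itself could look roughly like the sketch below. It assumes that 'birth_date' holds date strings that pd.to_datetime can parse and that some rows have no birth date; the rows here are made up for illustration, and age is approximated as award year minus birth year.

```python
import pandas as pd

# Toy rows (hypothetical); None stands in for a missing birth date
toy = pd.DataFrame({
    'year': [1901, 1903, 2015, 2016],
    'birth_date': ['1845-03-27', '1867-11-07', '1950-01-15', None],
})

# Parse the strings into datetimes; missing values become NaT
toy['birth_date'] = pd.to_datetime(toy['birth_date'])

# Approximate age at the time of the award: award year minus birth year
toy['age'] = toy['year'] - toy['birth_date'].dt.year

# Mean age per decade; rows with a missing age are skipped by mean()
mean_age = toy.groupby(toy['year'] // 10 * 10)['age'].mean()

print(toy[['year', 'age']])
print(mean_age)
```

    The resulting 'age' column could then be plotted against 'year', for example with a seaborn scatter or regression plot, to look at the trend.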