
The Nobel Prize has been among the most prestigious international awards since 1901. Each year, awards are bestowed in chemistry, literature, physics, physiology or medicine, economics, and peace. In addition to the honor, prestige, and substantial prize money, the recipient also gets a gold medal with an image of Alfred Nobel (1833 - 1896), who established the prize.

The Nobel Foundation has made available a dataset of all prize winners, from the first awards in 1901 through 2023. The dataset used in this project comes from the Nobel Prize API.

In this project, I explored and answered several questions about this prize-winner data.

# Loading in required libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
# importing CSV

nobel = pd.read_csv('data/nobel.csv')
nobel.head()

⭐ Details of the dataset

nobel.shape

This dataset has 1000 rows and 18 columns. Next, I'll check for data type inconsistencies and for the presence of nulls.

nobel.info()

I noticed that 10 of the 18 columns contain null values. The birth_date and death_date columns are stored as the object data type, which needs to be changed to datetime format for analysis.
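As a complement to nobel.info(), the per-column null counts can be listed directly. The sketch below runs on a small made-up frame (the `sample` data and its values are illustrative assumptions, not rows from the real dataset); the same two lines should work on `nobel`.

```python
import pandas as pd

# Made-up stand-in for a few columns of the Nobel data (illustrative values only)
sample = pd.DataFrame({
    "year": [1901, 1902, 1903],
    "birth_date": ["1852-08-30", None, "1859-02-19"],
    "death_date": [None, None, "1927-10-02"],
})

# Count missing values per column, then keep only the columns that have any
null_counts = sample.isna().sum()
cols_with_nulls = null_counts[null_counts > 0]
print(cols_with_nulls)
```

On the real data, `len(cols_with_nulls)` would give the number of columns containing nulls.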

Since some entries are missing or may not parse as dates, I'll pass errors='coerce' to the pd.to_datetime() function so that any value that can't be parsed becomes NaT instead of raising an error.

# converting to date time format 
nobel['birth_date'] = pd.to_datetime(nobel['birth_date'], errors='coerce')

Repeating the same process for the death_date column:

nobel['death_date'] = pd.to_datetime(nobel['death_date'], errors='coerce')
nobel.info()
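To make the effect of errors='coerce' concrete, here is a minimal sketch on a made-up series (the `raw` values are assumptions for illustration, not taken from the dataset). Missing and unparseable entries both end up as NaT, while valid dates become timestamps.

```python
import pandas as pd

# Made-up values: one valid date, one missing entry, one unparseable string
raw = pd.Series(["1852-08-30", None, "not a date"])

# errors="coerce" turns anything that cannot be parsed into NaT instead of raising
parsed = pd.to_datetime(raw, errors="coerce")

print(parsed.dtype)         # a datetime64 dtype rather than object
print(parsed.isna().sum())  # number of entries that ended up as NaT
```

Comparing `isna().sum()` before and after the conversion is a quick way to check whether coercion silently dropped any values that were present but malformed.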

The year column is stored as an integer. Most of the insights in this project relate to decades, so let's create a column denoting the decade each year belongs to.

# creating decade column

nobel['decade'] = (nobel['year'] // 10) * 10

nobel.head()
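The floor-division trick used for the decade column can be checked on a handful of years. This is a small sketch with made-up input (`years` is a hypothetical series, not the dataset's column): dividing by 10 and discarding the remainder drops the last digit, and multiplying by 10 restores the scale.

```python
import pandas as pd

# Hypothetical years chosen to cover the start and end of decades
years = pd.Series([1901, 1909, 1910, 1999, 2023])

# Integer-divide by 10 to drop the last digit, then scale back up
decades = (years // 10) * 10

print(decades.tolist())  # [1900, 1900, 1910, 1990, 2020]
```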