Project
Evolution of Formula 1 Racing over the Years
Source
Vopani's "Formula 1 World Championship" Dataset
Objective
Analyze F1 data between 1950 and 2023 and determine changes in average lap time, pit stop time, incident rate, and constructor performance.
1. Prepare data for lap time analysis by querying the circuit name, year, and time for every recorded lap
SELECT name AS track, year, milliseconds AS lap_time
FROM
(
SELECT circuitId, year, milliseconds
FROM lap_times
LEFT JOIN races
USING(raceId)
) AS lap_races
LEFT JOIN circuits
USING(circuitId);
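For readers working from the raw tables rather than a SQL engine, the same two-step join can be sketched in pandas. This is only an illustration under assumptions: the table and column names are taken from the query above, the three small frames hold made-up values, and `build_laptime` is a hypothetical helper name.

```python
import pandas as pd

# Toy stand-ins for the three tables; all values are invented for illustration.
lap_times = pd.DataFrame({"raceId": [1, 1, 2], "milliseconds": [90500, 91200, 78300]})
races = pd.DataFrame({
    "raceId": [1, 2],
    "year": [1996, 1997],
    "circuitId": [10, 20],
    "name": ["Grand Prix A", "Grand Prix B"],
})
circuits = pd.DataFrame({"circuitId": [10, 20], "name": ["Track A", "Track B"]})


def build_laptime(lap_times, races, circuits):
    """Hypothetical helper: left-join laps to races, then to circuits."""
    merged = (
        lap_times
        # Keep only the race columns the query selects, so the race name
        # does not collide with the circuit name.
        .merge(races[["raceId", "year", "circuitId"]], on="raceId", how="left")
        .merge(circuits[["circuitId", "name"]], on="circuitId", how="left")
    )
    renamed = merged.rename(columns={"name": "track", "milliseconds": "lap_time"})
    return renamed[["track", "year", "lap_time"]]


laptime_demo = build_laptime(lap_times, races, circuits)
print(laptime_demo)
```

Selecting the race columns before merging mirrors the subquery, which also passes along only `circuitId`, `year`, and `milliseconds`.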
2. Visualize lap time data by track since 1996, when lap times were first recorded in the database (only using tracks that hosted a race in 75% or more of seasons since 1996)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Drop NA rows (they are annotated as '\N' in original data)
laptime['lap_time'] = laptime['lap_time'].replace('\\N', np.nan).astype(float)
laptime = laptime.dropna(subset=['lap_time'])
laptime = laptime[laptime['lap_time'] > 0].copy()  # copy so new columns can be added without a SettingWithCopyWarning
#drop outliers as many are present due to weather conditions during some races
laptime['upper'] = laptime.groupby('track')['lap_time'].transform(lambda x: ( x.quantile(.75) + (1.5 * (x.quantile(.75)-x.quantile(.25)))))
laptime['lower'] = laptime.groupby('track')['lap_time'].transform(lambda x: ( x.quantile(.25) - (1.5 * (x.quantile(.75)-x.quantile(.25)))))
laptime = laptime[(laptime['lap_time'] > laptime['lower']) & (laptime['lap_time'] < laptime['upper'])]
#create dataframe that only includes tracks that were raced on in 75% of seasons since 1996 for by-track analysis
required_years = range(1996,2024)
track_years = laptime.groupby('track')['year'].nunique()
tracks_to_keep = track_years[track_years >= (len(required_years) * .75)].index.tolist()
laptime_bytrack = laptime[laptime['track'].isin(tracks_to_keep)]
#visualize data
bytrack_plot = sns.lineplot(x='year', y='lap_time', hue='track', errorbar=None, data=laptime_bytrack, palette='tab10')
bytrack_plot.set_title('Average F1 Lap Times by Track 1996-2023')
bytrack_plot.set_xlabel('Year')
bytrack_plot.set_ylabel('Lap Time')
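#format ticks as seconds through the axis formatter so labels stay in sync with tick positions (a callable is accepted in Matplotlib 3.3+)
bytrack_plot.yaxis.set_major_formatter(lambda y, pos: '{:,.0f} sec.'.format(y / 1000))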
bytrack_plot.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.gca().invert_yaxis()
plt.show()
Observation
Several trends are shared among the circuits, such as an increase in lap speed in the mid-2000s and a drop in lap speed in the mid-2010s.
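The outlier filter in step 2 is the common 1.5 x IQR rule applied separately to each track. As a minimal sketch, the same logic can be wrapped in a reusable function and checked on toy data. The function name `drop_iqr_outliers`, its parameter `k`, and the sample values are assumptions made for illustration, not part of the original analysis.

```python
import pandas as pd


def drop_iqr_outliers(df, group_col, value_col, k=1.5):
    """Keep rows strictly inside [Q1 - k*IQR, Q3 + k*IQR], computed per group."""
    grouped = df.groupby(group_col)[value_col]
    # transform broadcasts each group's quartile back onto that group's rows
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    keep = (df[value_col] > q1 - k * iqr) & (df[value_col] < q3 + k * iqr)
    return df[keep].copy()


# Invented lap times in milliseconds: one slow lap on track A, one bad record on track B.
demo = pd.DataFrame({
    "track": ["A"] * 6 + ["B"] * 6,
    "lap_time": [90000, 90500, 91000, 91500, 92000, 150000,
                 75000, 75500, 76000, 76500, 77000, 30000],
})
cleaned = drop_iqr_outliers(demo, "track", "lap_time")
print(len(demo), "->", len(cleaned))
```

Because the comparison is strict, a group whose values are all identical (IQR of zero) would lose every row, which may be worth checking before applying the filter to very small groups.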
3. Analyze the overall change in average lap time, using a metric that measures each individual lap time relative to the all-time average lap time of its track. This removes the influence of track length on the recorded changes in average lap time.
#create new table for grand average analysis by year
laptime_grandaverage = laptime.copy()
#create new column that returns each lap time's differential from the respective track's all-time average
laptime_grandaverage['track_avg'] = laptime_grandaverage.groupby('track')['lap_time'].transform('mean')
#subtract a fixed 2900 ms offset so the graph starts at zero on the y-axis
laptime_grandaverage['delta'] = laptime_grandaverage['lap_time'] - laptime_grandaverage['track_avg'] - 2900
#visualize data
grandaverage_plot = sns.lineplot(x='year', y='delta', data=laptime_grandaverage, color='red', errorbar=None)
grandaverage_plot.set_title('Change in Average F1 Lap Time 1996-2023')
grandaverage_plot.set_xlabel('Year')
grandaverage_plot.set_ylabel('Change in Average Lap Time Since 1996')
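#format ticks as seconds through the axis formatter so labels stay in sync with tick positions (a callable is accepted in Matplotlib 3.3+)
grandaverage_plot.yaxis.set_major_formatter(lambda y, pos: '{:,.0f} sec.'.format(y / 1000))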
plt.gca().invert_yaxis()
plt.show()
Observation
F1 lap speeds peaked in the mid-2000s, with laps averaging 7 seconds shorter than in 1996. We have not seen those average lap speeds since then, likely due to modifications to the tracks, cars, and regulations. Lap speeds plummeted in the early-to-mid 2010s and have fluctuated since then, but appear to be back on the rise. Overall, this visualization affirms the observations we made when comparing lap speed by track.
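To see why step 3 centers each lap on its own track's average, consider a toy calendar in which a longer track joins partway through. The sketch below uses invented values and hypothetical track names. It also derives the baseline from the first year in the data instead of the fixed 2900 ms offset used above, which is a swapped-in choice for illustration.

```python
import pandas as pd

# Invented data: "Short" gets faster over time, while "Long" only joins in 2004.
demo = pd.DataFrame({
    "track": ["Short", "Short", "Long", "Long"],
    "year": [1996, 2004, 2004, 2005],
    "lap_time": [80000, 76000, 106000, 108000],
})

# Raw yearly mean: jumps in 2004 only because a longer track entered the calendar.
raw = demo.groupby("year")["lap_time"].mean()

# Centered metric: each lap minus its own track's all-time mean.
demo["track_avg"] = demo.groupby("track")["lap_time"].transform("mean")
demo["delta"] = demo["lap_time"] - demo["track_avg"]
yearly = demo.groupby("year")["delta"].mean()

# Express the series relative to the first year so it starts at zero.
since_first = yearly - yearly.loc[yearly.index.min()]

print(raw)
print(since_first)
```

In this toy case the raw mean rises from 80.0 s to 91.0 s between 1996 and 2004, while the centered series falls by 3.5 s, which reflects the faster laps on the track that appears in both years.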
4. Prepare data for pit stop analysis over time by querying each pit stop's year and time
SELECT year, milliseconds AS time
FROM pit_stops
LEFT JOIN races
USING(raceId);
5. Visualize average pit stop time since 2011, when pit stop time was first recorded
#drop outliers as many are present
pitstops['upper'] = pitstops.groupby('year')['time'].transform(lambda x: ( x.quantile(.75) + (1.5 * (x.quantile(.75)-x.quantile(.25)))))
pitstops['lower'] = pitstops.groupby('year')['time'].transform(lambda x: ( x.quantile(.25) - (1.5 * (x.quantile(.75)-x.quantile(.25)))))
pitstops = pitstops[(pitstops['time'] > pitstops['lower']) & (pitstops['time'] < pitstops['upper'])]
pitstop_plot = sns.lineplot(data=pitstops, x='year', y='time', color='red', errorbar=None)
pitstop_plot.set_title('Average F1 Pit Stop Time 2011-2023')
pitstop_plot.set_xlabel('Year')
pitstop_plot.set_ylabel('Average Pit Stop Time')
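#format ticks as seconds through the axis formatter so labels stay in sync with tick positions (a callable is accepted in Matplotlib 3.3+)
pitstop_plot.yaxis.set_major_formatter(lambda y, pos: '{:,.1f} sec.'.format(y / 1000))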
plt.gca().invert_yaxis()
plt.show()
Observation
Although it has picked back up in the last 3 years, pit stop speed has generally been on a slight decline since 2011. This may be surprising, since tire changes are getting notably faster by the year, but changes in safety procedures can lead to more time spent in the pit lane.
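By default, seaborn's `lineplot` draws the mean of `y` at each `x`, so the line in step 5 is the per-year mean pit stop time. A small sketch of how those yearly values and their year-over-year changes could be tabulated is shown here. The frame below contains invented numbers and only mirrors the `year` and `time` columns from the query in step 4.

```python
import pandas as pd

# Invented pit stop times in milliseconds, two stops per season.
demo = pd.DataFrame({
    "year": [2011, 2011, 2012, 2012],
    "time": [22000, 23000, 23500, 24500],
})

# Per-year mean in seconds: the quantity the line plot draws.
yearly_sec = demo.groupby("year")["time"].mean() / 1000

# Year-over-year change in seconds; positive values mean slower stops.
change_sec = yearly_sec.diff()

print(yearly_sec)
print(change_sec)
```

Running the same two lines on the filtered `pitstops` frame would put numbers behind the trend described in the observation.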
6. Retrieve the year and whether the race ended in a mechanical failure or accident (if any) for every participant of every race in F1 history.