Skip to content

🏁 Formula 1 Race Analysis – SQL Capstone Project

πŸ“˜ Project Overview

Title: Formula 1 Grand Prix Race Insights using SQL

Objective: To analyze historical F1 race data to uncover patterns, driver & constructor performance, and race characteristics using structured SQL queries. Data Source: CSV file with the following columns: index, grandprix, date, winner, car, laps, time, namecode. Tools Used: PostgreSQL / MySQL / SQLite, CSV, SQL Scope of Analysis:

Performance of drivers and constructors

Race frequency and trends

Laps and race duration analysis

Spinner
DataFrameas
df
variable
SELECT * FROM 'winners.csv';
Spinner
DataFrameas
df1
variable
CREATE TABLE race_results (
    id INT PRIMARY KEY,
    grandprix VARCHAR(100),
    race_date DATE,
    winner VARCHAR(100),
    car VARCHAR(100),
    laps INT,
    race_time VARCHAR(50),
    namecode VARCHAR(10)
);
Spinner
DataFrameas
df2
variable
--Most successful driver by wins
SELECT winner, COUNT(*) AS wins
FROM winners.csv
GROUP BY winner
ORDER BY wins DESC;
Spinner
DataFrameas
df3
variable
--Top winning car brands
SELECT car, COUNT(*) AS total_wins
FROM winners.csv
GROUP BY car
ORDER BY total_wins DESC;
Spinner
DataFrameas
df4
variable
--Most races in a single year
SELECT EXTRACT(YEAR FROM "Date") AS year, COUNT(*) AS races
FROM 'winners.csv'
GROUP BY year
ORDER BY races DESC;
Spinner
DataFrameas
df5
variable
--Average number of laps per race and Driver-car combination with most wins
SELECT winner, car, COUNT(*) AS wins, ROUND(AVG(laps), 2) AS avg_laps
FROM winners.csv
GROUP BY winner, car
ORDER BY wins DESC;

Spinner
DataFrameas
df6
variable
--Which driver has won races across the most different car brands?
SELECT winner, COUNT(DISTINCT car) AS unique_cars
FROM winners.csv
GROUP BY winner
ORDER BY unique_cars DESC;
Spinner
DataFrameas
df7
variable
--Find the longest winning streak by any driver (consecutive races by date)
WITH ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY winner ORDER BY date) -
           ROW_NUMBER() OVER (ORDER BY date) AS streak_group
    FROM winners.csv
)
SELECT winner, COUNT(*) AS streak_length
FROM ranked
GROUP BY winner, streak_group
ORDER BY streak_length DESC
LIMIT 1;
Spinner
DataFrameas
df8
variable
--Which car brand dominated a specific decade (e.g., 2010s)?
SELECT car, COUNT(*) AS wins_in_2010s
FROM winners.csv
WHERE EXTRACT(YEAR FROM DATE) BETWEEN 2010 AND 2019
GROUP BY car
ORDER BY wins_in_2010s DESC;
Spinner
DataFrameas
df9
variable
--Driver with fastest average race time (excluding NULL or non-numeric times)
SELECT winner, AVG(
    SPLIT_PART(Time, ':', 1)::INT * 3600 +
    SPLIT_PART(Time, ':', 2)::INT * 60 +
    SPLIT_PART(Time, ':', 3)::DOUBLE PRECISION
) AS avg_seconds
FROM winners.csv
WHERE Time IS NOT NULL AND Time != '' AND SPLIT_PART(Time, ':', 3) != ''
GROUP BY winner
ORDER BY avg_seconds ASC;
Spinner
DataFrameas
df10
variable
-- Year-over-year win progression for the most successful driver
WITH top_driver AS (
    SELECT winner
    FROM "winners.csv"
    GROUP BY winner
    ORDER BY COUNT(*) DESC
    LIMIT 1
),
yearly_wins AS (
    SELECT 
        winner,
        car,
        EXTRACT(YEAR FROM date) AS year,
        COUNT(*) AS wins
    FROM "winners.csv"
    WHERE winner = (SELECT winner FROM top_driver)
    GROUP BY winner, car, year
)
SELECT 
    winner,
    year,
    STRING_AGG(DISTINCT car, ', ') AS car_brands,
    SUM(wins) AS total_wins
FROM yearly_wins
GROUP BY winner, year
ORDER BY year;

🏁 Conclusion

This project provided several key insights from the Formula 1 race dataset through structured SQL analysis:

πŸ† Most Successful Driver: Lewis Hamilton emerged as the most successful driver with a total of 103 race wins.

🏎️ Top Constructor by Wins: Ferrari leads all constructors with 245 total wins.

πŸ“… Busiest Race Years: The years 2021 to 2023 each hosted the highest number of races, with 22 races per year.

βš™οΈ Dominant Driver-Car Duo: Lewis Hamilton achieved 82 wins driving for Mercedes, with an average of 59.34 laps per race during those victories.

πŸ”„ Most Versatile Drivers by Car Brand: Stirling Moss and Alain Prost demonstrated remarkable adaptability, each securing wins with 5 different car brands.

πŸ”₯ Longest Winning Streak: Max Verstappen holds the longest consecutive win streak, with 10 straight victories.

πŸ’ͺ Constructor Dominance by Decade: Mercedes dominated the 2010s decade, securing the most wins in that period.

πŸ•’ Highest Average Race Time: Peter Gethin recorded the highest average race time, averaging 4692.6 seconds per race.