π Formula 1 Race Analysis β SQL Capstone Project
π Project Overview
Title: Formula 1 Grand Prix Race Insights using SQL
Objective: To analyze historical F1 race data to uncover patterns, driver & constructor performance, and race characteristics using structured SQL queries. Data Source: CSV file with the following columns: index, grandprix, date, winner, car, laps, time, namecode. Tools Used: PostgreSQL / MySQL / SQLite, CSV, SQL Scope of Analysis:
Performance of drivers and constructors
Race frequency and trends
Laps and race duration analysis
SELECT * FROM 'winners.csv';CREATE TABLE race_results (
id INT PRIMARY KEY,
grandprix VARCHAR(100),
race_date DATE,
winner VARCHAR(100),
car VARCHAR(100),
laps INT,
race_time VARCHAR(50),
namecode VARCHAR(10)
);
--Most successful driver by wins
SELECT winner, COUNT(*) AS wins
FROM winners.csv
GROUP BY winner
ORDER BY wins DESC;
--Top winning car brands
SELECT car, COUNT(*) AS total_wins
FROM winners.csv
GROUP BY car
ORDER BY total_wins DESC;
--Most races in a single year
SELECT EXTRACT(YEAR FROM "Date") AS year, COUNT(*) AS races
FROM 'winners.csv'
GROUP BY year
ORDER BY races DESC;--Average number of laps per race and Driver-car combination with most wins
SELECT winner, car, COUNT(*) AS wins, ROUND(AVG(laps), 2) AS avg_laps
FROM winners.csv
GROUP BY winner, car
ORDER BY wins DESC;
--Which driver has won races across the most different car brands?
SELECT winner, COUNT(DISTINCT car) AS unique_cars
FROM winners.csv
GROUP BY winner
ORDER BY unique_cars DESC;
--Find the longest winning streak by any driver (consecutive races by date)
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY winner ORDER BY date) -
ROW_NUMBER() OVER (ORDER BY date) AS streak_group
FROM winners.csv
)
SELECT winner, COUNT(*) AS streak_length
FROM ranked
GROUP BY winner, streak_group
ORDER BY streak_length DESC
LIMIT 1;
--Which car brand dominated a specific decade (e.g., 2010s)?
SELECT car, COUNT(*) AS wins_in_2010s
FROM winners.csv
WHERE EXTRACT(YEAR FROM DATE) BETWEEN 2010 AND 2019
GROUP BY car
ORDER BY wins_in_2010s DESC;--Driver with fastest average race time (excluding NULL or non-numeric times)
SELECT winner, AVG(
SPLIT_PART(Time, ':', 1)::INT * 3600 +
SPLIT_PART(Time, ':', 2)::INT * 60 +
SPLIT_PART(Time, ':', 3)::DOUBLE PRECISION
) AS avg_seconds
FROM winners.csv
WHERE Time IS NOT NULL AND Time != '' AND SPLIT_PART(Time, ':', 3) != ''
GROUP BY winner
ORDER BY avg_seconds ASC;-- Year-over-year win progression for the most successful driver
WITH top_driver AS (
SELECT winner
FROM "winners.csv"
GROUP BY winner
ORDER BY COUNT(*) DESC
LIMIT 1
),
yearly_wins AS (
SELECT
winner,
car,
EXTRACT(YEAR FROM date) AS year,
COUNT(*) AS wins
FROM "winners.csv"
WHERE winner = (SELECT winner FROM top_driver)
GROUP BY winner, car, year
)
SELECT
winner,
year,
STRING_AGG(DISTINCT car, ', ') AS car_brands,
SUM(wins) AS total_wins
FROM yearly_wins
GROUP BY winner, year
ORDER BY year;
π Conclusion
This project provided several key insights from the Formula 1 race dataset through structured SQL analysis:
π Most Successful Driver: Lewis Hamilton emerged as the most successful driver with a total of 103 race wins.
ποΈ Top Constructor by Wins: Ferrari leads all constructors with 245 total wins.
π Busiest Race Years: The years 2021 to 2023 each hosted the highest number of races, with 22 races per year.
βοΈ Dominant Driver-Car Duo: Lewis Hamilton achieved 82 wins driving for Mercedes, with an average of 59.34 laps per race during those victories.
π Most Versatile Drivers by Car Brand: Stirling Moss and Alain Prost demonstrated remarkable adaptability, each securing wins with 5 different car brands.
π₯ Longest Winning Streak: Max Verstappen holds the longest consecutive win streak, with 10 straight victories.
πͺ Constructor Dominance by Decade: Mercedes dominated the 2010s decade, securing the most wins in that period.
π Highest Average Race Time: Peter Gethin recorded the highest average race time, averaging 4692.6 seconds per race.