Max Verstappen Grid vs Race Result

So far, Max Verstappen has shown that he is from another planet and that he is on his way to establishing himself as one of the greatest (if not the greatest) drivers in Formula 1.

One way to evaluate a Formula 1 driver's performance during a Grand Prix is to count how many positions he gained or lost relative to his starting position. We will do just that for all of Max Verstappen's results.
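
As a minimal sketch of this metric, assuming made-up grid and finishing positions (the numbers below are illustrative, not Verstappen's actual results), the places gained can be computed as the grid position minus the finishing position. Positive values mean places gained and negative values mean places lost.

```r
# Hypothetical example values, not real race data
grid <- c(3, 1, 10, 2)       # starting positions
position <- c(1, 1, 4, 5)    # finishing positions

# Positive = places gained, negative = places lost, 0 = no change
positions_gained <- grid - position
print(positions_gained)
```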

# Load required libraries
library(tidyverse)
library(jsonlite)

if(!require(cluster)){
    install.packages("cluster")
    library(cluster)
}

Obtaining the data

We will be using the Ergast Developer API motor racing database, which provides historical records of motor racing data.

We will:
1. Make an API call (more info in the Ergast API documentation) to obtain a JSON file.
2. Convert the JSON file into a data frame.
3. Tidy the data and keep only the columns we are interested in.

# URL to obtain all Max Verstappen's results
url <- "http://ergast.com/api/f1/drivers/max_verstappen/results.json?limit=156"

# The data is in json format
data_json <- fromJSON(url)

# The info we need is in this list of dataframes
data_list <- data_json$MRData$RaceTable$Races$Results

# Converting it to df
data <- bind_rows(data_list)
glimpse(data)
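
To see why the results arrive as a list of data frames, here is a small offline sketch. The JSON string is a hypothetical, heavily trimmed stand-in for the real response (the `toy_` names and the values are made up), keeping only the nesting that the code above relies on.

```r
library(jsonlite)
library(dplyr)

# Hypothetical, heavily trimmed response: two races, one result each
toy_json <- '{
  "MRData": {"RaceTable": {"Races": [
    {"round": "1", "Results": [{"grid": "3", "position": "1", "positionText": "1"}]},
    {"round": "2", "Results": [{"grid": "5", "position": "18", "positionText": "R"}]}
  ]}}
}'

toy <- fromJSON(toy_json)

# Races is simplified to a data frame; its Results column is a list
# holding one data frame per race
toy_list <- toy$MRData$RaceTable$Races$Results

# Stack the per-race data frames into a single data frame
toy_df <- bind_rows(toy_list)
print(toy_df)
```

Every column comes back as character, which is why a type conversion is needed before doing any arithmetic.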

Tidying the data

This data frame contains a lot of interesting information. For this analysis, however, we only need the grid (race start position), position (race result position) and positionText columns. The last column is important because it lets us check later whether Max did not finish a race (DNF), which can explain some outliers in the data.

# For the purposes of this analysis we are just going to select the following columns
data <- data %>%
	select(grid, position, positionText)

glimpse(data)

Notice that the grid and position columns are of type character. We need to convert them to numeric in order to work with them.

# Changing grid, position columns to numeric
data <- data %>%
	mutate(across(c(grid, position), as.numeric))
glimpse(data)
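
The positionText column is deliberately left as character. A small sketch with made-up values shows what would happen otherwise: strings of digits convert cleanly, while a letter label cannot be parsed and becomes NA.

```r
# Digit strings convert cleanly
as.numeric(c("3", "1", "18"))

# A non-digit label such as "R" cannot be parsed, so it becomes NA.
# R also emits a coercion warning, silenced here to keep the output tidy.
converted <- suppressWarnings(as.numeric(c("1", "R", "4")))
print(converted)

# Count how many values were lost to coercion
sum(is.na(converted))
```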

Analyzing the data

Now we can start doing some analysis! Let's plot grid vs position and see how well Max does.

# Basic grid vs position jitter plot
ggplot(data, aes(x = position, y = grid)) +
	geom_jitter()

All or nothing

The first thing to observe is that most of the data points are clustered in the top 5 for both grid and position, which certainly tells us how good Max is. On the other hand, there is another group of points to the right of the plot, which tells us that Max sometimes finishes at the back of the pack. As mentioned before, the positionText column can show whether these results are DNFs or whether Max really finished a race at the back.

This is why the positionText column is useful: it is not just the position column stored as character, it also uses labels to explain the result of a race, such as being disqualified, retired or failing to qualify. More info on these labels here.
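
One way to separate numeric finishing positions from these labels is a simple pattern match. This is a sketch on hypothetical values; the letters "R" and "D" are placeholders for whatever labels actually appear in the data.

```r
# Hypothetical positionText values; "R" and "D" stand in for non-numeric labels
position_text <- c("1", "2", "R", "12", "D")

# TRUE when the value consists only of digits, i.e. a numeric finishing position
is_classified <- grepl("^[0-9]+$", position_text)

# The non-numeric labels that need a closer look
unique(position_text[!is_classified])
```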

Although it is very difficult to distinguish, a good observer will notice one point where Max seems to have started from grid position 0. What does this mean? Looking at the database notes, we can see that a grid value of 0 means that he started from the pit lane. For now, it might be easier to just filter that point out.

With the help of the positionText column, we'll create a column that records whether or not Max finished a race.

# Count the pit lane starts (grid == 0)
sum(data$grid == 0, na.rm = TRUE)

# Check unique values for positionText
unique(data$positionText)

# Add a DNF column to our data (TRUE when Max retired from the race)
data <- data %>%
	mutate(dnf = positionText == "R")

As predicted, there is one pit lane start. Checking the unique positionText labels, we can see that "R" (retired) is the only value that is not a number, so we can simply treat every race whose positionText is "R" as a DNF.
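
Filtering out the pit lane start mentioned earlier could look like the sketch below. It runs on a small made-up tibble (`toy` and its values are hypothetical) so that it does not alter the `data` object used in the rest of the analysis.

```r
library(dplyr)

# Hypothetical rows; grid == 0 marks a pit lane start
toy <- tibble(
  grid = c(1, 0, 4),
  position = c(1, 7, 18),
  positionText = c("1", "7", "R")
)

# Keep only the races that started from a regular grid slot
toy_no_pitlane <- toy %>%
  filter(grid != 0)

print(toy_no_pitlane)
```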

Let's edit our plot so that the shape of each point shows whether or not Max finished the race.

# Plot the shape to distinguish between finished and dnf races
ggplot(data, aes(x = position, y = grid, shape = dnf)) +
	geom_jitter()