install.packages("elo")

Load packages and set plotting options.

library(readr)
library(dplyr)
library(lubridate)
library(tidyr)
library(ggplot2)
library(elo)
options(repr.plot.width = 15, repr.plot.height = 8)
theme_set(theme_bw(24))

Import the dataset, keeping only games played in 1950 or later. The original data comes from https://www.kaggle.com/datasets/martj42/international-football-results-from-1872-to-2017

fifa <- read_csv(
    "results.csv",
    col_types = cols(date = col_date("%Y-%m-%d"))
) %>%
    filter(year(date) >= 1950)
fifa

The score is missing for a single game between Solomon Islands and Fiji, which Fiji won 5-4. Fill in the missing values, with 4 as the home score and 5 as the away score.

fifa <- fifa %>% 
  replace_na(list(home_score = 4, away_score = 5))
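Because replace_na() fills every missing score in the table, not just the one for this match, it may be worth confirming first that only one row is affected. A minimal base R sketch of such a check, run on a made-up data frame (`toy` and its values are hypothetical, not taken from results.csv):

```r
# Hypothetical toy data standing in for the real results table.
toy <- data.frame(
  home_team  = c("A", "B", "C"),
  away_team  = c("B", "C", "A"),
  home_score = c(2, NA, 1),
  away_score = c(0, NA, 1)
)

# Indices of rows where either score is missing.
missing_rows <- which(is.na(toy$home_score) | is.na(toy$away_score))
n_missing <- length(missing_rows)

# A blanket fill is only safe when exactly one row is affected.
stopifnot(n_missing == 1)
toy[missing_rows, ]
```

If the check fails on the real data, the fill would need to target the specific match rather than every missing value.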

Create a points column from the home team's perspective: 1 for a home win, 0.5 for a tie, and 0 for a home loss.

fifa <- fifa %>%
  mutate(points = score(home_score, away_score))
fifa
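To make the points rule concrete, here is a hand-rolled base R equivalent applied to three made-up results. The helper name `points_from_scores` is hypothetical and is only meant to illustrate the rule, not to replace score() from the elo package.

```r
# Hypothetical helper: points from the home team's perspective.
points_from_scores <- function(home_score, away_score) {
  # sign() gives -1, 0 or 1; rescale that to 0, 0.5 or 1.
  (sign(home_score - away_score) + 1) / 2
}

# Three made-up matches: home win, tie, home loss.
home <- c(3, 1, 0)
away <- c(1, 1, 2)
pts <- points_from_scores(home, away)
pts
```

The three toy matches yield 1, 0.5 and 0 respectively.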

Count the number of games played by each team (home and away combined), then keep only matches in which at least one of the two teams has played 200 or more games.

count_home_games <- fifa %>% 
    rename(team = home_team) %>%
    count(team)
count_away_games <- fifa %>% 
    rename(team = away_team) %>%
    count(team)
count_games_gt200 <- bind_rows(count_home_games, count_away_games) %>%
    group_by(team) %>%
    summarize(n = sum(n)) %>%
    filter(n >= 200) %>%
    arrange(desc(n))
count_games_gt200
fifa <- fifa %>%
    filter(home_team %in% count_games_gt200$team | away_team %in% count_games_gt200$team)
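The same count-then-filter logic can be sketched in base R on a tiny made-up schedule. The team names, the `min_games` threshold of 4, and the variable names below are all hypothetical; the point is to show that the "or" condition keeps any match involving at least one frequent team, so less frequent opponents still appear in the retained rows.

```r
# Hypothetical schedule of five matches.
home_team <- c("A", "B", "A", "C", "C")
away_team <- c("B", "A", "C", "A", "D")

# Total appearances per team, home and away combined.
games_per_team <- table(c(home_team, away_team))

# Teams meeting the (toy) threshold.
min_games <- 4
frequent <- names(games_per_team)[as.vector(games_per_team) >= min_games]

# Keep a match if either side is a frequent team.
keep <- home_team %in% frequent | away_team %in% frequent
data.frame(home_team, away_team, keep)
```

Here only team A reaches the threshold, so the first four matches are kept and the C versus D match is dropped. Requiring both teams to be frequent would mean replacing `|` with `&`.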

Get the running Elo rating after each match.

model <- elo.run(points ~ home_team + away_team, data = fifa, k = 20)
summary(model)
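For intuition about what each update does, the textbook Elo step can be sketched in a few lines of base R. This is an illustration of the standard formula under stated assumptions, not the elo package's internals: the function names are hypothetical, both teams are assumed to start at 1500, and k = 20 mirrors the call above.

```r
# Expected score of team A against team B (standard logistic Elo curve).
elo_expected <- function(rating_a, rating_b) {
  1 / (1 + 10^((rating_b - rating_a) / 400))
}

# One Elo update for a single match; `points` is the home team's result.
elo_update <- function(rating_home, rating_away, points, k = 20) {
  expected_home <- elo_expected(rating_home, rating_away)
  delta <- k * (points - expected_home)
  # Whatever the home team gains, the away team loses.
  c(home = rating_home + delta, away = rating_away - delta)
}

# Two equally rated teams; the home team wins.
updated <- elo_update(1500, 1500, points = 1)
updated
```

With equal ratings the expected score is 0.5, so a home win moves the ratings to 1510 and 1490.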

Calculate the mean Elo rating for each team in each year.