The Super Bowl is a spectacle. It is the NFL's championship game, which crowns the winner of that season. There is always a little something for everyone. For the fans, there is the game itself. For those tagging along, there are the unique advertisements and the halftime shows with the biggest musicians in the world.

You're going to explore how these elements interact.

The Data

The data was scraped from Wikipedia and consists of two CSV files covering every Super Bowl up to 2024. Some values are missing. The most relevant columns are listed below.
data/tv.csv

| Column | Description |
|---|---|
| super_bowl | Super Bowl number (e.g., the first Super Bowl is Super Bowl 1, and the 2024 game, the last in the data, is Super Bowl 58) |
| avg_us_viewers | Average number of US viewers |
| share_household | Percentage of households watching TV that watched the game |
| rating_household | Percentage of all households with TVs that watched the game |
| ad_cost | Cost per ad (USD) |
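The two household metrics differ only in their denominator. The sketch below works through the arithmetic with made-up household counts (none of these numbers come from the dataset). Because fewer households have a TV switched on than own one, the share is always at least as large as the rating.

```r
# Hypothetical counts, in millions of households (illustrative only)
tv_households       <- 120  # every household that owns a TV
households_using_tv <- 80   # households with a TV switched on during the game
households_watching <- 48   # households tuned to the game

# rating_household: percentage of ALL TV-owning households watching the game
rating <- 100 * households_watching / tv_households        # 40

# share_household: percentage of households WITH A TV ON watching the game
share <- 100 * households_watching / households_using_tv   # 60
```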

data/super_bowls.csv

| Column | Description |
|---|---|
| super_bowl | Super Bowl number (e.g., the first Super Bowl is Super Bowl 1, and the 2024 game, the last in the data, is Super Bowl 58) |
| difference_pts | Point difference between the two teams' final scores in that game |
# Load packages
library(tidyverse)

# Load the CSV data
tv <- read_csv("data/tv.csv", show_col_types = FALSE)
super_bowls <- read_csv("data/super_bowls.csv", show_col_types = FALSE)
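Because both files contain missing values, it can help to count them per column before plotting. Below is a minimal base-R sketch; `count_missing()` and the toy data frame are hypothetical stand-ins, and in the notebook you would call `count_missing(tv)` and `count_missing(super_bowls)` instead (tibbles are data frames, so the same call works on them).

```r
# Hypothetical helper: number of NA values in each column of a data frame
count_missing <- function(df) {
  colSums(is.na(df))
}

# Toy stand-in for tv.csv; the values are made up
toy_tv <- data.frame(
  super_bowl       = c(1, 2, 3),
  avg_us_viewers   = c(1e6, NA, 2e6),
  rating_household = c(20, 30, NA),
  ad_cost          = c(NA, NA, 50000)
)

count_missing(toy_tv)
```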

#1 Do large point differences result in lost viewers across Super Bowl games?

ggplot(super_bowls, aes(difference_pts)) +
	geom_histogram(binwidth = 2) +
	labs(x = "Point Difference", y = "Number of Super Bowls")

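# Show the games with the smallest and largest point differences
# (na.rm = TRUE keeps a missing value from turning min()/max() into NA)
super_bowls %>%
	filter(difference_pts == min(difference_pts, na.rm = TRUE) |
	       difference_pts == max(difference_pts, na.rm = TRUE))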
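If `difference_pts` contained an `NA`, calling `min()` and `max()` without `na.rm = TRUE` would return `NA`, every comparison would then be `NA`, and `filter()` would return zero rows, because it drops rows whose condition is `NA`. A small base-R illustration with made-up values:

```r
pts <- c(3, NA, 36, 1)   # made-up point differences with one missing value

min(pts)                  # NA: a single missing value poisons the result
min(pts, na.rm = TRUE)    # 1

# The same logical test used to pick the extreme games
is_extreme <- pts == min(pts, na.rm = TRUE) | pts == max(pts, na.rm = TRUE)
is_extreme                # FALSE NA TRUE TRUE

# which() skips NA, much as dplyr::filter() drops rows where the test is NA
which(is_extreme)         # 3 4
```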

# Combine viewership and game data, keeping only Super Bowls present in both tables
games_tv <- tv %>% inner_join(super_bowls, by = "super_bowl")
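`inner_join()` keeps only the Super Bowl numbers found in both tables, and if a number appears more than once in one table (for example, if a game were listed once per broadcaster), each match produces its own row. A toy sketch with hypothetical values; `dplyr` is part of the tidyverse loaded above:

```r
library(dplyr)

# Toy tables; all values are made up for illustration
toy_tv <- tibble(
  super_bowl = c(1, 1, 2, 3),
  network    = c("A", "B", "A", "B")
)
toy_games <- tibble(
  super_bowl     = c(1, 2, 4),
  difference_pts = c(25, 19, 7)
)

joined <- inner_join(toy_tv, toy_games, by = "super_bowl")
joined
# Super Bowl 1 appears twice (one row per network) and Super Bowl 2 once;
# 3 and 4 are dropped because each appears in only one table.
```

When the joined rows feed a line plot, a repeated Super Bowl number shows up as a vertical segment at that x value, so it is worth checking for duplicates after the join.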

ggplot(games_tv, aes(difference_pts, share_household)) +
 geom_point() +
 geom_smooth(method = "lm") +
 labs(x = "Point Difference", y = "Viewership (household share)")
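The scatter plot is the basis for the `"weak"` answer assigned to `score_impact` below. To attach a number to it, one option is Pearson's correlation coefficient, ignoring incomplete pairs with `use = "complete.obs"`, alongside the slope of the same least-squares line that `geom_smooth(method = "lm")` draws (its default formula is `y ~ x`). The sketch runs on made-up vectors; in the notebook you would pass `games_tv$difference_pts` and `games_tv$share_household`.

```r
# Made-up stand-ins for games_tv$difference_pts and games_tv$share_household
difference_pts  <- c(4, 10, 17, 22, 35, 3)
share_household <- c(72, 70, 73, 68, NA, 71)

# Pearson correlation, ignoring pairs that contain NA
r <- cor(difference_pts, share_household, use = "complete.obs")

# Least-squares slope; with R's default na.action, lm() drops the NA row
fit   <- lm(share_household ~ difference_pts)
slope <- unname(coef(fit)["difference_pts"])

c(r = r, slope = slope)
```

A correlation close to 0 and a shallow slope are consistent with a weak relationship; the cut-off between "weak" and "strong" is a judgment call rather than a fixed rule.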

# Answer: the fitted line in the scatter plot shows only a weak relationship
score_impact <- "weak"

#2 How have the number of viewers and TV ratings trended alongside advertisement costs?
games_tv_plot_avg_us_viewers <- games_tv %>%
    select(super_bowl, avg_us_viewers) %>%
    mutate(category = "Average number of US viewers", value = avg_us_viewers) %>%
    select(super_bowl, category, value)

games_tv_plot_rating_household <- games_tv %>%
	select(super_bowl, rating_household) %>%
	mutate(category = "Household rating", value = rating_household) %>%
	select(super_bowl, category, value)

games_tv_plot_ad_cost <- games_tv %>%
	select(super_bowl, ad_cost) %>%
	mutate(category = "Advertisement cost (USD)", value = ad_cost) %>%
	select(super_bowl, category, value)

games_tv_plot <- bind_rows(games_tv_plot_avg_us_viewers,
						   games_tv_plot_rating_household,
						   games_tv_plot_ad_cost)
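The three select/mutate/select blocks plus `bind_rows()` reshape the data from wide to long format. tidyr's `pivot_longer()` can do the same in one step. Below is a sketch on a toy wide table with hypothetical values, where renaming inside `select()` supplies the category labels. The rows come out ordered by Super Bowl rather than by category, which does not affect the plot.

```r
library(dplyr)
library(tidyr)

# Toy wide table with the same column names as games_tv (values are made up)
toy_games_tv <- tibble(
  super_bowl       = c(1, 2),
  avg_us_viewers   = c(1e6, 2e6),
  rating_household = c(20, 30),
  ad_cost          = c(40000, 50000)
)

toy_long <- toy_games_tv %>%
  # Rename while selecting so the new names become the category labels
  select(super_bowl,
         `Average number of US viewers` = avg_us_viewers,
         `Household rating`             = rating_household,
         `Advertisement cost (USD)`     = ad_cost) %>%
  # One row per (super_bowl, category) pair
  pivot_longer(-super_bowl, names_to = "category", values_to = "value")

toy_long
```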

# Plot the data, dividing each series by its own maximum so all three share one axis
ggplot(games_tv_plot) +
	geom_line(data = games_tv_plot %>% filter(category == "Average number of US viewers"),
		aes(x = super_bowl, y = value / max(value, na.rm = TRUE), color = "Average number of US viewers")) +
	geom_line(data = games_tv_plot %>% filter(category == "Household rating"),
		aes(x = super_bowl, y = value / max(value, na.rm = TRUE), color = "Household rating")) +
	geom_line(data = games_tv_plot %>% filter(category == "Advertisement cost (USD)"),
		aes(x = super_bowl, y = value / max(value, na.rm = TRUE), color = "Advertisement cost (USD)")) +
	labs(x = "Super Bowl", y = "Value (fraction of series maximum)", color = "Category")
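Dividing by the maximum puts each series on a common scale where its peak equals 1, but its smallest value generally stays above 0, so this is not min-max scaling. A short base-R comparison on a made-up vector, using `na.rm = TRUE` so one missing value does not turn the whole series into `NA`:

```r
x <- c(2, 4, NA, 8)   # made-up series with one missing value

# Divide by the maximum: the peak becomes 1, the smallest value stays above 0
scaled_by_max <- x / max(x, na.rm = TRUE)            # 0.25 0.50 NA 1.00

# Min-max scaling: the smallest value becomes 0, the largest becomes 1
rng <- range(x, na.rm = TRUE)
scaled_min_max <- (x - rng[1]) / (rng[2] - rng[1])   # 0.000 0.333 NA 1.000
```

For judging which series started rising first, either scaling works, since both preserve the shape of each line.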

# Answer: in the plot above, household ratings are the first series to rise
first_to_increase <- "ratings"