Skip to content
Project: Analyzing Super Bowl Viewership and Advertising
The Super Bowl is a spectacle. It is the final game in the NFL that crowns the winner of that season. There is always a little something for everyone. For the fans, there is the game itself. For those tagging along, there are the unique advertisements and the halftime shows with the biggest musicians in the world.
You're going to explore how these elements interact.
The Data
The data has been scraped from Wikipedia and is made up of two CSV files covering a span of all Super Bowls up to 2024. This data does contain missing values. The most relevant columns are noted below.
data/tv.csv
| Column | Description |
|---|---|
super_bowl | Super Bowl number (e.g. the first Super Bowl ever is Super Bowl 1 and the last Super Bowl in 2024 is Super Bowl 58) |
avg_us_viewers | Average # of US viewers |
share_household | Percentage of households watching TV that watched the game |
rating_household | Percentage of all households with TVs that watched the game |
ad_cost | Cost per ad |
data/super_bowls.csv
| Column | Description |
|---|---|
super_bowl | Super Bowl number (e.g. the first Super Bowl ever is Super Bowl 1 and the last Super Bowl in 2024 is Super Bowl 58) |
difference_pts | Point difference for that game |
# Load packages
library(tidyverse)
# Load the CSV data
tv <- read_csv("data/tv.csv", show_col_types=FALSE)
super_bowls <- read_csv("data/super_bowls.csv", show_col_types=FALSE)
# Histogram of point differences
ggplot(super_bowls, aes(x=difference_pts)) +
geom_histogram(binwidth=2) +
labs(x='Difference', y= 'No.of points')
# Exploring the difference between point data
super_bowls %>%
filter(difference_pts == min(difference_pts) | difference_pts == max(difference_pts))
#merge of super_bowl and tv data
tv_game <- tv%>%
inner_join(super_bowls, by = 'super_bowl')
#plotting the merged dataset using a scatter plot
ggplot(tv_game, aes(difference_pts, avg_us_viewers))+
geom_point()+
geom_smooth(method = 'lm')+
labs(x = 'point difference', y = 'average US viewers')
# point difference result in lost viewers across super bowl games
score_impact <- 'weak'
#There's a downward slope of the marginal line in the plot of point difference and viewes. which generally means the higher point difference the point across super_bowl games, the lower the viewership.
#creating a pivot table
#reshaping viewer column
tv_game_avg_usviewers <- tv_game%>%
select(super_bowl, avg_us_viewers)%>%
mutate( category = "Average US Views")%>%
rename ( value = avg_us_viewers)
#reshaping rating
tv_game_ratings <- tv_game%>%
select(super_bowl, rating_household)%>%
mutate( category = " Household Rating")%>%
rename( value = rating_household)
#reshaping ad cost
tv_game_adcost <- tv_game %>%
select(super_bowl, ad_cost)%>%
mutate(category = "Ad cost")%>%
rename(value = ad_cost)
tv_game_trend <- bind_rows(tv_game_avg_usviewers,
tv_game_ratings,
tv_game_adcost)
#plot the data
ggplot(tv_game_trend, aes(super_bowl, value))+
geom_line()+
facet_wrap( ~ category, scales = 'free', nrow = 3)+
labs( x = "Super Bowls", y = "")
#interpreting findings
first_to_increase <- "ratings"
#interpretation : the first to increase was the rating of the super bowl game but views even views increased as well before the sd cost picked up. the TV might have increased their ad cost when they noticed the hike in ratings and viewership.