Whether or not you like football, the Super Bowl is a spectacle, and there's a little something for everyone at a Super Bowl party. For the sports fan, there is drama in the form of blowouts, comebacks, and controversy. There are the ridiculously expensive ads: some hilarious, others gut-wrenching, thought-provoking, or weird. And there are the halftime shows with the biggest musicians in the world, sometimes riding giant mechanical tigers or leaping from the roof of the stadium.

The dataset we'll use was scraped and polished from Wikipedia. It is made up of three CSV files, one with game data, one with TV data, and one with halftime musician data for 52 Super Bowls through 2018.

The Data

Three datasets have been provided, and summaries and previews of each are presented below.

1. halftime_musicians.csv

This dataset contains information about the musicians who performed during the halftime shows of various Super Bowl games. The structure is shown below, and it applies to all remaining files.

Column          Description
'super_bowl'    The Super Bowl number (e.g., 52 for Super Bowl LII).
'musician'      The name of the musician or musical group that performed during the halftime show.
'num_songs'     The number of songs performed by the musician or group during the halftime show.

2. super_bowls.csv

This dataset provides details about each Super Bowl game, including the date, location, participating teams, and scores, as well as the point difference between the winning and losing teams ('difference_pts').

3. tv.csv

This dataset contains television viewership statistics and advertisement costs related to each Super Bowl.

# Import libraries
import pandas as pd
from matplotlib import pyplot as plt
tv = pd.read_csv("datasets/tv.csv")
tv.head()

Has TV viewership increased over time?

To determine if TV viewership has increased over time, we can follow these steps:

  1. Plot the values of avg_us_viewers against super_bowl as a line graph to visualize any trends.
  2. Identify the highest and lowest values of avg_us_viewers and their corresponding Super Bowl numbers.
  3. Compare the two Super Bowl numbers. If the game with the highest viewership is more recent than the game with the lowest viewership, this suggests that viewership has increased over time.
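
import pandas as pd
import matplotlib.pyplot as plt

# Load the TV data and preview the first rows
tv = pd.read_csv("datasets/tv.csv")
tv.head()

# Plot average US viewers against Super Bowl number
plt.plot(tv['super_bowl'], tv['avg_us_viewers'])
plt.xlabel('Super Bowl')
plt.ylabel('Average US viewers')
plt.show()

# Highest and lowest average viewership
views_max = tv['avg_us_viewers'].max()
views_min = tv['avg_us_viewers'].min()

# Super Bowl numbers of those two games (idxmax/idxmin return one row label each, even on ties)
year_max = int(tv.loc[tv['avg_us_viewers'].idxmax(), 'super_bowl'])
year_min = int(tv.loc[tv['avg_us_viewers'].idxmin(), 'super_bowl'])
viewership_increased = year_max > year_min
print(f'Viewership has increased: {viewership_increased}')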
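
Comparing only the single highest and lowest values can be misleading if either one is an outlier. As a complementary check, the sketch below fits a straight line to viewership against Super Bowl number and looks at the sign of the slope. The helper name `viewership_slope` and the sample figures are made up for illustration; with the real data you would pass in the `tv` DataFrame instead.

```python
import numpy as np
import pandas as pd


def viewership_slope(df: pd.DataFrame) -> float:
    """Return the least-squares slope of avg_us_viewers per Super Bowl."""
    # Drop rows with missing values so the fit only uses complete pairs.
    clean = df[["super_bowl", "avg_us_viewers"]].dropna()
    # A degree-1 polynomial fit returns (slope, intercept).
    slope, _intercept = np.polyfit(clean["super_bowl"], clean["avg_us_viewers"], 1)
    return float(slope)


# Made-up sample values (not the real figures), only to show the call.
sample_tv = pd.DataFrame(
    {
        "super_bowl": [1, 2, 3, 4, 5],
        "avg_us_viewers": [10.0, 12.0, 11.0, 15.0, 17.0],
    }
)

slope = viewership_slope(sample_tv)
print(f"Slope: {slope:.2f} viewers per Super Bowl")
print("Upward trend" if slope > 0 else "No upward trend")
```

A positive slope means viewership rises on average from one Super Bowl to the next. It uses every game, not just the two extremes.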

# Load the game data into a DataFrame
super_bowls = pd.read_csv("datasets/super_bowls.csv")
super_bowls.head()

How many matches finished with a point difference greater than 40?

To determine the number of matches that finished with a point difference greater than 40, follow these steps:

  1. Filter the rows where the 'difference_pts' column has values greater than 40.
  2. Count the number of rows that meet this condition.
  3. Print the result.
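
import pandas as pd

# Load the game data and preview the first rows
super_bowls = pd.read_csv("datasets/super_bowls.csv")
super_bowls.head()

# Keep only the games decided by more than 40 points
diff = super_bowls[super_bowls['difference_pts'] > 40]

# Count the matching rows and pick the right noun for the message
difference = len(diff)
noun = 'match' if difference == 1 else 'matches'
print(f'{difference} {noun} finished with a point difference greater than 40.')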
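
The filter-and-count steps above can also be wrapped in a small helper, which makes it easy to try other thresholds. This is a minimal sketch: `count_blowouts` is a hypothetical name, and the sample rows are invented for illustration, not real game results.

```python
import pandas as pd


def count_blowouts(df: pd.DataFrame, threshold: int = 40) -> int:
    """Count games whose point difference is strictly greater than threshold."""
    # A boolean mask sums to the number of True entries.
    return int((df["difference_pts"] > threshold).sum())


# Made-up rows for illustration only; not real game results.
sample_games = pd.DataFrame(
    {
        "super_bowl": [1, 2, 3, 4],
        "difference_pts": [3, 45, 40, 41],
    }
)

n_blowouts = count_blowouts(sample_games)
noun = "match" if n_blowouts == 1 else "matches"
print(f"{n_blowouts} {noun} finished with a point difference greater than 40.")
```

Note that the comparison is strict, so a game decided by exactly 40 points is not counted.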
halftime_musicians = pd.read_csv("datasets/halftime_musicians.csv")
halftime_musicians.head()

Who performed the most songs in Super Bowl halftime shows?

To determine which musician performed the most songs during Super Bowl halftime shows, follow these steps:

  1. Identify the row with the maximum number of songs performed.
  2. Extract the musician's name associated with this maximum value.
  3. Print the result.

import pandas as pd

# Load the halftime musician data and preview the first rows
halftime_musicians = pd.read_csv("datasets/halftime_musicians.csv")
halftime_musicians.head()

# Largest number of songs in a single halftime performance
songs = halftime_musicians['num_songs'].max()
# Musician(s) with that count; if several tie, the first one listed is reported
singer = halftime_musicians.loc[halftime_musicians['num_songs'] == songs, 'musician'].values
most_songs = singer[0]
print(most_songs + ' performed the most songs in Super Bowl halftime shows.')
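
The approach above finds the single performance with the most songs. If the question is instead read as total songs across all appearances, a musician who performed at several Super Bowls should have their counts added together. The sketch below shows that alternative reading with a `groupby`; the function name `top_musicians_by_total_songs` and the sample rows are invented for illustration.

```python
import pandas as pd


def top_musicians_by_total_songs(df: pd.DataFrame) -> list:
    """Return every musician tied for the highest total song count."""
    # Sum songs per musician; missing counts are skipped by sum().
    totals = df.groupby("musician")["num_songs"].sum()
    # Keep all musicians whose total equals the maximum, so ties are not lost.
    return sorted(totals[totals == totals.max()].index)


# Made-up rows for illustration only; not real halftime data.
sample_musicians = pd.DataFrame(
    {
        "super_bowl": [3, 3, 2, 1],
        "musician": ["Artist A", "Artist B", "Artist A", "Artist C"],
        "num_songs": [4, 2, 5, 7],
    }
)

top = top_musicians_by_total_songs(sample_musicians)
print(", ".join(top) + " performed the most songs in total.")
```

In this sample the two readings disagree: "Artist C" has the largest single performance, while "Artist A" has the largest total.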