Skip to content

Olympics Data Exploration

This is a historical dataset on the modern Olympic Games, from Athens 1896 to Rio 2016. Each row consists of an individual athlete competing in an Olympic event and which medal was won (if any).

With this dataset, I will be using a list of 20 queries to explore the data and practice SQL skills.

The queries are as followed:

  1. How many olympics games have been held?
  2. List down all Olympics games held so far.
  3. Mention the total no of nations who participated in each olympics game?
  4. Which year saw the highest and lowest no of countries participating in olympics?
  5. Which nation has participated in all of the olympic games?
  6. Identify the sport which was played in all summer olympics.
  7. Which Sports were just played only once in the olympics?
  8. Fetch the total no of sports played in each olympic games.
  9. Fetch details of the oldest athletes to win a gold medal.
  10. Find the Ratio of male and female athletes participated in all olympic games.
  11. Fetch the top 5 athletes who have won the most gold medals.
  12. Fetch the top 5 athletes who have won the most medals (gold/silver/bronze).
  13. Fetch the top 5 most successful countries in olympics. Success is defined by no of medals won.
  14. List down total gold, silver and bronze medals won by each country.
  15. List down total gold, silver and bronze medals won by each country corresponding to each olympic games.
  16. Identify which country won the most gold, most silver and most bronze medals in each olympic games.
  17. Identify which country won the most gold, most silver, most bronze medals and the most medals in each olympic games.
  18. Which countries have never won gold medal but have won silver/bronze medals?
  19. In which Sport/event, USA has won the most medals.
  20. Break down all olympic games where USA won a medal for Hockey and how many medals in each olympic games.

Data Dictionary

ColumnExplanation
idUnique number for each athlete
nameAthlete's name
sexM or F
ageAge of the athlete
heightIn centimeters
weightIn kilograms
teamTeam name
nocNational Olympic Committee 3
gamesYear and season
yearInteger
seasonSummer or Winter
cityHost city
sportSport
eventEvent
medalGold, Silver, Bronze, or NA
Spinner
DataFrameas
df6
variable
SELECT *
FROM athlete_events.csv.gz;

Source and license of the dataset. The dataset is a consolidated version of data from www.sports-reference.com.

1. How many olympics games have been held?

For this question, both COUNT and DISTINCT will be used to augment the games column. DISTINCT will only select unique values and COUNT will return a number count of values in that selection.

Spinner
DataFrameas
df
variable
SELECT COUNT(DISTINCT games)
FROM athlete_events.csv.gz;

2. List down all Olympics games held so far.

This dataset has data from 51 different Olympic Games. In order to see each different game, remove the COUNT function and add an ORDER BY function to see the games ordered by year.

After running this, it is shown that the first recorded games in this dataset is 1896 Summer, and the last is 2016 Summer.

Spinner
DataFrameas
df1
variable
SELECT DISTINCT games
FROM athlete_events.csv.gz
ORDER BY games ASC;

3. Mention the total no of nations who participated in each olympics game?

Once again, COUNT(DISTINCT ---) will be used in order to get the total number of countries. By selecting the games column and a distinct count of the noc column, it will display how many countries participated in each Olympic Games.

This data was ordered by number of countries to easily see which games had the most and least participants.

Spinner
DataFrameas
df2
variable
SELECT games, 
	COUNT(DISTINCT noc) AS num_countries
FROM athlete_events.csv.gz
GROUP BY games
ORDER BY num_countries DESC;

4. Which year saw the highest and lowest no of countries participating in olympics?

For this question, the previous query will be reused but instead sorting the data by games.

When putting this data into a bar chart, it's easy to see the difference in participation between Summer and Winter Olympics.

Since the start of the Winter Olympics data in 1924, its participation has not exceded that of the Summer Olympics.

Spinner
DataFrameas
df3
variable
SELECT games, 
	COUNT(DISTINCT noc) AS num_countries
FROM athlete_events.csv.gz
GROUP BY games
ORDER BY games DESC;

5. Which nation has participated in all of the olympic games?N

For this question, knowledge from question 1 that there are 51 Olympic Games in this dataset will be used.

To start, SELECT the noc column and a DISTINCT COUNT of the games column to display.

The data should be grouped by noc, to know how many games each country has competed in.

Lastly, to display only countries, we will use the HAVING statement to only display data where the COUNT(DISTINCT games) = 51

Spinner
DataFrameas
df4
variable
SELECT noc AS country, 
	COUNT(DISTINCT games) AS num_games
FROM athlete_events.csv.gz
GROUP BY noc
HAVING COUNT(DISTINCT games) = 51;