How Likely is an Athlete to Win a Medal in Judo?
Welcome to your workspace! In this walkthrough, you will learn the basics of Workspace as you begin to analyze an Olympics dataset and prepare a shareable Workspace report!
Your goal will be to use the data available to you to help estimate how likely an athlete is to win a medal in Judo.
Keep an eye out for 💪 icons throughout the notebook. These will indicate opportunities for you to try out Workspace for yourself!
🏆 Load in the Olympics Data
If you click on the file browser icon, you can see that you have access to olympics_data.csv
, a file with different information on Olympics events. The cell below uses pandas
to import the data and preview it.
Go ahead and try to run the cell now to import and inspect the data!
To run a cell, click inside it and click "Run" or the ► icon. You can also use Shift-Enter to run a selected cell and automatically switch to the next cell.
# Import libraries
import pandas as pd
import numpy as np
# # Import the data
olympics_data = pd.read_csv("olympics_data.csv")
# Preview the DataFrame
olympics_data
💪 Browse through the interactive table to see if you can already learn anything from the data!
🌐 Bring in additional data
Depending on your analysis, you may find using additional world data helpful. In Workspace, you can add and run SQL cells to access relational databases. The query below retrieves data from a database containing information on nations of the world. It stores the query results as a pandas
DataFrame with a name of your choosing.
You can click the "Browse tables" button to view the available tables and columns or continue with the existing query.
-- Select country data based on most recent year
WITH world_countries AS (
SELECT
countries.name AS country,
continents.name AS continent,
year,
population,
gdp,
ROW_NUMBER() OVER(PARTITION BY country ORDER BY year DESC) AS year_index
FROM countries
INNER JOIN country_stats USING(country_id)
INNER JOIN regions USING(region_id)
INNER JOIN continents USING(continent_id)
GROUP BY 1, 2, 3, 4, 5)
SELECT
country,
continent,
year,
population,
gdp
FROM world_countries
WHERE year_index = 1
🔬 Zooming in on judo
We now have country data that we can combine with the Olympics data. We will use the .merge()
method to combine the two DataFrames using the team
and country
columns.
A "left" join matches on rows in the olympics_data
DataFrame, as some teams will not be present in the countries_data
DataFrame. We then filter for rows where the sport is "Judo".
# Perform a left join between the two DataFrames using the country columns
olympics_country_data = olympics_data.merge(
countries_data, left_on="team", right_on="country", how="left", suffixes=["_games", "_country_data"]
)
# Filter for Judo
judo_data = olympics_country_data[olympics_country_data["sport"] == "Judo"]
# Preview the data
judo_data
# Drop athletes who didn't win a medal
judo_medals = judo_data.dropna(subset="medal")
# Get the counts of medals by team
country_counts = judo_medals.groupby("team", as_index=False)["medal"].count()
# Select only the top 10 countries
best_countries = country_counts.sort_values(by="medal", ascending=False).head(10)
# Inspect the data
best_countries
As we might expect, Japan has the highest medal count for judo! But this would look even better as a plot. Fortunately, Workspace has a handy chart cell that allows you to quickly generate and customize different chart types.
Let's use a bar chart showing each country's total number of medals won. Select the cell below and click "Refresh" to generate the chart!
💪 Be sure to try out other data visualizations by adjusting the chart type, the x-axis, y-axis, and grouping options!
Top judo countries by medal count
🔬 Go forth and analyze!
Well done! It's now up to you to further explore the data you imported, create new features, and estimate the probability that a judo athlete will earn a medal. Try to use the additional SQL data you queried!
When you're finished, make sure to publish your work which can be shared with peers and featured on your DataCamp profile. After you have finished preparing your report, consider the following options:
- Try out our ready-to-use datasets. These cover various topics and include flat files such as csvs and additional databases for you to test your SQL skills!
- Kickstart your next project by using one of our templates. These provide the code and instructions on various data science topics, ranging from machine learning to visualization.
- Want to go at it on your own? Open a blank workspace and get coding!