Analyzing Top Runner Performance from A to Z with AI using Workspace
In this code along, we'll be analyzing Strava data! More specifically, we'll look at total kilometers run, compare years, examine average speeds, and discover personal bests.
There's a Strava activities.csv file available in the workspace, but you can also follow these instructions to get your own Strava data. This will require logging into Strava and requesting a bulk export of your data through the settings page. Once your data is ready (it may take up to a few hours), you will get an email with a link to download a big folder. You do not need all of this data! Simply unzip it, find a file called activities.csv, and upload this file into your workspace (overwriting the placeholder file).
# import packages
import plotly.express as px
import pandas as pd
Importing and prepping the data 🏋️
With the activities.csv file in place, let's import the CSV file.
pd.read_csv('activities.csv')
There's a bunch of data that we don't need here; let's zoom in on what we need.
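The column selection that follows relies on positions, and those positions depend on the layout of the export. As a quick check, the headers can be listed next to their positions. This is a minimal sketch using a hypothetical three-column sample in place of activities.csv; with the real file, `pd.read_csv('activities.csv', nrows=0)` would read the header row only.

```python
import io
import pandas as pd

# Hypothetical sample standing in for activities.csv (only three columns)
sample_csv = io.StringIO(
    "Activity ID,Activity Date,Activity Name\n"
    "1,2022-01-01,Morning Run\n"
)

# nrows=0 reads just the header row, which is all we need here
header_only = pd.read_csv(sample_csv, nrows=0)

# Map each column position to its header so the indices can be verified
column_positions = dict(enumerate(header_only.columns))
for position, column in column_positions.items():
    print(position, column)
```

If the printed positions do not match the indices in usecols, the list should be adjusted before selecting columns.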
# Positions of relevant columns
usecols = [0, 1, 2, 3, 6, 16, 20]
# English column names (for reference only; the code below uses the original CSV headers)
names = [
"activity_id",
"activity_date",
"activity_name",
"activity_type",
"distance_km",
"moving_time_s",
"elevation_gain"
]
# Read the raw data, using the first row as the column headers
df = pd.read_csv(
"activities.csv",
header=0,
)
df
# Keep only the relevant columns; copy so that later column assignments do not touch a view of df
new_df = df.iloc[:, usecols].copy()
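The selection above keeps the relevant columns but still contains every activity type. To restrict the analysis to runs, a boolean filter on the Activity Type column is one option. The sketch below uses a hypothetical sample frame and assumes runs are labeled "Run"; the actual labels in an export can be checked with `new_df['Activity Type'].value_counts()`.

```python
import pandas as pd

# Hypothetical sample; the real new_df comes from activities.csv
sample = pd.DataFrame({
    "Activity Type": ["Run", "Ride", "Run", "Walk"],
    "Distance": [10.2, 35.0, 5.1, 3.0],
})

# Keep only rows whose activity type is "Run" (label is an assumption)
runs_only = sample[sample["Activity Type"] == "Run"].copy()
print(runs_only)
```

Applying the same filter to new_df would change every total and average computed later, so it is worth deciding up front whether other activity types belong in the analysis.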
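new_df.info()
# Convert Distance to a float, and calculate the overall average speed
new_df['Distance'] = new_df['Distance'].astype(str).str.replace(',', '', regex=False).astype('float')
# Distance is in km and Moving Time is in seconds, so convert seconds to hours
avg_speed = new_df['Distance'].sum() / (new_df['Moving Time'].sum() / 3600)
print(f"Average speed: {avg_speed:.2f} km/h")
Analyzing distances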
import pandas as pd
import plotly.express as px
# Assuming new_df is already defined and includes 'Activity Date' and 'Distance' columns
new_df['Activity Date'] = pd.to_datetime(new_df['Activity Date'])
new_df['years'] = new_df['Activity Date'].dt.year
total_distance_per_year = new_df.groupby('years').agg({'Distance': 'sum'}).reset_index()
df_sorted = total_distance_per_year.sort_values(by='Distance', ascending=False)
px.bar(df_sorted, x='years', y='Distance', color='years').show()
# Create a cumulative area plot showing total distance run
# Sort chronologically first so the running total follows the timeline
new_df = new_df.sort_values('Activity Date')
new_df['Cum_area_distance'] = new_df['Distance'].cumsum()
px.area(new_df, x='Activity Date', y='Cum_area_distance').show()
# Show total distance per month in the year 2022
new_df['months'] = new_df['Activity Date'].dt.month
total_distance_per_month = new_df[new_df['years'] == 2022].groupby('months')['Distance'].sum()
total_distance_per_month_sorted = total_distance_per_month.sort_values(ascending=False).reset_index()
month_abbr = {
1: 'Jan', 2: 'Feb', 3: 'Mar', 4: 'Apr',
5: 'May', 6: 'Jun', 7: 'Jul', 8: 'Aug',
9: 'Sep', 10: 'Oct', 11: 'Nov', 12: 'Dec'
}
# Apply mapping to the 'months' column
total_distance_per_month_sorted['months'] = total_distance_per_month_sorted['months'].map(month_abbr)
px.bar(total_distance_per_month_sorted, x='months', y='Distance', color='months', title='Total Distance Run per Month in 2022').show()
Analyzing speed
Result: for activities with a distance between 13 km and 20 km, the average moving time was 5138.39 s, or about 1 hour, 25 minutes and 38 seconds.
# Show the average moving time (in seconds) for activities with a distance between 13 km and 20 km
avg_speed_with_distance = new_df[(new_df['Distance'] >= 13) & (new_df['Distance'] <= 20)]
avg_speed_with_distance = avg_speed_with_distance.groupby('Activity Type')['Moving Time'].mean().reset_index()
avg_speed_with_distance
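The table above reports moving time in seconds, which is a duration and not a speed. To get an actual speed, distance has to be divided by time. The sketch below shows one way to derive speed (km/h) and pace (min/km) and to pick out the fastest effort as a simple personal best. The sample rows and the column names speed_kmh and pace_min_per_km are hypothetical; it assumes Distance is in kilometers and Moving Time is in seconds, as the column names earlier suggest.

```python
from datetime import timedelta

import pandas as pd

# Hypothetical sample rows; distance in km, moving time in seconds
sample = pd.DataFrame({
    "Activity Name": ["Long run", "Tempo run", "Half marathon prep"],
    "Distance": [15.0, 13.0, 20.0],
    "Moving Time": [5400, 3900, 6400],
})

# Speed in km/h: distance divided by moving time expressed in hours
sample["speed_kmh"] = sample["Distance"] / (sample["Moving Time"] / 3600)

# Pace in minutes per km: moving time in minutes divided by distance
sample["pace_min_per_km"] = (sample["Moving Time"] / 60) / sample["Distance"]

# A simple personal best: the activity with the highest average speed
personal_best = sample.loc[sample["speed_kmh"].idxmax()]
print(personal_best["Activity Name"], round(personal_best["speed_kmh"], 2), "km/h")

# Format a duration in seconds as h:mm:ss
formatted = str(timedelta(seconds=round(5138.39)))
print(formatted)  # 1:25:38
```

Averaging the per-activity speeds gives equal weight to short and long runs, whereas dividing total distance by total time weights by duration; which one is appropriate depends on the question being asked.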