Analyzing Strava data with AI
    Strava is a popular platform for tracking and analyzing athletic activities such as running, cycling, and swimming. It allows users to record their workouts using GPS data from their smartphones or dedicated devices. Strava provides a range of features to help athletes monitor their performance, set goals, and connect with a community of like-minded individuals.

    One of the coolest aspects of Strava is the ability to leverage AI (Artificial Intelligence) to analyze the data collected during workouts. By applying machine learning algorithms, Strava can provide valuable insights into an athlete's performance, training patterns, and even predict future outcomes. This AI-powered analysis can help athletes identify areas for improvement, track progress over time, and make data-driven decisions to optimize their training.

    With AI, Strava can go beyond simple data tracking and provide personalized recommendations based on an individual's goals and historical data. Whether it's suggesting new routes, workout plans, or even predicting potential injuries, the AI capabilities of Strava enhance the overall training experience and help athletes reach their full potential.

    By combining the power of Strava's data collection with AI analysis, athletes can gain a deeper understanding of their performance, uncover hidden patterns, and make informed decisions to improve their training outcomes. It's truly fascinating how technology can revolutionize the way we analyze and optimize our athletic endeavors.

    This dataset provides information on various sports activities. It consists of 510 entries and 87 columns, each containing different parameters and attributes of the activities. Here are descriptions for some key columns:

    • Activity ID: Unique identifier for each activity.
    • Activity Date: Date of the activity.
    • Activity Name: Name or description of the activity.
    • Activity Type: Type of the activity (e.g., running, cycling).
    • Elapsed Time: Total duration of the activity.
    • Distance: Distance covered during the activity.
    • Max Heart Rate: Maximum heart rate during the activity.
    • Relative Effort: Relative measure of effort during the activity.
    • Commute: Binary indicator for whether the activity is a commute.
    • Average Speed: Average speed during the activity.
    • Elevation Gain: Total elevation gained during the activity.
    • Max Grade: Maximum gradient or incline during the activity.
    • Average Heart Rate: Average heart rate during the activity.

    The dataset also includes various other parameters such as temperature, humidity, wind speed, and more, which could be related to the conditions during the activities. Columns containing "translation_missing" may have missing or outdated information. Additionally, some columns like "Media" may contain links to images or videos associated with the activities.
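
    The untranslated columns can be set aside before analysis. The following is a minimal sketch, assuming the "translation_missing" marker appears in the column name itself; the helper drop_untranslated_columns and the sample column names are invented for illustration and are not part of the original workspace.

```python
import pandas as pd


def drop_untranslated_columns(df):
    """Return a copy of df without columns whose name contains 'translation_missing'."""
    untranslated = [col for col in df.columns if "translation_missing" in str(col)]
    return df.drop(columns=untranslated)


# Invented sample with one column whose header carries the marker
sample = pd.DataFrame({
    "Activity ID": [1, 2],
    "Distance": [5.0, 10.2],
    '<span class="translation_missing">Total Steps</span>': [None, None],
})

cleaned = drop_untranslated_columns(sample)
print(list(cleaned.columns))
```

    The original dataframe is left untouched, so the dropped columns remain available if they turn out to be useful later.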

    Analyzing Top Runner Performance from A to Z with AI using Workspace

    In this code-along, we'll be analyzing Strava data! More specifically, we'll look at the total kilometers conquered running, compare years, examine average speeds, and discover personal bests.

    There's a Strava activities.csv file available in the workspace, but you can also follow these instructions to get your own Strava data. This requires logging into Strava and requesting a bulk export of your data through the settings page. Once your data is ready (it may take up to a few hours), you will get an email with a link to download a large zipped folder. You do not need all of this data! Simply unzip it, find the file called activities.csv, and upload it to your workspace (overwriting the placeholder file).
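
    If you would rather script the unzip step, the sketch below pulls only activities.csv out of the export archive with Python's standard zipfile module. The helper extract_activities and the archive name in the usage comment are hypothetical, and the sketch assumes the export is a regular zip file that contains a file named activities.csv at some depth.

```python
import zipfile
from pathlib import Path


def extract_activities(archive_path, dest_dir="."):
    """Extract only activities.csv from an export archive into dest_dir."""
    with zipfile.ZipFile(archive_path) as archive:
        # Search at any depth in case the export nests the file in a folder
        matches = [name for name in archive.namelist()
                   if Path(name).name == "activities.csv"]
        if not matches:
            raise FileNotFoundError("activities.csv not found in archive")
        target = Path(dest_dir) / "activities.csv"
        # Write the file flat into dest_dir, overwriting any placeholder file
        target.write_bytes(archive.read(matches[0]))
        return target


# Usage (hypothetical archive name):
# extract_activities("export.zip")
```

    Everything else in the archive is left compressed, so nothing but the one CSV file ends up in the workspace.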

    # import packages
    import plotly.express as px
    import pandas as pd

    Importing and prepping the data 🏋️

    With the activities.csv file in place, let's import the CSV file.

    df = pd.read_csv('activities.csv')
    df.info() 

    There's a bunch of data that we don't need here; let's zoom in on what we need.

    # Positions of relevant columns
    usecols = [0, 1, 2, 3, 6, 16, 20]
    
    # English column names
    names = [
        "activity_id",
        "activity_date",
        "activity_name",
        "activity_type",
        "distance_km",
        "moving_time_s",
        "elevation_gain"
    ]
    
    # Reading the raw data with preprocessing
    df = pd.read_csv(
        "activities.csv",
        parse_dates = [1],
        header=0,
        usecols = usecols,
        names = names, 
    )
    
    df

    # Filter the dataframe to only keep activities with the type "Run"
    runs = df[df['activity_type'] == 'Run']
    runs

    # Convert distance_km to a float, and calculate average speed
    runs_clean = runs.copy()
    runs_clean['distance_km'] = runs_clean['distance_km'].astype(float)
    # Moving time is in seconds, so divide by 3600 to get hours
    runs_clean['average_speed_kmh'] = runs_clean['distance_km'] / (runs_clean['moving_time_s'] / 3600)
    runs_clean

    runs_clean.dtypes
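
    As a sanity check on the speed formula, here is a small sketch with made-up values (not taken from the dataset): 10 km in 3000 seconds should come out at 12 km/h. It also shows one possible way to handle a moving time of zero, which would otherwise produce an infinite speed; whether such rows exist in your export is an assumption.

```python
import pandas as pd

# Made-up sample values for illustration only
sample = pd.DataFrame({
    "distance_km": [10.0, 5.0, 21.1],
    "moving_time_s": [3000, 1800, 0],
})

# Treat a zero moving time as missing so the division yields NaN instead of inf
moving_time_h = sample["moving_time_s"].where(sample["moving_time_s"] > 0) / 3600
sample["average_speed_kmh"] = sample["distance_km"] / moving_time_h
print(sample)
```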

    Analyzing distances

    # Group the runs_clean dataframe by year and calculate the sum of distance_km for each year
    distance_per_year = runs_clean.groupby(runs_clean['activity_date'].dt.year)['distance_km'].sum().reset_index()
    
    # Create a bar chart using plotly express
    fig = px.bar(distance_per_year, x='activity_date', y='distance_km', labels={'activity_date': 'Year', 'distance_km': 'Total Distance (km)'})
    
    # Give the bar of the most recent year (2023 in this dataset) a lighter shade
    default_color = 'rgba(0,0,0,1)'
    last_year_color = 'rgba(0,0,0,0.2)'
    # Build the list from the number of years so it matches however many bars there are
    colors = [default_color] * (len(distance_per_year) - 1) + [last_year_color]
    
    fig.update_traces(marker_color=colors)
    
    fig.show()

    # Sort chronologically first (cumsum follows row order), then calculate the cumulative sum of distance_km
    runs_clean = runs_clean.sort_values('activity_date')
    runs_clean['cumulative_distance'] = runs_clean['distance_km'].cumsum()
    
    # Create a cumulative area plot using plotly express
    fig = px.area(runs_clean, x='activity_date', y='cumulative_distance', labels={'activity_date': 'Date', 'cumulative_distance': 'Cumulative Distance (km)'})
    
    fig.show()
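
    Since the goal includes comparing years, a running total that restarts each year can be easier to compare than one long curve. The sketch below is one way to do that with a grouped cumulative sum; the sample data and the column names year and cumulative_distance_year are invented for illustration.

```python
import pandas as pd

# Made-up runs, deliberately out of chronological order
runs = pd.DataFrame({
    "activity_date": pd.to_datetime(
        ["2022-02-01", "2021-06-01", "2022-01-10", "2021-03-15"]
    ),
    "distance_km": [10.0, 7.0, 5.0, 3.0],
})

# Sort chronologically first: cumsum follows row order, not date order
runs = runs.sort_values("activity_date").reset_index(drop=True)
runs["year"] = runs["activity_date"].dt.year

# Running total that restarts at the beginning of each year
runs["cumulative_distance_year"] = runs.groupby("year")["distance_km"].cumsum()
print(runs)
```

    Plotting cumulative_distance_year against the day of the year, with one line per year, would then put the years side by side.
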

    # Filter the runs_clean dataframe for the year 2022
    runs_2022 = runs_clean[runs_clean['activity_date'].dt.year == 2022]
    
    # Group the runs_2022 dataframe by month and calculate the sum of distance_km for each month
    distance_per_month_2022 = runs_2022.groupby(runs_2022['activity_date'].dt.month)['distance_km'].sum().reset_index()
    
    # Create a bar chart using plotly express
    fig = px.bar(distance_per_month_2022, x='activity_date', y='distance_km', labels={'activity_date': 'Month', 'distance_km': 'Total Distance (km)'})
    
    fig.show()
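
    A groupby by month only returns months that contain at least one run, so a month without activity simply disappears from the chart. The sketch below shows one possible way to keep all twelve months by reindexing; the helper monthly_distance and the sample data are invented for illustration.

```python
import pandas as pd

# Made-up runs; in the notebook this would be runs_clean
runs = pd.DataFrame({
    "activity_date": pd.to_datetime(
        ["2022-01-05", "2022-01-20", "2022-03-02", "2021-12-31"]
    ),
    "distance_km": [5.0, 10.0, 8.0, 12.0],
})


def monthly_distance(runs, year):
    """Total distance per calendar month of `year`, with 0 for months without runs."""
    in_year = runs[runs["activity_date"].dt.year == year]
    totals = in_year.groupby(in_year["activity_date"].dt.month)["distance_km"].sum()
    # Reindex onto 1..12 so empty months appear as 0 instead of being dropped
    totals = totals.reindex(range(1, 13), fill_value=0.0)
    return totals.rename_axis("month").reset_index()


monthly_2022 = monthly_distance(runs, 2022)
print(monthly_2022)
```

    The resulting dataframe has the columns month and distance_km, so it can be passed to px.bar in the same way as the per-month totals above.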

    Analyzing speed