Skip to content

Source: https://www.lyft.com/bikes/bay-wheels/explore

How to convert casual riders into member users? The case study

Introduction

Challenge

The marketing director of Bay Wheels, a bike-sharing system in San Francisco Bay Area operated by Lyft, Inc., believes that annual subscriptions are fundamental to the company's future growth and success. The goal is to increase the conversion of casual riders into member users. To achieve this, they aim to make informed, data-driven decisions regarding marketing and pricing strategies.

Method

This analysis was conducted to support marketing and pricing decisions aimed at increasing annual membership adoption, using one year of Bay Wheels trip data.

The behavioral patterns of both rider types were investigated, which helped answer fundamental business questions.
For instance:

  1. How often do casual and member users ride compared to each other?
  2. When do casuals and members prefer to use the service?
  3. Who usually takes longer rides, casuals or members?
  4. How does the number of rides change throughout a week, month, year?
  5. What type of bike is more popular, classic or electric?

Outcome

The key takeaways and recommendations were offered to the marketing team, including metrics to validate the implementation process. The test suggestions for each recommendation were also proposed to check the member conversion rate.

Reference

In this project, I used PostgreSQL for analysis, Markdown, and MS Excel for visualization. The Datacamp Workspace is used for building and formatting.

This data analysis and visualizations are based on publicly available one-year period data of Bay Wheels trip data in the year 2022, under this license. The dataset can be downloaded here in CSV format for each month.


3 hidden cells

Data Preparation

Before I started coding, I needed to prepare the data in the best possible method to contribute to the best possible decisions. On the BayWheels website, I downloaded publicly available bike trips data for each month in 2022.

The biggest challenge was to append 12 CSV downloaded files into one huge database. Luckily, there was a solution in the DataCamp workspace by writing SQL code. First, I created a table with all the columns and field types the same as in each csv file. Then, I inserted data from each CSV file into the new table. In the end, I exported the appended data to a new big csv file. The SQL code below illustrates these steps:

-- Create a table to store the appended data DROP TABLE IF EXISTS baywheels_appended_tripdata_2022; CREATE TABLE baywheels_appended_tripdata_2022 ( ride_id varchar, rideable_type varchar, started_at text, ended_at text, start_station_name varchar, start_station_id varchar, end_station_name varchar, end_station_id varchar, start_lat double precision, start_lng double precision, end_lat double precision, end_lng double precision, member_casual varchar ); -- Insert data from each CSV file into the table COPY baywheels_appended_tripdata_2022 FROM '202201-baywheels-tripdata.csv' DELIMITER ',' CSV HEADER; COPY baywheels_appended_tripdata_2022 FROM '202202-baywheels-tripdata.csv' DELIMITER ',' CSV HEADER; COPY baywheels_appended_tripdata_2022 FROM '202203-baywheels-tripdata.csv' DELIMITER ',' CSV HEADER; COPY baywheels_appended_tripdata_2022 FROM '202204-baywheels-tripdata.csv' DELIMITER ',' CSV HEADER; COPY baywheels_appended_tripdata_2022 FROM '202205-baywheels-tripdata.csv' DELIMITER ',' CSV HEADER; COPY baywheels_appended_tripdata_2022 FROM '202206-baywheels-tripdata.csv' DELIMITER ',' CSV HEADER; COPY baywheels_appended_tripdata_2022 FROM '202207-baywheels-tripdata.csv' DELIMITER ',' CSV HEADER; COPY baywheels_appended_tripdata_2022 FROM '202208-baywheels-tripdata.csv' DELIMITER ',' CSV HEADER; COPY baywheels_appended_tripdata_2022 FROM '202209-baywheels-tripdata.csv' DELIMITER ',' CSV HEADER; COPY baywheels_appended_tripdata_2022 FROM '202210-baywheels-tripdata.csv' DELIMITER ',' CSV HEADER; COPY baywheels_appended_tripdata_2022 FROM '202211-baywheeels-tripdata.csv' DELIMITER ',' CSV HEADER; COPY baywheels_appended_tripdata_2022 FROM '202212-baywheels-tripdata.csv' DELIMITER ',' CSV HEADER; -- Repeat the above line for all 12 files -- Export the data to a new CSV file COPY baywheels_appended_tripdata_2022 TO 'baywheels_tripdata_2022.csv' DELIMITER ',' CSV HEADER;

The combined table contains 2,614,800 rows and 13 columns, which is a huge amount of data.


1 hidden cell

Data Exploration and Cleaning - Business Decision Protection

To ensure the reliability and accuracy of my analysis, I performed a comprehensive series of data cleaning and validation steps on the 2022 Bay Wheels bike ride dataset.

Each row of data represents a single ride with date and time, location, type of bike, and user type. Here is how I validated the data before the analysis, column by column:

  • ride_id: The unique identifier of each ride that contains 16 characters. There are no duplicates or null values within over 2.6 million records.

  • rideable_type: There are three unique values - classic_bike, electric_bike, and docked_bike. The rides using the docked_bike didn't occur after March 24th. It's very likely that "docked_bike" is the old name for "classic_bike". As a result, all "docked_bike" values have been changed to "classic_bike", which affected 9250 rows. There were no null values.

    Rationale: Since "docked_bike" and "classic_bike" refer to the same type of vehicle, considering them as a single category wouldn't affect key metrics interpretation as the foundation for decisions.

  • started_at / ended_at: Timestamp fields that show the start and end of each ride, respectively. For further analysis and visualization, I created two new columns: ride_duration (computed as ended_at - started_at) and day_of_week. Next, I noticed that in some records, the "ended_at" value was less than the "started_at" value (negative ride duration), as well as very short or long trips. Therefore, I considered all rides that have less than 1 minute and over 24 hours as outliers. 74057 rows were deleted.

    Rationale: These thresholds were chosen to remove system errors and abnormal behavior that would distort average ride duration metrics used for pricing and product decisions.

  • start_station_name / start_station_id: According to the rules of Lyft, the bike-sharing system provider, all classic bike trips must start and end at a docking station. On the other hand, electric bikes have a lock option, which makes it possible to start or finish a ride outside a docking station. Considering this, I found no null values in the case of a ride taken on a "classic_bike".

  • end_station_name / end_station_id: Similar to the previous columns, I validated all the rows with classic bikes. 439 rows with null values have been removed.

  • start_lat / start_lng: Represent geographical coordinates in decimal degrees of the starting location. Because each ride always has coordinates, both start and finish, there couldn't be any null values. Luckily, I didn't find any.

  • end_lat / end_lng: Here, no null values were found as well, but there were 2 records with 0-value coordinates. The geographic location of the research area makes it impossible to obtain such values; I deleted these rows.

  • member_casual: "Casual" and "member" are the only two allowed values. No null or other values were detected.

AI-assisted exploration was used to validate SQL logic and proofhead the written code below, which saved 1 to 2 hours of work. All outputs were manually reviewed and validated.

-- Create a table to hold the data DROP TABLE IF EXISTS baywheels_tripdata_2022; CREATE TABLE baywheels_tripdata_2022 ( ride_id varchar, rideable_type varchar, started_at timestamp, ended_at timestamp, start_station_name varchar, start_station_id varchar, end_station_name varchar, end_station_id varchar, start_lat double precision, start_lng double precision, end_lat double precision, end_lng double precision, member_casual varchar ); -- Import data from CSV into the table (replace 'your_file_path' with the actual path) COPY baywheels_tripdata_2022 FROM './baywheels_tripdata_2022.csv' DELIMITER ',' CSV HEADER; -- Replace docked_bike values with classic_bike UPDATE baywheels_tripdata_2022 SET rideable_type = 'classic_bike' WHERE rideable_type = 'docked_bike'; -- Add two new columns to our table ALTER TABLE baywheels_tripdata_2022 ADD COLUMN ride_duration interval; ALTER TABLE baywheels_tripdata_2022 ADD COLUMN day_of_week varchar; -- Compute data about trip duration and day of the week UPDATE baywheels_tripdata_2022 SET ride_duration = ended_at-started_at, day_of_week = DAYNAME(started_at)::varchar; -- Remove outlying values based on trip duration DELETE FROM baywheels_tripdata_2022 WHERE ride_duration < INTERVAL '1 minute' OR ride_duration > INTERVAL '24 hours'; --Remove null values in end_station_name for classic bike trips DELETE FROM baywheels_tripdata_2022 WHERE end_station_name IS NULL AND rideable_type = 'classic_bike'; -- Remove zero values of end coordinates DELETE FROM baywheels_tripdata_2022 WHERE end_lat=0; -- Query the modified data SELECT COUNT(*) FROM baywheels_tripdata_2022;

In the end, I obtained 2,540,302 public bike rides that occurred throughout the year 2022 in the San Francisco Bay Area for analysis.


1 hidden cell

Analysis and Visualization

Distribution of rides by city

I started off by analyzing the distribution of all rides in major Bay Area cities. The fastest way to figure it out was by using the geographic coordinates and CASE WHEN structure.

The vast majority of records, 2,309,049 or 90.9%, belong to San Francisco. Only a fraction of rides started either in San Jose (2.69%) or in the East Bay (Oakland and Berkeley) (6.41%).

Business implication: Any pricing or marketing changes should be tested in San Jose or East Bay first, before scaling into the biggest market - City of San Francisco.

Member vs casual user behavior: general statistics

As a next step, I analyzed in general how member users and casual users behave differently. Users with a membership rented a bike 1,460,433 times, which is about 57% of all rides.

Then I compared the mean and the median of the ride duration for both types of users. The mean value (the average) turned out to be greater than the median (a middle value of all the values sorted from biggest to smallest) for both casual and member user types. Member's average ride time is about 3 minutes than its median, whereas for a casual user the difference is bigger - over 6.5 minutes.

The reason might be that there is a handful of very long rides (up to 24 hours) which bring the average up, especially in the case of casual rides.

Business implication: Casual users are more likely to take unusually long rides, so implementing time-based pricing or ride duration limits could help manage outlier usage and optimize fleet availability.

User behavior during the day

The user activity varies significantly throughout a day. The peak activity of any user type usually occurs twice during the day: in the morning (8-9 am) and in the evening (5-6 pm). Especially in the case of members, who use bikes mostly to commute to work or school and return home. The lowest activity happens between midnight and 5 am, regardless of the user type.

)

Ride frequency during rush hours

Let's take a closer look at rush hours. In this binary analysis, I considered all rides taken between 8 and 9 in the morning and between 5 and 6 in the evening as rush hour. Saturdays and Sundays were excluded entirely, as well as federal holidays in 2022.

Indeed, users with a membership tend to ride more during rush hours compared to casual riders. However, about 3 out of 4 member rides occur outside the busy hours.

Business implication: The members' rush-hour commute habits suggest that pricing or benefit changes would disproportionally affect members compared to casual users, and should be tested carefully.

User behavior during the week

The analysis of the distribution of rides during the week among casual and member riders provided helpful insights to understand users' behavior.

The average ride duration of casual users fluctuates from the minimum of around 16.5 minutes in mid-week to the maximum of 22.2 minutes on Sundays. Member riders remain more consistent in the amount of time for a ride. The value keeps steady from Monday to Friday and slightly increases on the weekend.

Casual users in general take longer rides and use bikes mostly for recreational purposes. This hypothesis makes even more sense when we look at the bar chart below:

Saturdays and Sundays are the only days when casual rides outnumber member ones. In fact, about every third trip taken by casual users occurred on the weekend. For members, the rate is only 22.9%.

Business implication: To attract more casual riders, marketing campaigns and special promotions should be focused on weekends, when casual usage peaks, while member engagement strategies can target consistent weekday usage.

User behavior throughout the year

Casual and member riders' preferences on a monthly basis

The similar code helped me to figure out the distribution of rides throughout the year:

The total number of rides gradually increases from January through spring and summer and reaches its highest point in September and October when the warm weather and clear skies are dominant. Then it falls drastically in November and December when it gets colder along with rainy days.

Casual and member users have a similar number of rides most of the year, only in November-December members outnumber casual users significantly. In every single month, there were more member rides than casual rides.

The average ride duration remains steady in different months for both casual and member users. Only in November-December period, people tend to ride shorter because of colder weather and longer nights.

Casual and member riders' preferences on a seasonal basis

In general, 25.1% of all rides took place in spring, 29.2% took place in summertime, 28.1% happened during fall, and only 17.6% occurred in winter.

Although for both member and casual user types, winter was the least active season, the high season was different. Casual users used the service mostly in summer, but for members, fall was the most popular season. That's the time when vacations are over and the weather is still warm and pleasant.

Business implication: The best time to implement promo campaigns is during the spring and summer months — especially from May to September — when overall ridership peaks and casual users are most active.

Bike type preferences among member & casual riders

General statistics

In the San Francisco Bay Area in 2022, electric bikes were almost twice as popular as classic bikes among all users. In fact, casual users chose an electric bike 68.1% of the time, which is a higher rate than members did (61.8% of rides via an electric bike).

Bike type choice during the day

During each hour within a 24-hour span, there were roughly twice as many rides via an electric bike than by a classic one, regardless of the user type. The similar ratio was also observed during rush hours.

Bike type preferences on weekdays & weekends/holidays

The data shows that members took 75.5% of all rides on weekdays and 24.5% on days off. On the other hand, casual users tend to spend more time biking on weekends and federal holidays than members do, with percentages of 64.4% and 35.6% respectively.

Bike type usage throughout the year

The way different bike types were used stays similar throughout the entire year, with small differences. In February-March, there were about 1.7 electric bike rides per 1 classic bike ride on average, but in July the ratio was 1.95:1, respectively.

September had the greatest number of rides for both classic and electric bikes. In contrast, December was the least popular month for all bike types.

Business implication: To further increase the share of electric bike rides (and thus revenue), consider targeted promotions or loyalty rewards for choosing e-bikes, especially during periods or among user groups with lower e-bike adoption.

Key Findings

  • The bike service in San Francisco accounted 90.9% of rides.
  • Member users had a median ride duration of 9 minutes 3 seconds, casual riders had a median ride duration of 12 minutes 6 seconds.
  • Members primarily used the service for commuting, while casual riders used it for recreational purposes.
  • Ride distribution during the day followed general human behavior and working habits, while seasonal variations were influenced by climate patterns.
  • Electric bikes were preferred over classic bikes regardless of the time of day, season, or user type.

Recommendations

  • Test a small rush-hour surcharge for casual riders in selected stations
    • Metric: Measure conversion rate of casual riders to membership and track ride abandonment rates.
    • Risk/Assumption: Assumes price sensitivity is not so high that it deters casual use entirely; risk of reduced ridership.
    • How to test: Run a 4-week A/B test at selected stations, comparing conversion and abandonment rates between surcharge and control groups.
  • Increase conversion of casual riders to annual memberships by offering a year-round discount for annual subscriptions
    • Metric: Track the percentage of casual riders converting to annual memberships,
    • Risk assumption: price is a primary barrier to conversion;
    • How to test: run a 3-month pilot with the discount and compare conversion rates to a control group.
  • Provide a special discount for annual subscriptions to riders who take trips over 1 hour during weekends
    • Metric: Conversion rate of long-duration weekend riders to annual memberships.
    • Risk/Assumption: Assumes long-duration riders are more likely to convert; risk that discounts may jeopardize full-price sales.
    • How to test: Offer the discount to qualifying riders for 2 months and compare conversion rates and ride frequency to a matched control group.
  • Increase bike inventory in San Jose and East Bay to improve availability in these areas
    • Metric: Bike availability rate (percentage of time bikes are available at stations), ride volume, and user satisfaction in these areas.
    • Risk/Assumption: Assumes unmet demand exists and increased supply will be utilized; risk of underutilized inventory.
    • How to test: Gradually increase inventory in targeted areas and monitor changes in availability, ridership, and user feedback over a 3-month period compared to baseline.

Next steps and extensions

This project can be enriched or extended by:

  • Cohort analysis (transition rates from casual to member by user demographics and income)
  • Weather data intergation (the climate impact on annual behavioral trends)
  • Revenue simulation (casuals vs members)
  • Station-level optimization (bike availability higher vs lower than demand at specific stations)