Bike Usage

Google Data Analytics Capstone: Bike Share

Scenario

I am assuming the role of a junior data analyst on Cyclistic's marketing team, tasked with analyzing differences in how casual riders and annual members use the company's bike-share service. The goal is to provide insights that will help design a marketing strategy to convert casual riders into annual members. The recommendations presented must be supported by strong data insights and professional visualizations to secure executive approval.

Stakeholders

Cyclistic

A bike-share program that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use the bikes to commute to work each day.

Lily Moreno

The director of marketing and your manager. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.

Cyclistic marketing analytics team

A team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy. You joined this team six months ago and have been busy learning about Cyclistic’s mission and business goals—as well as how you, as a junior data analyst, can help Cyclistic achieve them.

Cyclistic executive team

The notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.

Ask
Prepare
Process
Analysis of the dataset 4.1 Riders distribution 4.2 Average ride duration 4.3 Distribution of the total monthly rides 4.4 Distribution of weekly bike usage 4.5 Distribution of hourly bike usage 4.6 Average ride duration per hour 4.7 Most popular trip start stations 4.8 Most popular route 4.9 Bike type used by riders
Share
Act Acknowledgement

Ask

Three questions will guide the future marketing program:

How do annual members and casual riders use Cyclistic bikes differently?
Why would casual riders buy Cyclistic annual memberships?
How can Cyclistic use digital media to influence casual riders to become members?

Moreno has assigned you the first question to answer: How do annual members and casual riders use Cyclistic bikes differently?

Business Task

To analyze Cyclistic historical data to determine how annual members and casual riders are differnt in their use of Cyclistic bikes. Insights from this analysis will assist the team to design a marketing program targeting casual riders to become annual members in order to grow Cyclistic.

Prepare

Using Cyclistic’s historical trip data to analyze and identify trends. I downloaded the 2-19 data from here.

https://divvy-tripdata.s3.amazonaws.com/index.html

Divvy_Trips_2019_Q1.zip
Divvy_Trips_2019_Q2.zip
Divvy_Trips_2019_Q3.zip
Divvy_Trips_2019_Q4.zip

The data has been made available by Motivate International Inc. under this license.) This is public data that you can use to explore how different customer types are using Cyclistic bikes. But note that data-privacy issues prohibit you from using riders’ personally identifiable information. This means that you won’t be able to connect pass purchases to credit card numbers to determine if casual riders live in the Cyclistic service area or if they have purchased multiple single passes.

Source Data

Data was pulled together by combing 4 csv files into a SQL server. The data was futher combined using SQL and loaded into a csv file (bike_data_all.csv)

WITH cte_bike_data AS (

SELECT [trip_id]
      ,[start_time]
      ,[end_time]
      ,[bikeid]
      ,[tripduration]
      ,[from_station_id]
      ,[from_station_name]
      ,[to_station_id]
      ,[to_station_name]
      ,[usertype]
      ,[gender]
      ,[birthyear]
  FROM [dbo].[Divvy_Trips_2019_Q1]

UNION

SELECT [trip_id]
      ,[start_time]
      ,[end_time]
      ,[bikeid]
      ,[tripduration]
      ,[from_station_id]
      ,[from_station_name]
      ,[to_station_id]
      ,[to_station_name]
      ,[usertype]
      ,[gender]
      ,[birthyear]
  FROM [dbo].[Divvy_Trips_2019_Q2]

UNION

SELECT [trip_id]
      ,[start_time]
      ,[end_time]
      ,[bikeid]
      ,[tripduration]
      ,[from_station_id]
      ,[from_station_name]
      ,[to_station_id]
      ,[to_station_name]
      ,[usertype]
      ,[gender]
      ,[birthyear]
  FROM [dbo].[Divvy_Trips_2019_Q3]

UNION

SELECT [trip_id]
      ,[start_time]
      ,[end_time]
      ,[bikeid]
      ,[tripduration]
      ,[from_station_id]
      ,[from_station_name]
      ,[to_station_id]
      ,[to_station_name]
      ,[usertype]
      ,[gender]
      ,[birthyear]
  FROM [dbo].[Divvy_Trips_2019_Q4]

)

SELECT [trip_id]
      ,[start_time]
      ,[end_time]
      ,[bikeid]
      ,[tripduration]
      ,[from_station_id]
      ,[from_station_name]
      ,[to_station_id]
      ,[to_station_name]
      ,[usertype]
      ,[gender]
      ,[birthyear]
	  ,DATEPART(weekday, start_time) AS day_of_week
		,RIGHT('0' + CAST((DATEDIFF(SECOND, start_time, end_time) / 3600) AS VARCHAR), 2) + ':' +
		RIGHT('0' + CAST((DATEDIFF(SECOND, start_time, end_time) % 3600 / 60) AS VARCHAR), 2) + ':' +
		RIGHT('0' + CAST((DATEDIFF(SECOND, start_time, end_time) % 60) AS VARCHAR), 2) AS time_difference
FROM cte_bike_data

Process

I will be using R to work with the data. I chose R because of its flexibility in data manipulation and visualization and also because I want to be proficient in using R for exploratory data analysis.

The cleaning and manipulation steps will be documented using this R notebook. I will be using the trip data from January 2019 to December 2019.

# Load packages
library(tidyverse)
library(lubridate)
library(janitor)
library(readr)

# Load data
trips <- read_csv("bike_data.csv")

# view the first few rows, an overview, and the last few rows
head(trips)
glimpse(trips)
tail(trips)

# check for duplicates
nrow(distinct(trips)) == nrow(trips)

# remove NAs
trips_clean <- drop_na(trips)
glimpse(trips_clean)

# add additional fields
trips_clean <- trips_clean %>%
	mutate(ride_duration = round(difftime(end_time, start_time, units = "mins"))) %>% 
	mutate(month = month(start_time, label = TRUE)) %>% 
	mutate(weekday = wday(start_time, label = TRUE)) %>% 
	mutate(start_hour = hour(start_time)) %>% 
	mutate(route = str_c(from_station_name, to_station_name, sep = " -to- "))

glimpse(trips_clean)

# explore new fields
head(trips_clean)
summary(trips_clean)

# Checking for zero trip duration
trips_clean %>% 
  filter(ride_duration == 0) 

# Checking for negative trip duration 
trips_clean %>% 
  filter(ride_duration < 0)

# remove zero and negative rides
bike_trips <- trips_clean %>% 
  filter(!ride_duration <= 0)

# review final data set 
head(bike_trips)
summary(bike_trips)

Dataset Analysis

I will now work out calculations and visuals for insights. We want to explain how casul riders are differnt form the member riders.

The following visuals will be created for this analyze.

Visuals

Distribution of Riders
Average Ride Duration
Distribution of Rides Monthly
Distribution of Rides per Week
Distribution of Hourly Usage
Average Ride per Hour
Most Popular Start Stations
Most Popular Bike Routes
Type of Bike by Rider

Distribution of Riders

To beign, I will show a visual of the percentage of each category of rider.

‌
‌
‌

.mfe-app-workspace-kj242g{position:absolute;top:-8px;}.mfe-app-workspace-11ezf91{display:inline-block;}.mfe-app-workspace-11ezf91:hover .Anchor__copyLink{visibility:visible;}Google Data Analytics Capstone: Bike Share