Cyclistic Case Study (ENG)

Google Data Analytics Course - Case Study 1 - Cyclistic
November 2021 - October 2022

How Does a Bike-Sharing Service Design a Marketing Strategy to Convert Occasional Cyclists into Annual Subscribers?

Scenario

You are a data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company's future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.

Characters and teams:

Cyclistic: A bike-share program that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can't use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use them to commute to work each day.
Lily Moreno: The director of marketing and your manager. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.
Cyclistic marketing analytics team: A team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy. You joined this team six months ago and have been busy learning about Cyclistic's mission and business goals, as well as how you, as a junior data analyst, can help Cyclistic achieve them.
Cyclistic executive team: The notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.

Three questions will guide the future marketing program:

How do annual members and casual riders use Cyclistic bikes differently?
Why would casual riders buy Cyclistic annual memberships?
How can Cyclistic use digital media to influence casual riders to become members?

[1-6] Ask

Guiding questions:

What is the problem you are trying to solve?
The task is to build a profile of the two customer types (casual riders and annual members) that identifies their most important behavioural characteristics.
How can your insights drive business decisions?
My work can help the marketing team develop a strategy to convert as many occasional cyclists as possible into subscribers.

Key tasks:

Identify the business task
Consider key stakeholders

Deliverable:

A clear statement of the business task

[2-6] Prepare

Historical ride data for the past 12 months, collected directly by Cyclistic, are used to analyse and identify rider behaviour.

Guiding questions:

Where is your data located?
The data sets are hosted together on a page accessible via a public link. The files carry a different name (the file names use the prefix "divvy") because Cyclistic is a fictional company.
How is the data organized?
The data are available in individual .csv files broken down by month.
Are there issues with bias or credibility in this data? Does your data ROCCC?
I have not identified any problems of bias or reliability at the preparation stage, as the data were collected directly by the company and the population is the entire customer base. The data are Reliable, Original, Comprehensive, Current and Cited (ROCCC).
How are you addressing licensing, privacy, security, and accessibility?
Regarding privacy, the data sets do not include sensitive data (e.g. credit card or telephone numbers), making it impossible to trace the identity of an individual rider.
Motivate International Inc. has made the data available under a licence for non-commercial purposes. We can use the public data to explore how different types of customers use Cyclistic's bicycles. However, data privacy rules prohibit the use of riders' personal information. This means it will not be possible to link pass purchases to credit card numbers to determine whether occasional cyclists live in Cyclistic's service area or have purchased multiple individual passes.
For this case study, the data sets are appropriate and allow the assigned questions to be answered.
How did you verify the data's integrity?
Each data set has clearly labelled columns, and each column is populated with values of the expected type.
How does it help you answer your question?
The procedure followed during the preparation phase makes it possible to answer the central question posed by the client, i.e. to build a precise picture of how riders use Cyclistic's services.
Are there any problems with the data?
Cells with empty or null values have been identified.
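
One way to quantify the empty cells mentioned above is to count missing values per column. The sketch below runs on a small invented data frame. The column names `ride_id`, `start_station_name` and `end_station_name` are assumptions made for illustration and are not confirmed by this document.

```r
# Toy data standing in for one monthly file (all values are invented)
trips <- data.frame(
  ride_id            = c("A1", "B2", "C3", "D4"),
  start_station_name = c("Clark St", NA, "", "State St"),
  end_station_name   = c(NA, NA, "Wells St", "Clark St"),
  stringsAsFactors   = FALSE
)

# Treat empty strings as missing so they are counted together with NA
trips[] <- lapply(trips, function(col) replace(col, which(col == ""), NA))

# Number of missing values in each column
na_per_column <- colSums(is.na(trips))
print(na_per_column)

# Number of rows that contain at least one missing value
rows_with_na <- sum(!complete.cases(trips))
print(rows_with_na)
```

Counting before deleting anything shows how many rows a later NA removal would cost.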

Key tasks:

Download data and store it appropriately.
Identify how it's organized.
Sort and filter the data.
Determine the credibility of the data.

Deliverable:

A description of all data sources used.

[3-6] Process

Data from the last 12 months will be loaded, and new columns will be given easy-to-understand names, such as 'ride_length' and 'day_of_the_week'.

For datasets, the prefix 'ds_' will be used;
'Members' will refer to annual subscribers;
'Casual' will refer to occasional users who rent from time to time;
It is assumed that 'occasional' bicycle rental can be done on the company's website, via a mobile application or directly at the stations.
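
The two new columns named above could be derived roughly as follows. This is a minimal sketch in base R on invented rides. The timestamp column names `started_at` and `ended_at`, and the choice of seconds as the unit for 'ride_length', are assumptions and are not taken from this document.

```r
# Toy rides; the column names started_at / ended_at are assumed
rides <- data.frame(
  started_at = as.POSIXct(c("2022-10-01 08:00:00", "2022-10-02 17:30:00"), tz = "UTC"),
  ended_at   = as.POSIXct(c("2022-10-01 08:25:30", "2022-10-02 17:42:00"), tz = "UTC")
)

# ride_length: trip duration in seconds (the unit is an assumption)
rides$ride_length <- as.numeric(
  difftime(rides$ended_at, rides$started_at, units = "secs")
)

# day_of_the_week: English day name built from the numeric weekday
# (0 = Sunday), so the result does not depend on the system locale
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
rides$day_of_the_week <- day_names[as.POSIXlt(rides$started_at)$wday + 1]

print(rides)
```

Building the day name from the numeric weekday keeps the labels in English on any machine, which `weekdays()` does not guarantee.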

Guiding questions:

What tools are you choosing and why?
To sort and organise the data, I chose the R language with RStudio. It is suitable for every task required by the case study, and it keeps all operations in one place, so they can easily be reworked if changes or additions are needed.
Have you ensured your data's integrity?
Yes, the data is consistent in all columns.
What steps have you taken to ensure that your data is clean?
First, the columns were formatted with the correct data type, and then NA values and duplicates were removed.
How can you verify that your data is clean and ready to analyze?
It can be verified using this R markdown file.
Have you documented your cleaning process so you can review and share those results?
I confirm that everything has been documented in detail in this R markdown file.
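
The cleaning steps described in the answers above (correct data types, then removal of NA values and duplicates) can be sketched in base R as follows. The data are invented, and the column names `ride_id`, `started_at` and `member_casual` are assumptions. The final checks illustrate one possible way to verify that the result is clean.

```r
# Invented raw data with one duplicate row and one missing timestamp
raw <- data.frame(
  ride_id       = c("A1", "B2", "B2", "C3"),
  started_at    = c("2022-10-01 08:00:00", "2022-10-01 09:00:00",
                    "2022-10-01 09:00:00", NA),
  member_casual = c("member", "casual", "casual", "member"),
  stringsAsFactors = FALSE
)

# 1. Give each column the correct data type
raw$started_at    <- as.POSIXct(raw$started_at, tz = "UTC")
raw$member_casual <- factor(raw$member_casual, levels = c("casual", "member"))

# 2. Drop rows that contain any NA value
clean <- raw[complete.cases(raw), ]

# 3. Drop exact duplicate rows
clean <- clean[!duplicated(clean), ]

# Verification: no NA values and no duplicates remain
stopifnot(!anyNA(clean), !any(duplicated(clean)))
print(dim(clean))
```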

Key tasks:

Check the data for errors;
Choose your tools;
Transform the data so you can work with it effectively;
Document the cleaning process.

Deliverable:

Documentation of any cleaning or manipulation of data

Libraries Setup

Before loading the libraries, all relevant packages must have been installed previously.
If not, run the first code chunk below; otherwise, go directly to loading the libraries.

install.packages("tidyverse")
install.packages("lubridate")
install.packages("ggplot2")
install.packages("janitor")
install.packages("dplyr")
install.packages("skimr")
install.packages("scales")
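
As an alternative to deciding by hand whether the chunk above needs to be run, the check can be automated. The sketch below only reports which of the listed packages are missing. The variable names `required` and `to_install` are illustrative, and the install call is left commented out so that nothing is downloaded unintentionally.

```r
# Packages used in this case study
required <- c("tidyverse", "lubridate", "ggplot2", "janitor",
              "dplyr", "skimr", "scales")

# Keep only those that are not yet installed on this machine
to_install <- setdiff(required, rownames(installed.packages()))

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them
} else {
  message("All required packages are already installed.")
}
```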

library(tidyverse) # helps wrangle data
library(lubridate) # helps wrangle date attributes
library(ggplot2)   # helps visualize data
library(janitor)   # simple tools for examining and cleaning dirty data
library(dplyr)     # data manipulation
library(skimr)     # compact and flexible summaries of data
library(scales)    # scale functions for visualization

getwd() # displays your working directory

Step 1-5: Data Collection

Load data sets into R:

ds_2021_011 <- read_csv("202111-divvy-tripdata.csv")
ds_2021_012 <- read_csv("202112-divvy-tripdata.csv")
ds_2022_001 <- read_csv("202201-divvy-tripdata.csv")
ds_2022_002 <- read_csv("202202-divvy-tripdata.csv")
ds_2022_003 <- read_csv("202203-divvy-tripdata.csv")
ds_2022_004 <- read_csv("202204-divvy-tripdata.csv")
ds_2022_005 <- read_csv("202205-divvy-tripdata.csv")
ds_2022_006 <- read_csv("202206-divvy-tripdata.csv")
ds_2022_007 <- read_csv("202207-divvy-tripdata.csv")
ds_2022_008 <- read_csv("202208-divvy-tripdata.csv")
ds_2022_009 <- read_csv("202209-divvy-publictripdata.csv")
ds_2022_010 <- read_csv("202210-divvy-tripdata.csv")
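
The twelve calls above can also be written as a loop over the files in a folder. The following sketch is an alternative under stated assumptions and not the method used in this case study. It creates two tiny stand-in files in a temporary folder, uses base `read.csv` instead of `read_csv` so that it runs without extra packages, and names the list elements `ds_<yyyymm>`, which differs from the object names above.

```r
# Create two tiny stand-in files in a temporary folder (invented data)
data_dir <- file.path(tempdir(), "divvy_demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(ride_id = c("A1", "B2")),
          file.path(data_dir, "202209-divvy-publictripdata.csv"),
          row.names = FALSE)
write.csv(data.frame(ride_id = "C3"),
          file.path(data_dir, "202210-divvy-tripdata.csv"),
          row.names = FALSE)

# Find every monthly file; the pattern also covers the "publictripdata" name
csv_files <- list.files(data_dir,
                        pattern = "^[0-9]{6}-divvy-.*tripdata\\.csv$",
                        full.names = TRUE)

# Read each file into a list, named by its year-month prefix
monthly <- lapply(csv_files, read.csv, stringsAsFactors = FALSE)
names(monthly) <- paste0("ds_", substr(basename(csv_files), 1, 6))

print(names(monthly))
print(sapply(monthly, nrow))
```

Keeping the months in a named list means that a later combine step can work on the list as a whole instead of on twelve separate objects.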

Step 2-5: Process the data and combine them into a single file

Check the fields between the various data sets and combine them.
Use the column names of the most recently loaded data set as a reference.

colnames(ds_2021_011)
colnames(ds_2021_012)
colnames(ds_2022_001)
colnames(ds_2022_002)
colnames(ds_2022_003)
colnames(ds_2022_004)
colnames(ds_2022_005)
colnames(ds_2022_006)
colnames(ds_2022_007)
colnames(ds_2022_008)
colnames(ds_2022_009)
colnames(ds_2022_010)

Ensure that the columns are of the same type:

compare_df_cols(ds_2021_011,ds_2021_012,ds_2022_001,ds_2022_002,ds_2022_003,ds_2022_004,ds_2022_005,ds_2022_006,ds_2022_007,ds_2022_008,ds_2022_009,ds_2022_010, return = "mismatch")
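
To illustrate what such a type check looks for, here is a base R sketch that compares the class of each column across two invented data frames and reports the columns whose types disagree. It assumes that both data frames have the same columns in the same order. The names `jan`, `feb` and `start_station_id` are hypothetical.

```r
# Two invented monthly extracts; start_station_id is character in one
# and numeric in the other
jan <- data.frame(ride_id = c("A1", "B2"),
                  start_station_id = c("13001", "TA1307"),
                  stringsAsFactors = FALSE)
feb <- data.frame(ride_id = c("C3", "D4"),
                  start_station_id = c(13045, 13022),
                  stringsAsFactors = FALSE)

frames <- list(jan = jan, feb = feb)

# Matrix of column classes: one row per column, one column per data frame
col_types <- sapply(frames, function(d) {
  vapply(d, function(col) class(col)[1], character(1))
})
print(col_types)

# Columns whose class is not the same in every data frame
mismatched <- rownames(col_types)[
  apply(col_types, 1, function(x) length(unique(x)) > 1)
]
print(mismatched)
```

A mismatched column has to be converted to a common type before the data frames are combined.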

Combine the individual data sets into a single data frame and remove empty rows and columns. When finished, delete all previous separate data sets as they are no longer required:

ds_all_trips <- rbind(ds_2021_011, ds_2021_012, ds_2022_001, ds_2022_002, ds_2022_003, ds_2022_004, ds_2022_005, ds_2022_006, ds_2022_007, ds_2022_008, ds_2022_009, ds_2022_010)
dim(ds_all_trips)
ds_all_trips <- janitor::remove_empty(ds_all_trips,which = c("cols"))
ds_all_trips <- janitor::remove_empty(ds_all_trips,which = c("rows"))
dim(ds_all_trips)

rm(ds_2021_011,ds_2021_012,ds_2022_001,ds_2022_002,ds_2022_003,ds_2022_004,ds_2022_005,ds_2022_006,ds_2022_007,ds_2022_008,ds_2022_009,ds_2022_010)
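
`janitor::remove_empty` drops rows and columns that consist entirely of NA values. A rough base R equivalent on an invented data frame shows the effect on the dimensions. The column names `rideable_type` and `notes` are assumptions used only for this sketch.

```r
# Invented data: the second row and the "notes" column are entirely NA
trips <- data.frame(
  ride_id       = c("A1", NA, "C3"),
  rideable_type = c("classic_bike", NA, "electric_bike"),
  notes         = c(NA, NA, NA),
  stringsAsFactors = FALSE
)
print(dim(trips))

# Keep columns and rows that hold at least one non-NA value
keep_cols <- colSums(!is.na(trips)) > 0
keep_rows <- rowSums(!is.na(trips)) > 0
trips_clean <- trips[keep_rows, keep_cols, drop = FALSE]

print(dim(trips_clean))
```

Comparing the two `dim()` outputs, as the code above does for `ds_all_trips`, shows how many rows and columns were dropped.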

Summary of the data structure:

summary(ds_all_trips)

Step 3-5: Clean up and add data to prepare for analysis

Examine the newly created dataset

List of column names:
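
A minimal sketch of this inspection step, using a small invented stand-in for the combined data frame. The object name `ds_demo` and its columns are hypothetical. On the real data, the same calls would be applied to `ds_all_trips`.

```r
# Invented stand-in for the combined data frame
ds_demo <- data.frame(
  ride_id       = c("A1", "B2"),
  rideable_type = c("classic_bike", "electric_bike"),
  member_casual = c("member", "casual"),
  stringsAsFactors = FALSE
)

column_names <- colnames(ds_demo)  # list of column names
print(column_names)

print(nrow(ds_demo))               # number of rows
str(ds_demo)                       # column types and first values
```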

