Bellabeat Case Study
Project Overview
About the Company
Bellabeat is a high-tech company that manufactures health-focused smart products for women. It develops beautifully designed technology that informs and inspires women around the world. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits. Since its founding in 2013, Bellabeat has grown rapidly and positioned itself as a tech-driven wellness company for women. It is a successful small company with the potential to become a larger player in the global smart device market.
Key Stakeholders
Urška Sršen: Bellabeat's co-founder and Chief Creative Officer
Sando Mur: Mathematician and Bellabeat's co-founder; key member of the Bellabeat executive team
Bellabeat marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat's marketing strategy
Business Task
The task is to identify new growth opportunities for the Bellabeat app by analyzing trends in smart device fitness data, specifically user activity and sleep, to understand how consumers use their smart devices. Knowing how consumers use these devices to track their overall health will guide Bellabeat's marketing strategy.
Dataset
FitBit Fitness Tracker Data (CC0: Public Domain, dataset made available through Mobius): This Kaggle dataset contains personal fitness tracker data from thirty Fitbit users. These users consented to the submission of their personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users' habits.
Dataset link available here
To determine the integrity and credibility of the dataset, the principles of ROCCC were followed.
Reliable: The data is not reliable. The sample consists of only 30 participants, which may skew the results of the analysis.
Original: The data is not original. It was collected by a third party through a survey distributed via Amazon Mechanical Turk.
Comprehensive: It is comprehensive. The dataset contains all the necessary information required to carry out analysis in regard to the business task.
Current: The data is not current. It was last updated in 2016.
Cited: The data is cited. Furberg, R., Brinton, J., Keating, M., & Ortiz, A. (2016). Crowd-sourced Fitbit datasets 03.12.2016-05.12.2016 [Data set]. Zenodo. Link for citation.
Data Preparation
The Fitbit tracker data was downloaded from the Kaggle public dataset to a desktop folder for the Google certification capstone project, and the CSV files were unzipped into a folder titled Fitabase Data for use in RStudio. R packages are needed to sort, clean, and filter the data.
Uploaded CSV files for project from data source: https://www.kaggle.com/arashnic/fitbit
Install and Load Packages
library(tidyverse)
library(lubridate)  # provides mdy(); only attached automatically by tidyverse 2.0.0 and later
Import data
weightLog <- read.csv('weightLogInfo_merged.csv')
sleepDay <- read.csv('sleepDay_merged.csv')
dailyActivity <- read.csv('dailyActivity_merged.csv')
Explore Data
str(dailyActivity)
str(sleepDay)
str(weightLog)
Data Processing
# Rename date columns
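dailyActivity <- dailyActivity %>% rename(Date = ActivityDate)
# Separate date/time columns and rename
# extra = "merge" keeps any AM/PM marker with the time rather than discarding it
weightLog <- weightLog %>% separate(Date, c("Date", "Time"), sep = " ", extra = "merge")
sleepDay <- sleepDay %>% separate(SleepDay, c("Date", "Time"), sep = " ", extra = "merge")
colnames(dailyActivity)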
colnames(sleepDay)
colnames(weightLog)
# Convert character Date columns into dates
dailyActivity$Date<-mdy(dailyActivity$Date)
sleepDay$Date<-mdy(sleepDay$Date)
weightLog$Date<-mdy(weightLog$Date)
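As a minimal sketch of the date handling above, assuming timestamps shaped like 4/12/2016 12:00:00 AM, the split and conversion can be tried on a small sample first. The data frame toy and its values are made up for illustration and are not taken from the dataset.

```r
library(tidyr)
library(lubridate)

# Hypothetical two-row sample; values are illustrative only
toy <- data.frame(
  Id = c(1, 2),
  SleepDay = c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM"),
  stringsAsFactors = FALSE
)

# Split at the first space only; extra = "merge" keeps "12:00:00 AM" together
toy <- separate(toy, SleepDay, c("Date", "Time"), sep = " ", extra = "merge")

# Convert the month/day/year text into Date objects
toy$Date <- mdy(toy$Date)

class(toy$Date)
```

Without extra = "merge", separate() would split at every space and drop the third piece with a warning, so the Time column would lose its AM/PM marker.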
# Confirming conversion
class(dailyActivity$Date)
class(sleepDay$Date)
class(weightLog$Date)
# Check for missing values in each column
colSums(is.na(weightLog))
colSums(is.na(sleepDay))
colSums(is.na(dailyActivity))
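To show what the missing-value check returns, here is a small sketch on a made-up data frame. The name toy, its columns, and its numbers are assumptions for illustration only.

```r
# Hypothetical weight log with two empty Fat values; numbers are made up
toy <- data.frame(
  Id = c(1, 1, 2),
  WeightKg = c(52.6, 52.4, 70.1),
  Fat = c(22, NA, NA)
)

# is.na() gives a TRUE/FALSE matrix; colSums() counts the TRUEs per column
na_counts <- colSums(is.na(toy))
na_counts

# Share of rows with a missing value in each column
colMeans(is.na(toy))
```

A column that is mostly empty is usually better left out of averages than filled in, since a mean over a handful of values says little about the whole group.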
# Identify duplicate elements
sum(duplicated(dailyActivity))
sum(duplicated(weightLog))
sum(duplicated(sleepDay))
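The duplicate check and its cleanup can be sketched on a toy data frame as follows. The rows are invented, with one deliberate repeat, and toy and toy_clean are hypothetical names.

```r
library(dplyr)

# Hypothetical sleep records; the second and third rows are identical on purpose
toy <- data.frame(
  Id = c(1, 1, 1, 2),
  Date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-13", "2016-04-12")),
  TotalMinutesAsleep = c(327, 384, 384, 412)
)

# Number of rows that repeat an earlier row
sum(duplicated(toy))

# distinct() returns a new data frame, so the result has to be assigned
toy_clean <- distinct(toy)
nrow(toy_clean)
```

Calling distinct() without assigning its result only prints the de-duplicated rows and leaves the original data frame unchanged.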
# Remove duplicates
sleepDay <- distinct(sleepDay)  # assign the result so the duplicates are actually dropped
# Calculate days of the week
dailyActivity$DayWeek <- weekdays(as.Date(dailyActivity$Date))
sleepDay$DayWeek <- weekdays(as.Date(sleepDay$Date))
weightLog$DayWeek <- weekdays(as.Date(weightLog$Date))
# Summarizing data frames by ID
dailySummary <- dailyActivity %>%
  group_by(Id) %>%
  summarise(n = n(), Steps = mean(TotalSteps), Distance = mean(TotalDistance),
            Very = mean(VeryActiveMinutes), Fairly = mean(FairlyActiveMinutes),
            Lightly = mean(LightlyActiveMinutes), Sedentary = mean(SedentaryMinutes),
            Calories = mean(Calories))
sleepSummary <- sleepDay %>%
  group_by(Id) %>%
  summarise(n = n(), AvgSleep = mean(TotalMinutesAsleep), AvgTimeInBed = mean(TotalTimeInBed))
weightSummary <- weightLog %>%
  group_by(Id) %>%
  summarise(n = n(), WeightLbs = mean(WeightPounds), MinLbs = min(WeightPounds),
            MaxLbs = max(WeightPounds), AvgBMI = mean(BMI))
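The per-user summaries above follow the group_by() and summarise() pattern. A minimal sketch on invented numbers shows what comes out; toy and toy_summary are hypothetical names and the values are not from the dataset.

```r
library(dplyr)

# Hypothetical daily activity for two users; values are made up
toy <- data.frame(
  Id = c(1, 1, 2, 2),
  TotalSteps = c(10000, 8000, 3000, 5000),
  Calories = c(2000, 1800, 1500, 1700)
)

# One output row per Id: n counts the logged days, the rest are per-user means
toy_summary <- toy %>%
  group_by(Id) %>%
  summarise(
    n = n(),
    Steps = mean(TotalSteps),
    Calories = mean(Calories)
  )

toy_summary
```

Keeping n in the summary makes it easy to spot users who logged only a few days, whose averages deserve less weight.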
# Combining active minutes and total minutes by intensities into new columns for data frames
dailyActivity$TotalActiveMinutes<-rowSums(dailyActivity[,c("VeryActiveMinutes","FairlyActiveMinutes","LightlyActiveMinutes")])
dailyActivity$TotalMinutes<-rowSums(dailyActivity[,c("VeryActiveMinutes","FairlyActiveMinutes","LightlyActiveMinutes","SedentaryMinutes")])
dailySummary$Active<-rowSums(dailySummary[,c("Very","Lightly","Fairly")])
dailySummary$TotalMinutes<-rowSums(dailySummary[,c("Very","Fairly","Lightly","Sedentary")])
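As a small worked example of the row-wise totals, the sketch below applies rowSums() to a single made-up day. The data frame toy, the helper vector active_cols, and the minute values are assumptions for illustration.

```r
# Hypothetical single day of activity minutes; values are made up
toy <- data.frame(
  VeryActiveMinutes = 25,
  FairlyActiveMinutes = 13,
  LightlyActiveMinutes = 328,
  SedentaryMinutes = 728
)

active_cols <- c("VeryActiveMinutes", "FairlyActiveMinutes", "LightlyActiveMinutes")

# rowSums() adds across the selected columns within each row
toy$TotalActiveMinutes <- rowSums(toy[, active_cols])
toy$TotalMinutes <- rowSums(toy[, c(active_cols, "SedentaryMinutes")])

# Tracked hours for the day
toy$TotalMinutes / 60
```

A day has 1,440 minutes, so a TotalMinutes value well below that may indicate time that was not recorded in any of the four intensity columns.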