
Bellabeat Case Study

Project Overview

About the Company

Bellabeat is a high-tech company that manufactures health-focused smart products for women. It develops beautifully designed technology that informs and inspires women around the world. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits. Since its founding in 2013, Bellabeat has grown rapidly and positioned itself as a tech-driven wellness company for women. It is a successful small company with the potential to become a larger player in the global smart device market.

Key Stakeholders

Urška Sršen: Bellabeat's co-founder and Chief Creative Officer

Sando Mur: Mathematician and Bellabeat's co-founder; key member of the Bellabeat executive team

Bellabeat marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat's marketing strategy

Business Task

To identify new growth opportunities for the Bellabeat app, this analysis examines trends in smart device fitness data, specifically user activity and sleep, to gain insight into how consumers use their smart devices. Understanding how consumers use their devices to track their overall health will guide Bellabeat's marketing strategy.

Dataset

FitBit Fitness Tracker Data (CC0: Public Domain, dataset made available through Mobius): This Kaggle dataset contains personal fitness tracker data from thirty Fitbit users. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users' habits.

Dataset link available here

To determine the integrity and credibility of the dataset, the principles of ROCCC were followed.

Reliable: The data is not reliable. The sample consists of only 30 participants, a small group that may not represent the wider population and may skew the results of the analysis.

Original: The data is not original. It was collected by a third party through a survey distributed via Amazon Mechanical Turk.

Comprehensive: It is comprehensive. The dataset contains all the necessary information required to carry out analysis in regard to the business task.

Current: The data is not current. It was last updated in 2016.

Cited: The data is cited. Furberg, R., Brinton, J., Keating, M., & Ortiz, A. (2016). Crowd-sourced Fitbit datasets 03.12.2016-05.12.2016 [Data set]. Zenodo. Link for citation.

Data Preparation

The Fitbit tracker data was downloaded from the Kaggle public dataset to a desktop folder for the Google certification capstone project and unzipped into a folder titled Fitabase Data, so that the CSV files could be used in RStudio. R packages were then used to sort, clean, and filter the data.

Uploaded CSV files for project from data source: https://www.kaggle.com/arashnic/fitbit

Install and Load Packages

library(tidyverse)
library(lubridate)  # provides mdy(); not attached by tidyverse versions before 2.0.0

Import data

weightLog <- read.csv('weightLogInfo_merged.csv')
sleepDay <- read.csv('sleepDay_merged.csv')
dailyActivity <- read.csv('dailyActivity_merged.csv')

Explore Data

str(dailyActivity)
str(sleepDay)
str(weightLog)
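
Since the reliability assessment rests on the sample size, it may be worth checking how many distinct participants each table actually contains once the files are loaded. The following is a minimal sketch of that check. The toy data frames and their values are hypothetical stand-ins, not the real data; with the real tables, the same `n_distinct()` calls would be applied to `dailyActivity$Id`, `sleepDay$Id`, and `weightLog$Id`.

```r
library(dplyr)

# Hypothetical stand-ins for the imported tables (values are made up)
toyActivity <- data.frame(Id = c(1, 1, 2, 3), TotalSteps = c(5000, 7000, 3000, 9000))
toySleep    <- data.frame(Id = c(1, 2, 2), TotalMinutesAsleep = c(400, 380, 420))
toyWeight   <- data.frame(Id = 1, WeightPounds = 150)

# Count the distinct participants in each table
participants <- c(
  activity = n_distinct(toyActivity$Id),
  sleep    = n_distinct(toySleep$Id),
  weight   = n_distinct(toyWeight$Id)
)
print(participants)
```

If the counts differ between tables, any comparison across activity, sleep, and weight only covers the participants present in all of them.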

Data Processing

# Rename date columns
dailyActivity <- dailyActivity %>% rename(Date = ActivityDate)
# Separate date/time columns and rename; extra = "merge" keeps any trailing AM/PM with Time instead of dropping it
weightLog <- weightLog %>% separate(Date, c("Date", "Time"), sep = " ", extra = "merge")
sleepDay <- sleepDay %>% separate(SleepDay, c("Date", "Time"), sep = " ", extra = "merge")
colnames(dailyActivity)
colnames(sleepDay)
colnames(weightLog)
# Convert character Date columns to Date class (mdy() comes from lubridate)
dailyActivity$Date <- mdy(dailyActivity$Date)
sleepDay$Date <- mdy(sleepDay$Date)
weightLog$Date <- mdy(weightLog$Date)
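
To make the split-and-convert step easier to follow, here is a small sketch on made-up rows. It assumes the raw timestamps look like `4/12/2016 12:00:00 AM`; the `toySleep` data frame is hypothetical, and the `extra = "merge"` argument is an addition in this sketch that keeps the AM/PM part attached to the time rather than discarding it.

```r
library(tidyr)
library(lubridate)

# Hypothetical rows in an assumed "m/d/yyyy h:mm:ss AM/PM" layout
toySleep <- data.frame(
  Id = c(1, 2),
  SleepDay = c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM")
)

# Split at the first space only; extra = "merge" keeps "12:00:00 AM" in one piece
toySleep <- separate(toySleep, SleepDay, into = c("Date", "Time"),
                     sep = " ", extra = "merge")

# mdy() parses month/day/year strings into Date objects
toySleep$Date <- mdy(toySleep$Date)
str(toySleep)
```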

# Confirming conversion 
class(dailyActivity$Date)
class(sleepDay$Date)
class(weightLog$Date)
# Check for missing values in each column
colSums(is.na(weightLog))
colSums(is.na(sleepDay))
colSums(is.na(dailyActivity))
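
Raw counts of missing values are easier to judge next to the share of rows they represent. The sketch below shows one way to get both with base R. The `toyWeight` data frame and its `Fat` column values are hypothetical and only illustrate the calculation.

```r
# Hypothetical weight log with a sparsely filled column (values are made up)
toyWeight <- data.frame(
  Id = c(1, 1, 2, 3),
  WeightPounds = c(150, 151, 180, 130),
  Fat = c(NA, 22, NA, NA)
)

# Number of missing values per column
naCounts <- colSums(is.na(toyWeight))

# Share of rows that are missing per column
naShare <- colMeans(is.na(toyWeight))

print(naCounts)
print(round(naShare, 2))
```

A column that is mostly empty is usually better left out of averages than imputed from a handful of values.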

# Identify duplicate elements
sum(duplicated(dailyActivity))
sum(duplicated(weightLog))
sum(duplicated(sleepDay))

# Remove duplicates (assign the result; distinct() does not modify sleepDay in place)
sleepDay <- distinct(sleepDay)
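
One detail that is easy to miss here: `distinct()` returns a new data frame and leaves its input untouched, so the result has to be assigned for the duplicates to be gone in later steps. A minimal sketch on hypothetical rows:

```r
library(dplyr)

# Hypothetical sleep rows with one exact duplicate (values are made up)
toySleep <- data.frame(Id = c(1, 1, 2), TotalMinutesAsleep = c(400, 400, 380))

# Calling distinct() without assignment only returns the deduplicated rows
dedupedPreview <- distinct(toySleep)
rowsBefore <- nrow(toySleep)   # the original data frame is unchanged

# Assigning the result is what actually removes the duplicates
toySleep <- distinct(toySleep)
rowsAfter <- nrow(toySleep)

print(c(before = rowsBefore, after = rowsAfter))
```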
# Calculate the day of the week (Date columns are already of class Date)
dailyActivity$DayWeek <- weekdays(dailyActivity$Date)
sleepDay$DayWeek <- weekdays(sleepDay$Date)
weightLog$DayWeek <- weekdays(weightLog$Date)
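
`weekdays()` returns plain character names in the current locale, so tables and plots built on it sort the days alphabetically. As an optional alternative, and not part of the original workflow, lubridate's `wday()` can return an ordered factor instead. The dates in this sketch are arbitrary examples.

```r
library(lubridate)

# Two arbitrary example dates
dates <- as.Date(c("2016-04-12", "2016-04-16"))

# Character day names, as produced in the step above
dayNames <- weekdays(dates)

# Ordered factor with Monday as the first level, so days sort in calendar order
dayFactor <- wday(dates, label = TRUE, abbr = FALSE, week_start = 1)

print(dayNames)
print(dayFactor)
```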
# Summarizing data frames by ID
dailySummary <- dailyActivity %>% group_by(Id) %>% summarise(n = n(), Steps = mean(TotalSteps), Distance = mean(TotalDistance), Very = mean(VeryActiveMinutes), Fairly = mean(FairlyActiveMinutes), Lightly = mean(LightlyActiveMinutes), Sedentary = mean(SedentaryMinutes), Calories = mean(Calories))

sleepSummary <- sleepDay %>% group_by(Id) %>% summarise(n = n(), AvgSleep = mean(TotalMinutesAsleep), AvgTimeInBed = mean(TotalTimeInBed))

weightSummary <- weightLog %>% group_by(Id) %>% summarise(n = n(), WeightLbs = mean(WeightPounds), MinLbs = min(WeightPounds), MaxLbs = max(WeightPounds), AvgBMI = mean(BMI))
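
The per-participant summaries follow a single pattern: group by `Id`, then compute a row count and one or more means. The sketch below shows that pattern on a hypothetical three-row table, so the role of `n` is visible: it records how many days each participant logged, which helps when judging how much weight an average deserves.

```r
library(dplyr)

# Hypothetical daily records for two participants (values are made up)
toyActivity <- data.frame(
  Id = c(1, 1, 2),
  TotalSteps = c(4000, 6000, 9000),
  Calories = c(1800, 2000, 2500)
)

# One row per participant: number of logged days plus per-day averages
toySummary <- toyActivity %>%
  group_by(Id) %>%
  summarise(n = n(), Steps = mean(TotalSteps), Calories = mean(Calories))

print(toySummary)
```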
# Combine active minutes and total minutes across intensity levels into new columns
dailyActivity$TotalActiveMinutes <- rowSums(dailyActivity[, c("VeryActiveMinutes", "FairlyActiveMinutes", "LightlyActiveMinutes")])

dailyActivity$TotalMinutes <- rowSums(dailyActivity[, c("VeryActiveMinutes", "FairlyActiveMinutes", "LightlyActiveMinutes", "SedentaryMinutes")])

dailySummary$Active <- rowSums(dailySummary[, c("Very", "Fairly", "Lightly")])
dailySummary$TotalMinutes <- rowSums(dailySummary[, c("Very", "Fairly", "Lightly", "Sedentary")])
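
A quick sanity check on the combined columns is to compare the daily total against the 1,440 minutes in a day. The sketch below rebuilds the two totals on hypothetical rows and flags days that account for the full 24 hours. The `FullDay` column is an illustrative addition, not part of the original analysis.

```r
# Hypothetical daily records (values are made up)
toyActivity <- data.frame(
  VeryActiveMinutes = c(30, 0),
  FairlyActiveMinutes = c(20, 10),
  LightlyActiveMinutes = c(200, 150),
  SedentaryMinutes = c(1190, 700)
)

activeCols <- c("VeryActiveMinutes", "FairlyActiveMinutes", "LightlyActiveMinutes")

# Same row-wise sums as in the step above
toyActivity$TotalActiveMinutes <- rowSums(toyActivity[, activeCols])
toyActivity$TotalMinutes <- rowSums(toyActivity[, c(activeCols, "SedentaryMinutes")])

# A day has 1440 minutes: totals above that would point to a data problem,
# and totals below it may indicate time that was not tracked
toyActivity$FullDay <- toyActivity$TotalMinutes == 1440

print(toyActivity[, c("TotalActiveMinutes", "TotalMinutes", "FullDay")])
```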