Skip to content

Deciphering Customer Buying Cycles in Grocery Stores Using R

Introduction

In the highly competitive retail sector, particularly in grocery stores, understanding consumer preferences and shopping habits is essential for optimizing business operations. This report focuses on the analysis of transactional data from two 24-hour grocery stores, each with distinct data storage practices due to the store owners' diverse backgrounds. The goal of the analysis is to uncover hidden patterns in consumer behavior that can drive store optimization. By leveraging insights from the data, the grocery chain aims to enhance customer satisfaction, boost sales, and improve inventory management.

This analysis was performed using R, with a particular emphasis on advanced techniques such as data joining and aggregation. These methods were applied to explore transactional data and derive valuable insights that could directly influence decisions on store layouts, marketing strategies, and inventory management.

Objective

The primary objective of this project was to provide actionable insights that would help the grocery chain optimize its operations, improve customer experience, and drive sales. Specifically, the analysis aimed to:

Optimize Store Layouts: Identify commonly purchased product pairs or shopping patterns to inform ideal product placements.

Tailor Marketing Strategies: Understand customer shopping behavior, including peak shopping hours and high-demand products, to create more targeted marketing efforts.

Efficiently Manage Inventory: Analyze sales patterns and fluctuations over time to adjust stock levels in line with customer demand.

The insights derived from this analysis will assist in making data-driven decisions that improve operational efficiency, enhance the customer experience, and ultimately boost store performance.

Data overview

The dataset consists of grocery_data1.csv and grocery_data2.csv which contain detailed records of grocery transactions from these two stores with the same column names:

VariableDescription
CustomerIDUnique identifier for each customer
DateRawRaw date of the transaction
TimeTime of the transaction
TransactionIDUnique identifier for each transaction
ProductNameName of the product purchased
PriceUSDPrice of the product in US dollars
QuantityNumber of the product purchased
PaymentMethodPayment type used for the transaction
CategoryCategory of the product

The dates were stored in different from formats in the two tables.

Analysis code in R

The following code was used to achieve the goal of the analysis:

# Import packages
library(dplyr)
library(lubridate)
library(readr)

#Importing data
grocery_data1 <- read.csv("grocery_data1.csv")
grocery_data2 <- read.csv("grocery_data2.csv")


#converting date types 
grocery_data1_cleaned  <- grocery_data1 %>%
mutate(DateRaw = mdy(DateRaw))

grocery_data2_cleaned  <- grocery_data2 %>%
mutate(DateRaw = dmy(DateRaw))

#Joining tables
grocery_data_joined <- grocery_data1_cleaned %>%
bind_rows(grocery_data2_cleaned)



#In order to understand consumer spending patterns, what week of the year during the time period given (June 1 to August 31) had the smallest absolute deviation in sales value compared to the mean weekly sales over that same time period? 

smallest_sales_deviation_df <- grocery_data_joined %>%
filter(DateRaw >= "2023-06-01", DateRaw <= "2023-08-31") %>%
mutate(Sale_total = PriceUSD * Quantity ) %>%
group_by(week = week(DateRaw)) %>%
summarize(total_weekly_sales = sum(Sale_total)) %>%
ungroup() %>%
mutate(mean_weekly_sales = mean(total_weekly_sales), sales_deviation = abs(mean_weekly_sales - total_weekly_sales)) %>%
slice_min(sales_deviation)
		 
smallest_sales_deviation <- as.integer(smallest_sales_deviation_df[1,1])



#In order to tailor marketing strategies, what hour of the day (as a number on the 24 hour scale) had the most hourly total sales? 
most_hourly_sales_df <- grocery_data_joined %>%
mutate(Sale_total = PriceUSD * Quantity) %>%
group_by(Time) %>%
mutate(hour_total_sales = sum(Sale_total)) %>%
ungroup() %>%
slice_max(hour_total_sales)


most_hourly_sales <- 22

#In order to evaluate individual buying preferences, how many days went by between the three purchases of cornflakes by CustomerID 107?

cornflakes_days_df <- grocery_data_joined  %>%
filter(CustomerID == 107, ProductName == "Cornflakes") %>%
arrange(DateRaw) %>%
mutate(previous_purchase = lag(DateRaw), days_between_purchase = DateRaw - previous_purchase) %>%
select(days_between_purchase)

cornflakes_days <- c(cornflakes_days_df[2,1], cornflakes_days_df[3,1])


Conclusion

The analysis of transactional data from the two grocery stores has provided valuable insights that can be used to optimize store operations, enhance customer satisfaction, and boost sales. Key findings include:

**Weekly Sales Trends: **Identifying the week with the smallest deviation from mean sales helps in understanding typical consumer behavior and planning for seasonal fluctuations.

Peak Hourly Sales: Recognizing the hour with the highest sales enables the store to optimize marketing and staffing during peak periods.

Customer Preferences: Analyzing the frequency of individual customer purchases provides insights into customer behavior, which can be used for personalized marketing.

By leveraging R and the dplyr, lubridate, and readr packages for data manipulation and aggregation, key patterns in consumer behavior were uncovered, offering actionable insights for optimizing store layouts, marketing efforts, and inventory management.

Ultimately, these insights enable the grocery chain to make data-driven decisions that improve operational efficiency, enhance the customer experience, and increase sales performance.