Deciphering Customer Buying Cycles in Grocery Stores Using R
Introduction
In the highly competitive retail sector, particularly in grocery stores, understanding consumer preferences and shopping habits is essential for optimizing business operations. This report focuses on the analysis of transactional data from two 24-hour grocery stores, each with distinct data storage practices due to the store owners' diverse backgrounds. The goal of the analysis is to uncover hidden patterns in consumer behavior that can drive store optimization. By leveraging insights from the data, the grocery chain aims to enhance customer satisfaction, boost sales, and improve inventory management.
This analysis was performed using R, with a particular emphasis on advanced techniques such as data joining and aggregation. These methods were applied to explore transactional data and derive valuable insights that could directly influence decisions on store layouts, marketing strategies, and inventory management.
Objective
The primary objective of this project was to provide actionable insights that would help the grocery chain optimize its operations, improve customer experience, and drive sales. Specifically, the analysis aimed to:
Optimize Store Layouts: Identify commonly purchased product pairs or shopping patterns to inform ideal product placements.
Tailor Marketing Strategies: Understand customer shopping behavior, including peak shopping hours and high-demand products, to create more targeted marketing efforts.
Efficiently Manage Inventory: Analyze sales patterns and fluctuations over time to adjust stock levels in line with customer demand.
The insights derived from this analysis will assist in making data-driven decisions that improve operational efficiency, enhance the customer experience, and ultimately boost store performance.
Data overview
The dataset consists of grocery_data1.csv and grocery_data2.csv which contain detailed records of grocery transactions from these two stores with the same column names:
| Variable | Description |
|---|---|
CustomerID | Unique identifier for each customer |
DateRaw | Raw date of the transaction |
Time | Time of the transaction |
TransactionID | Unique identifier for each transaction |
ProductName | Name of the product purchased |
PriceUSD | Price of the product in US dollars |
Quantity | Number of the product purchased |
PaymentMethod | Payment type used for the transaction |
Category | Category of the product |
The dates were stored in different from formats in the two tables.
Analysis code in R
The following code was used to achieve the goal of the analysis:
# Import packages
library(dplyr)
library(lubridate)
library(readr)
#Importing data
grocery_data1 <- read.csv("grocery_data1.csv")
grocery_data2 <- read.csv("grocery_data2.csv")
#converting date types
grocery_data1_cleaned <- grocery_data1 %>%
mutate(DateRaw = mdy(DateRaw))
grocery_data2_cleaned <- grocery_data2 %>%
mutate(DateRaw = dmy(DateRaw))
#Joining tables
grocery_data_joined <- grocery_data1_cleaned %>%
bind_rows(grocery_data2_cleaned)
#In order to understand consumer spending patterns, what week of the year during the time period given (June 1 to August 31) had the smallest absolute deviation in sales value compared to the mean weekly sales over that same time period?
smallest_sales_deviation_df <- grocery_data_joined %>%
filter(DateRaw >= "2023-06-01", DateRaw <= "2023-08-31") %>%
mutate(Sale_total = PriceUSD * Quantity ) %>%
group_by(week = week(DateRaw)) %>%
summarize(total_weekly_sales = sum(Sale_total)) %>%
ungroup() %>%
mutate(mean_weekly_sales = mean(total_weekly_sales), sales_deviation = abs(mean_weekly_sales - total_weekly_sales)) %>%
slice_min(sales_deviation)
smallest_sales_deviation <- as.integer(smallest_sales_deviation_df[1,1])
#In order to tailor marketing strategies, what hour of the day (as a number on the 24 hour scale) had the most hourly total sales?
most_hourly_sales_df <- grocery_data_joined %>%
mutate(Sale_total = PriceUSD * Quantity) %>%
group_by(Time) %>%
mutate(hour_total_sales = sum(Sale_total)) %>%
ungroup() %>%
slice_max(hour_total_sales)
most_hourly_sales <- 22
#In order to evaluate individual buying preferences, how many days went by between the three purchases of cornflakes by CustomerID 107?
cornflakes_days_df <- grocery_data_joined %>%
filter(CustomerID == 107, ProductName == "Cornflakes") %>%
arrange(DateRaw) %>%
mutate(previous_purchase = lag(DateRaw), days_between_purchase = DateRaw - previous_purchase) %>%
select(days_between_purchase)
cornflakes_days <- c(cornflakes_days_df[2,1], cornflakes_days_df[3,1])
Conclusion
The analysis of transactional data from the two grocery stores has provided valuable insights that can be used to optimize store operations, enhance customer satisfaction, and boost sales. Key findings include:
**Weekly Sales Trends: **Identifying the week with the smallest deviation from mean sales helps in understanding typical consumer behavior and planning for seasonal fluctuations.
Peak Hourly Sales: Recognizing the hour with the highest sales enables the store to optimize marketing and staffing during peak periods.
Customer Preferences: Analyzing the frequency of individual customer purchases provides insights into customer behavior, which can be used for personalized marketing.
By leveraging R and the dplyr, lubridate, and readr packages for data manipulation and aggregation, key patterns in consumer behavior were uncovered, offering actionable insights for optimizing store layouts, marketing efforts, and inventory management.
Ultimately, these insights enable the grocery chain to make data-driven decisions that improve operational efficiency, enhance the customer experience, and increase sales performance.