In the bustling world of retail, especially in grocery stores, understanding consumer preferences and shopping habits is pivotal for business optimization. Imagine a local grocery chain seeking to enhance customer satisfaction and drive sales. By analyzing transactional data from two of its 24-hour stores, each with unique data storage practices due to the store owners' diverse backgrounds, the chain aims to uncover hidden patterns in consumer behavior. The goal here is multifaceted: optimizing store layouts, tailoring marketing strategies, and efficiently managing inventory to align with customer preferences. For instance, if analysis reveals that customers frequently purchase certain products together or show a preference for shopping at specific hours, the store can adjust its marketing efforts and stock levels accordingly. Additionally, understanding seasonal trends and weekly fluctuations in sales helps in planning promotions and staffing. Acting as the retail data scientist for this chain, you'll leverage detailed retail data to drive strategic decisions, ultimately enhancing customer experience and boosting the store's performance.
grocery_data1.csv and grocery_data2.csv contain detailed records of grocery transactions from these two stores with the same column names:
| Variable | Description |
|---|---|
CustomerID | Unique identifier for each customer |
DateRaw | Raw date of the transaction |
Time | Time of the transaction |
TransactionID | Unique identifier for each transaction |
ProductName | Name of the product purchased |
PriceUSD | Price of the product in US dollars |
Quantity | Number of the product purchased |
PaymentMethod | Payment type used for the transaction |
Category | Category of the product |
# Import packages
library(dplyr)
library(lubridate)
library(readr)
# Start your code
# Load and parse the time of the grocery datasets
grocery_data1 <- read_csv('grocery_data1.csv') %>%
mutate(Date = mdy(DateRaw))
grocery_data2 <- read_csv('grocery_data2.csv') %>%
mutate(Date = dmy(DateRaw))
grocery_data <- grocery_data1 %>%
bind_rows(grocery_data2)
# Calculate TotalSaleUSD = PriceUSD * Quantity
grocery_data <- grocery_data %>%
mutate(TotalSaleUSD = PriceUSD * Quantity)
# Calculate the time since each customer purchased each ProductName
grocery_data_grouped <- grocery_data %>%
arrange(CustomerID, Date) %>%
group_by(CustomerID, ProductName) %>%
mutate(
DaysSinceLastPurchase = as.numeric(c(0, diff(Date))),
Week = week(Date),
Year = year(Date),
Hour = hour(Time)
) %>%
ungroup()
# Calculate weekly figures for TotalSaleUSD
weekly_sales <- grocery_data_grouped %>%
group_by(Week, Year) %>%
summarize(WeeklyTotalSaleUSD = sum(TotalSaleUSD)) %>%
arrange(Week, Year) %>%
ungroup()
# What week of the year during the time period had the smallest absolute deviation in sales value compared to the mean WeeklyTotalSaleUSD?
mean(weekly_sales$WeeklyTotalSaleUSD)
weekly_sales %>%
mutate(Diff = abs(WeeklyTotalSaleUSD - mean(WeeklyTotalSaleUSD))) %>%
arrange(Diff)
smallest_sales_deviation <- 24
# Calculate hourly figures for TotalSaleUSD
hourly_sales <- grocery_data_grouped %>%
group_by(Hour) %>%
summarize(HourlyTotalSaleUSD = sum(TotalSaleUSD)) %>%
ungroup() %>%
arrange(desc(HourlyTotalSaleUSD))
# What hour of the day (as a number on the 24 hour scale) had the most HourlyTotalSaleUSD?
most_hourly_sales <- 22
# Examine Cornflakes purchases for CustomerID 107
grocery_data_grouped %>%
filter(CustomerID == 107) %>%
filter(ProductName == "Cornflakes") %>%
arrange(desc(DaysSinceLastPurchase))
# How many days went by between the first and second and the second and third transactions of the three purchases of Corn Flakes by CustomerID 107?
cornflakes_days <- c(6, 40)