๐ Background
You work in the accounting department of a company that sells motorcycle parts. The company operates three warehouses in a large metropolitan area.
Youโve recently learned data manipulation and plotting, and suggest helping your colleague analyze past sales data. Your colleague wants to capture sales by payment method. She also needs to know the average unit price for each product line.
๐พ The data
The team assembled the following file:
The sales data has the following fields:
- "date" - The date, from June to August 2021.
- "warehouse" - The company operates three warehouses: North, Central, and West.
- "client_type" - There are two types of customers: Retail and Wholesale.
- "product_line" - Type of products purchased.
- "quantity" - How many items were purchased.
- "unit_price" - Price per item sold.
- "total" - Total sale = quantity * unit_price.
- "payment" - How the client paid: Cash, Credit card, Transfer.
โ
Checklist before publishing into the competition
- Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.Rmd.
- Remove redundant cells like the introduction to R Markdown notebooks, so the workbook is focused on your story
- Check that all the cells run without error.
๐ Let's begin the Analysis
Loading the packages needed.
library(tidyverse)
library(knitr)Importing data to R's workspace.
sales_data <- read_csv("data/sales_data.csv")What are the total sales for each payment method?
We want to find what is the payment method that has largest sales. We simply plot a bar chart comparing sales between cash, credit card and transfer payment methods.
sales_per_payment <- sales_data %>%
group_by(payment) %>%
summarise(total_sales = sum(total)) %>%
arrange(desc(total_sales))
sales_per_payment %>%
mutate(total_sales_label = paste0("US$",
format(total_sales, big.mark = ","),
"0")) %>%
ggplot(aes(payment, total_sales, fill = payment)) +
geom_col(show.legend = FALSE) +
geom_text(aes(label = total_sales_label), nudge_y = 10000) +
scale_fill_brewer(palette = "Dark2") +
scale_y_continuous(labels = scales::dollar) +
labs(title = "Total Sales for each Payment Method",
x = NULL,
y = NULL) +
theme_classic()The chart shows us that Cash is least used and transfer and credit card are the best options to invest considering the amount of sales. But maybe the payment method differs in each warehouse. Le's check it out.
sales_per_payment_warehouse <- sales_data %>%
group_by(payment, warehouse) %>%
summarise(total_sales = sum(total)) %>%
arrange(desc(total_sales))
sales_per_payment_warehouse %>%
mutate(total_sales_label = paste0("US$",
format(total_sales, big.mark = ","),
"0")) %>%
ggplot(aes(payment, total_sales, fill = payment)) +
geom_col(show.legend = FALSE) +
facet_grid(~warehouse) +
scale_fill_brewer(palette = "Dark2") +
scale_y_continuous(labels = scales::dollar) +
labs(title = "Total Sales for each Payment Method",
x = NULL,
y = NULL) +
theme_classic()The order of importance of the payment methods are equall in each warehouse, but we can observe a difference in intensity between these differences, especially in the West warehouse.
What is the average unit price for each product line?
Now we want to find out the average price for each product line. We do this in the chart below.
price_per_product <- sales_data %>%
group_by(product_line) %>%
summarise(avg_price = round(mean(unit_price), digits = 2)) %>%
arrange(desc(avg_price)) %>%
mutate(product_line = reorder(product_line, avg_price))
price_per_product %>%
mutate(avg_price_label = paste0("US$",
format(avg_price, big.mark = ","))) %>%
ggplot(aes(avg_price, product_line, fill = product_line)) +
geom_col(show.legend = FALSE) +
geom_text(aes(label = avg_price_label), nudge_x = 9) +
coord_cartesian(xlim = c(0, 75))+
scale_x_continuous(labels = scales::dollar,
position = "top") +
scale_fill_brewer(palette = "Set2") +
labs(title = "Average Price per product Line",
x = NULL,
y = NULL) +
theme_classic()We see that the Engine product line has the largest average price. We could go further! Does the most expensive product line has the largest amount of sales?
sales_per_product <- sales_data %>%
group_by(product_line) %>%
summarise(avg_price = round(mean(unit_price), digits = 2),
avg_sales = round(mean(quantity), digits = 2)) %>%
arrange(desc(avg_price)) %>%
mutate(product_line = reorder(product_line, avg_price))
sales_per_product %>%
ggplot(aes(avg_sales, product_line, fill = product_line)) +
geom_col(show.legend = FALSE) +
geom_text(aes(label = avg_sales), nudge_x = 1) +
scale_x_continuous(position = "top") +
scale_fill_brewer(palette = "Set2") +
labs(title = "Average Sales per product Line",
x = NULL,
y = NULL) +
theme_classic()
We see that even though Engine has the largest average price, it has the largest sales (normally, high price leads to low sales). Then, it is a good strategy to focus in increasing Engine sales.
Moreover, we could improve the chart above by faceting it by region.
sales_per_product_warehouse <- sales_data %>%
group_by(product_line) %>%
mutate(avg_price = round(mean(unit_price), digits = 2)) %>%
ungroup() %>%
group_by(product_line, warehouse, avg_price) %>%
summarise(avg_sales_wh = round(mean(quantity), digits = 1)) %>%
arrange(desc(avg_price)) %>%
ungroup() %>%
mutate(product_line = reorder(product_line, avg_price))
sales_per_product_warehouse %>%
ggplot(aes(avg_sales_wh, product_line, fill = product_line)) +
geom_col(show.legend = FALSE) +
geom_text(aes(label = avg_sales_wh), nudge_x = 1) +
facet_grid(rows = vars(warehouse)) +
scale_x_continuous(position = "top") +
scale_fill_brewer(palette = "Set2") +
labs(title = "Average Sales per product Line, per Warehouse",
x = NULL,
y = NULL) +
theme_classic()
โ
โ