Sales Data Analysis for a Motorcycle Parts Company
You work in the accounting department of a company that sells motorcycle parts. The company operates three warehouses in a large metropolitan area. You want to analyze their summer sales data. Your colleague wants to capture sales by payment method. She also needs to know the average unit price for each product line. The company would also like to know if there are any major differences between the sales at the three different warehouses so they can address any underperformance issues.
The sales data has the following fields:
- "date" - The date, from June to August 2021.
- "warehouse" - The company operates three warehouses: North, Central, and West.
- "client_type" - There are two types of customers: Retail and Wholesale.
- "product_line" - Type of products purchased.
- "quantity" - How many items were purchased.
- "unit_price" - Price per item sold.
- "total" - Total sale = quantity * unit_price.
- "payment" - How the client paid: Cash, Credit card, Transfer.
#read in the data
sales_data <- read.csv('./data/sales_data.csv')
head(sales_data)Summary of Findings
The majority of the company's sales in June, July, and August come from transfer and credit card payments. Transfer payments account for the most total sales at $159,642.30 followed by $110,271.60 total sales in credit card payments. Only about 7% of the total sales are from cash payments. The Engine product line has the highest average unit price at about $60. The Breaking System product line has the lowest average unit price at just under $18.
The Central Warehouse had the most summer sales overall, while the West Warehouse had the least sales in both Wholesale and Retail client categories. The sales in each warehouse location are pretty evenly split between the client types, although the West warehouse is the only location where the Retail total sales was slightly higher than the Wholesale total sales. The West warehouse sold about half as many items as the other two locations, which would result in the lowest overall sales. The Central warehouse has increased sales over the summer months, while the West warehouse has seen a decrease in sales.
Total Sales for Each Payment Method
The total sales for each payment method during the months of June, July, and August is shown below. Direct transfers were the most profitable, followed by credit card purchases, and finally cash payments.
tot_sale_by_payment <- sales_data %>% group_by(payment) %>%
summarize(total_sales = sum(total)) %>%
arrange(desc(total_sales))
#formating the display table to show dollars
tot_sale_by_payment2 <- tot_sale_by_payment %>%
mutate(total_sales = dollar(total_sales, largest_with_cents = 5e+05))
tot_sale_by_payment2
#Plotting the Total Sales Data
tot_sales_plot <- ggplot(tot_sale_by_payment, aes(payment, total_sales)) + geom_col() + scale_y_continuous("Total Sales in USD", breaks = seq(0, 150000, 30000)) + labs(title = "Total Sales for Each Payment Method", x = "Payment Method")
tot_sales_plotAverage Unit Price for Each Product Line
The average unit price for each product line is given in the table below, from most to least expensive. Engine products are more expensive on average followed by frame & body products. Breaking system products are the least expensive on average.
avg_unit_price_by_product_line <- sales_data %>% group_by(product_line) %>%
summarize(avg_unit_price = mean(unit_price)) %>%
arrange(desc(avg_unit_price))
#formating the display table to show dollars
avg_unit_price_by_product_line2 <- avg_unit_price_by_product_line %>%
mutate(avg_unit_price = dollar(avg_unit_price))
avg_unit_price_by_product_line2
#Plotting the Average Unit Price Data
avg_unitprice_plot <- ggplot(
avg_unit_price_by_product_line, aes(reorder(product_line,
avg_unit_price), avg_unit_price))
+ geom_col()
+ scale_y_continuous("Average Unit Price in USD", breaks = seq(0, 65, 5))
+ labs(title = "Average Unit Price For Each Product Line", x = "Product
Line")
avg_unitprice_plotUnit Price Spread Boxplot
To further confirm that the Engine product line has the highest unit prices (and not just a few high price items that would skew the average unit price), a boxplot comparing the spread of unit prices in each product line is below.
The boxplot shows that all items in the engine product line are more expensive than items in any of the other product lines. The breaking system products, electrical system products, and the miscellaneous products are all relatively close in price and are significantly less expensive (about 1/3 of the unit price) than engine parts.
#Plotting the Unit Price Data
unitprice_plot <- ggplot(sales_data, aes(reorder(product_line, unit_price), unit_price))
+ geom_boxplot()
+ scale_y_continuous("Unit Price in USD")
+ labs(title="Unit Price Spread For Each Product Line", x="Product Line")
unitprice_plotFurther Investigation into Sales at 3 Different Warehouse Locations
Questions to explore:
- Which warehouse made the most money?
- Which product lines are being sold the most and where?
- How are the sales distributed between retail and wholesale clients at each warehouse?
- Which month was best for sales at each warehouse?
Total Sales and Top Sellers
The Central warehouse location had the top sales in the summer months, both overall and for the wholesale and retail client categories. Wholesale clients brought in more money at the Central and North warehouse locations, but Retail clients resulted in more sales at the West warehouse.
The Central warehouse also sold the most products while the West warehouse sold about one third as many products. Breaking systems were sold the most at the Central and West warehouses but suspension and traction products sold the most at the North warehouse. Overall, Engine products are sold the least at all three warehouses.
# Summarizing total sales by warehouse and client type
totsales_by_warehouse <- sales_data %>% group_by(warehouse, client_type) %>% summarize(total_sales = sum(total))
# Formatting the display table to show dollars
totsales_by_warehouse2 <- totsales_by_warehouse %>% mutate(total_sales = dollar(total_sales))
totsales_by_warehouse2
# Plotting the data
ggplot(totsales_by_warehouse, aes(warehouse, total_sales)) +
geom_col(position = "stack", aes(fill = client_type)) +
scale_y_continuous("Total Sales in USD", breaks = seq(0, 160000, 20000)) +
labs(title = "Total Sales by Warehouse and Client Type", x = "Warehouse Location")sales_data %>% group_by(warehouse) %>% summarize(tot_quantity_sold = sum(quantity))
tot_products <- sales_data %>% group_by(warehouse, product_line) %>%
summarize(tot_quantity = sum(quantity)) %>%
arrange(warehouse, desc(tot_quantity))
tot_products
ggplot(tot_products, aes(product_line, tot_quantity)) +
geom_col() +
scale_y_continuous("Total Quantity Sold") +
labs(title="Total Quantity Sold by Product Line and Warehouse", x = "Product Line") +
facet_grid(~warehouse) +
coord_flip()Month to Month Warehouse Comparison
Summer sales trends vary from warehouse to warehouse. The Central warehouse experienced three good months of increasing sales. The North warehouse had a low sales month in July, but recovered to have their highest sale month yet in August. The West warehouse has seen decreasing sales over the summer months.