Data analysis example:
Find the total sales for each warehouse.
We can use groupby to group the information by the column "warehouse". Then we select the column "total" and use .sum() to add up the "total" column for each warehouse:
# Importing the pandas module
import pandas as pd
# Reading in the sales data
df = pd.read_csv('data/sales_data.csv', parse_dates=['date'])
# Take a look at the first datapoints
df.head()
df.groupby('warehouse')[['total']].sum()
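The difference between single and double brackets in the selection step is easy to miss, so here is a minimal sketch of it. The rows below are made up for illustration and stand in for data/sales_data.csv; only the column names come from the dataset description.

```python
import pandas as pd

# Hypothetical rows standing in for data/sales_data.csv (values are made up)
sample = pd.DataFrame({
    "warehouse": ["North", "Central", "North", "West", "Central"],
    "total": [100.0, 250.0, 50.0, 75.0, 25.0],
})

# Single brackets select one column and return a Series
totals_series = sample.groupby("warehouse")["total"].sum()

# Double brackets select a list of columns and return a DataFrame
totals_frame = sample.groupby("warehouse")[["total"]].sum()

print(totals_series)
print(totals_frame)
```

Both forms hold the same numbers. The DataFrame version renders as a table in a notebook, which is presumably why the cell above uses double brackets.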
Data science notebooks & visualizations
Visualizations are very helpful to summarize data and gain insights. A well-crafted chart often conveys information much better than a table.
It is very straightforward to include plots in a data science notebook. For example, let's look at the average number of items purchased by each client type.
We are using the matplotlib.pyplot library for this example. We will run the .plot() method on the data we want to display and call plt.show() to draw the plot:
import matplotlib.pyplot as plt
avg_units_client_type = df.groupby('client_type')['quantity'].mean()
avg_units_client_type.plot(kind='barh')
plt.show()
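A chart is usually easier to read with axis labels and a title. The sketch below shows one way to add them by plotting onto an explicit Axes object. The sample rows and label texts are assumptions made for illustration, and the Agg backend line is only there so the sketch runs without a display; in a notebook you would likely drop it and call plt.show() instead.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows (values are made up); column names follow the sales data
sample = pd.DataFrame({
    "client_type": ["Retail", "Wholesale", "Retail", "Wholesale"],
    "quantity": [2, 40, 4, 20],
})
avg_units = sample.groupby("client_type")["quantity"].mean()

# Plot onto an explicit Axes so labels and title can be set on it
fig, ax = plt.subplots()
avg_units.plot(kind="barh", ax=ax)
ax.set_xlabel("Average quantity per order")
ax.set_ylabel("Client type")
ax.set_title("Average units purchased by client type")
fig.tight_layout()
```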
Reporting on sales data
Now let's move on to the competition and challenge.
📖 Background
You work in the accounting department of a company that sells motorcycle parts. The company operates three warehouses in a large metropolitan area.
You’ve recently learned data manipulation and plotting, and you offer to help your colleague analyze past sales data. Your colleague wants to capture sales by payment method. She also needs to know the average unit price for each product line.
💾 The data
The sales data has the following fields:
- "date" - The date, from June to August 2021.
- "warehouse" - The company operates three warehouses: North, Central, and West.
- "client_type" - There are two types of customers: Retail and Wholesale.
- "product_line" - Type of products purchased.
- "quantity" - How many items were purchased.
- "unit_price" - Price per item sold.
- "total" - Total sale = quantity * unit_price.
- "payment" - How the client paid: Cash, Credit card, Transfer.
df.head()
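Before analyzing, it can be worth checking that the data matches the field descriptions above. The following is a minimal sketch of two such checks, run on made-up rows rather than the real file: that "total" equals quantity * unit_price, and that every date falls between June and August 2021.

```python
import numpy as np
import pandas as pd

# Hypothetical rows (values are made up); column names follow the field list
sample = pd.DataFrame({
    "date": pd.to_datetime(["2021-06-01", "2021-07-15", "2021-08-28"]),
    "quantity": [8, 3, 10],
    "unit_price": [16.85, 19.29, 32.93],
    "total": [134.80, 57.87, 329.30],
})

# Compare with a tolerance, since the values are floating-point numbers
total_matches = np.isclose(sample["total"], sample["quantity"] * sample["unit_price"])

# Series.between is inclusive on both ends by default
in_range = sample["date"].between("2021-06-01", "2021-08-31")

print("Totals consistent:", total_matches.all())
print("Dates in range:", in_range.all())
```

On the real data, the same checks would use df in place of sample. Rows that fail could then be inspected with df[~total_matches].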
💪 Challenge
Create a report to answer your colleague's questions. Include:
- What are the total sales for each payment method?
- What is the average unit price for each product line?
- Create plots to visualize findings for the first two questions.
- [Optional] Investigate further (e.g., average purchase value by client type, total purchase value by product line, etc.)
- Summarize your findings.
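The two optional investigations named above follow the same groupby pattern as the main questions. Here is a rough sketch on made-up rows; the product line names and amounts are hypothetical and not taken from the dataset.

```python
import pandas as pd

# Hypothetical rows (product line names and amounts are made up)
sample = pd.DataFrame({
    "client_type": ["Retail", "Wholesale", "Retail", "Wholesale"],
    "product_line": ["Engine", "Engine", "Electrical system", "Electrical system"],
    "total": [100.0, 900.0, 50.0, 450.0],
})

# Average purchase value by client type
avg_purchase_by_client = sample.groupby("client_type")["total"].mean()

# Total purchase value by product line, largest first
total_by_product_line = (
    sample.groupby("product_line")["total"].sum().sort_values(ascending=False)
)

print(avg_purchase_by_client)
print(total_by_product_line)
```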
✅ Checklist before publishing
- Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
- Remove redundant cells like the introduction to data science notebooks, so the workbook is focused on your story.
- Check that all the cells run without error.
⌛️ Time is ticking. Good luck!
# Total sales for each payment method
total_sales_by_payment_method = df.groupby('payment')['total'].sum()
total_sales_by_payment_method.head()
# Plot to visualize total sales
total_sales_by_payment_method.plot(kind='bar', ylabel='Total Sales', title='Total Sales for each Payment Method')
plt.show()
# Average unit price for each product line
average_price_by_product_line = df.groupby('product_line')['unit_price'].mean()
average_price_by_product_line.head()
# Plot to visualize average unit price
average_price_by_product_line.plot(kind='bar', ylabel='Unit Price', title='Average Unit Price for each Product Line')
plt.show()
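When summarizing the findings, raw totals can be easier to compare as shares of overall sales. This is one possible sketch, using made-up amounts, of turning totals by payment method into percentages; it is an optional extra and not part of the original analysis.

```python
import pandas as pd

# Hypothetical rows (amounts are made up); payment values follow the field list
sample = pd.DataFrame({
    "payment": ["Cash", "Credit card", "Transfer", "Credit card"],
    "total": [20.0, 50.0, 100.0, 30.0],
})

totals = sample.groupby("payment")["total"].sum()

# Each payment method's share of all sales, in percent, largest first
share_pct = (totals / totals.sum() * 100).round(1).sort_values(ascending=False)

print(share_pct)
```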