E-Commerce Data
This dataset consists of details of orders placed in different countries from December 2010 until December 2011. The company is a UK-based online retailer that mainly sells unique all-occasion gifts. Many of its customers are wholesalers. For ideas on what to analyze, see the suggestions at the end of this template.
# Load packages
import numpy as np
import pandas as pd
Load your data
# Load data from the csv file
df = pd.read_csv('online_retail.csv', index_col=None)
print(f"Number of rows/records: {df.shape[0]}")
print(f"Number of columns/variables: {df.shape[1]}")
df.head()
Number of rows/records: 541909
Number of columns/variables: 8
| | InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
|---|---|---|---|---|---|---|---|---|
| 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 12/1/10 8:26 | 2.55 | 17850.0 | United Kingdom |
| 1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 12/1/10 8:26 | 3.39 | 17850.0 | United Kingdom |
| 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 12/1/10 8:26 | 2.75 | 17850.0 | United Kingdom |
| 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 12/1/10 8:26 | 3.39 | 17850.0 | United Kingdom |
| 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 12/1/10 8:26 | 3.39 | 17850.0 | United Kingdom |
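The preview shows `InvoiceDate` as text such as `12/1/10 8:26`, and there is no column for the value of each line item. A minimal preparation sketch follows, assuming the dates are month/day/two-digit-year with 24-hour times (which matches a dataset starting in December 2010). The helper name `prepare`, the `Revenue` column, and the small `sample` frame standing in for `df` are all illustrative assumptions, not part of the template.

```python
import pandas as pd

# Tiny stand-in for df so the sketch runs without online_retail.csv.
# The values are copied from the first preview rows above.
sample = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "536365"],
    "Quantity": [6, 6, 8],
    "InvoiceDate": ["12/1/10 8:26", "12/1/10 8:26", "12/1/10 8:26"],
    "UnitPrice": [2.55, 3.39, 2.75],
    "CustomerID": [17850.0, 17850.0, 17850.0],
    "Country": ["United Kingdom"] * 3,
})


def prepare(frame):
    """Return a copy with parsed dates and a per-line Revenue column (hypothetical helper)."""
    out = frame.copy()
    # Assumed format: month/day/two-digit year, then 24-hour time.
    out["InvoiceDate"] = pd.to_datetime(out["InvoiceDate"], format="%m/%d/%y %H:%M")
    # Revenue of one line item = quantity ordered * price per unit.
    out["Revenue"] = out["Quantity"] * out["UnitPrice"]
    return out


prepared = prepare(sample)
print(prepared[["InvoiceDate", "Quantity", "UnitPrice", "Revenue"]])
```

On the full data the same call would be `prepare(df)`. If the date format assumption is wrong for some rows, `pd.to_datetime` raises an error rather than silently misparsing them.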
Understand your variables
# Understand your variables
variables = pd.DataFrame(columns=['Variable','Number of unique values','Values'])
for i, var in enumerate(df.columns):
    variables.loc[i] = [var, df[var].nunique(), df[var].unique().tolist()]
# Join summary data frame with a CSV file explaining the different variables
var_dict = pd.read_csv('variable_explanation.csv', index_col=0)
variables.set_index('Variable').join(var_dict)
| Variable | Number of unique values | Values | Explanation |
|---|---|---|---|
| InvoiceNo | 25900 | [536365, 536366, 536367, 536368, 536369, 53637... | A 6-digit integral number uniquely assigned to... |
| StockCode | 4070 | [85123A, 71053, 84406B, 84029G, 84029E, 22752,... | A 5-digit integral number uniquely assigned to... |
| Description | 4223 | [WHITE HANGING HEART T-LIGHT HOLDER, WHITE MET... | Product (item) name |
| Quantity | 722 | [6, 8, 2, 32, 3, 4, 24, 12, 48, 18, 20, 36, 80... | The quantities of each product (item) per tran... |
| InvoiceDate | 23260 | [12/1/10 8:26, 12/1/10 8:28, 12/1/10 8:34, 12/... | The day and time when each transaction was gen... |
| UnitPrice | 1630 | [2.55, 3.39, 2.75, 7.65, 4.25, 1.85, 1.69, 2.1... | Product price per unit in sterling (pound) |
| CustomerID | 4372 | [17850.0, 13047.0, 12583.0, 13748.0, 15100.0, ... | A 5-digit integral number uniquely assigned to... |
| Country | 38 | [United Kingdom, France, Australia, Netherland... | The name of the country where each customer re... |
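`CustomerID` is described as an integral number but appears as a float (`17850.0`), which in pandas often indicates that the column contains missing values. A quick way to check is to tabulate the dtype and the missing-value count for every column. The sketch below uses a hypothetical helper, `quality_summary`, and a small made-up `sample` frame in which the gaps are invented for illustration only; they say nothing about where the real data is incomplete.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for df; the missing entries here are invented for the demo.
sample = pd.DataFrame({
    "Description": ["WHITE METAL LANTERN", None, "CREAM CUPID HEARTS COAT HANGER"],
    "Quantity": [6, 2, 8],
    "CustomerID": [17850.0, np.nan, 13047.0],
})


def quality_summary(frame):
    """Per-column dtype, count of missing values, and percentage missing (hypothetical helper)."""
    missing = frame.isna()
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "missing": missing.sum(),
        "missing_pct": (missing.mean() * 100).round(2),
    })


summary = quality_summary(sample)
print(summary)
```

Running `quality_summary(df)` on the full data shows which columns need cleaning before any per-customer analysis.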
Answer interesting questions:
Now you get to explore this exciting dataset! Can't think of where to start? Try your hand at these questions:
- Find out which country has the biggest customers, measured by total number of orders or total amount paid.
- Find the smallest group of customers that drives the largest share of the company's revenue. See the 80/20 rule.
- Plot the number of orders placed or the amount of money paid as a function of time.
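As a possible starting point for the first two questions, the sketch below ranks countries by revenue and finds the smallest set of top customers whose combined revenue reaches a given share of the total. The `Revenue` column (quantity times unit price), the helper names, and the `sample` data are all assumptions made for illustration; the numbers are invented and are not taken from the dataset.

```python
import pandas as pd

# Invented line items, only to make the sketch runnable on its own.
sample = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 4, 5],
    "Country": ["United Kingdom", "United Kingdom", "France",
                "France", "Australia", "United Kingdom"],
    "Quantity": [10, 5, 3, 1, 1, 1],
    "UnitPrice": [50.0, 40.0, 50.0, 80.0, 50.0, 20.0],
})
sample["Revenue"] = sample["Quantity"] * sample["UnitPrice"]


def revenue_by_country(frame):
    """Total revenue per country, largest first."""
    return frame.groupby("Country")["Revenue"].sum().sort_values(ascending=False)


def customers_for_share(frame, share=0.8):
    """Top customers needed to reach `share` of total revenue.

    Returns (list of customer IDs, fraction of all customers they represent).
    Rows with a missing CustomerID are dropped by groupby, and the cumulative
    logic assumes per-customer revenue is non-negative.
    """
    per_customer = (
        frame.groupby("CustomerID")["Revenue"].sum().sort_values(ascending=False)
    )
    cumulative = per_customer.cumsum() / per_customer.sum()
    # Customers still below the target, plus the one that crosses it.
    n_needed = int((cumulative < share).sum()) + 1
    return per_customer.index[:n_needed].tolist(), n_needed / len(per_customer)


print(revenue_by_country(sample))
top_customers, fraction = customers_for_share(sample, share=0.8)
print(top_customers, fraction)
```

To rank by number of orders instead of revenue, the same pattern would group by `Country` and count unique `InvoiceNo` values.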
# Start coding
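For the time-based question, one option is to aggregate orders and revenue per calendar month. The sketch below assumes the month/day/year date format seen in the preview and counts one order per unique `InvoiceNo`. The helper `monthly_summary` is hypothetical, and the January rows in `sample` are invented for illustration.

```python
import pandas as pd

# Stand-in for df: December rows follow the preview, January rows are invented.
sample = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "536366", "540001", "540002"],
    "InvoiceDate": ["12/1/10 8:26", "12/1/10 8:26", "12/1/10 8:28",
                    "1/5/11 10:00", "1/20/11 14:30"],
    "Quantity": [6, 6, 6, 2, 4],
    "UnitPrice": [2.55, 3.39, 1.85, 10.0, 5.0],
})


def monthly_summary(frame):
    """Number of distinct invoices and total revenue per calendar month."""
    out = frame.copy()
    # Assumed format: month/day/two-digit year, then 24-hour time.
    out["InvoiceDate"] = pd.to_datetime(out["InvoiceDate"], format="%m/%d/%y %H:%M")
    out["Revenue"] = out["Quantity"] * out["UnitPrice"]
    month = out["InvoiceDate"].dt.to_period("M")
    return out.groupby(month).agg(
        orders=("InvoiceNo", "nunique"),
        revenue=("Revenue", "sum"),
    )


monthly = monthly_summary(sample)
print(monthly)
```

With matplotlib installed, `monthly_summary(df)["revenue"].plot()` would draw the revenue curve over time; replacing `"M"` with `"W"` or `"D"` gives weekly or daily buckets.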