Skip to content

E-Commerce Data

This dataset consists of orders made in different countries from December 2010 to December 2011. The company is a UK-based online retailer that mainly sells unique all-occasion gifts. Many of its customers are wholesalers.

Not sure where to begin? Scroll to the bottom to find challenges!

import pandas as pd
import numpy as np
retail = pd.read_csv("online_retail.csv")
retail.head(100)

Data Dictionary

VariableExplanation
InvoiceNoA 6-digit integral number uniquely assigned to each transaction. If this code starts with letter 'c' it indicates a cancellation.
StockCodeA 5-digit integral number uniquely assigned to each distinct product.
DescriptionProduct (item) name
QuantityThe quantities of each product (item) per transaction
InvoiceDateThe day and time when each transaction was generated
UnitPriceProduct price per unit in sterling (pound)
CustomerIDA 5-digit integral number uniquely assigned to each customer
CountryThe name of the country where each customer resides

Source of dataset.

Citation: Daqing Chen, Sai Liang Sain, and Kun Guo, Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining, Journal of Database Marketing and Customer Strategy Management, Vol. 19, No. 3, pp. 197-208, 2012 (Published online before print: 27 August 2012. doi: 10.1057/dbm.2012.17).

Don't know where to start?

Challenges are brief tasks designed to help you practice specific skills:

  • πŸ—ΊοΈ Explore: Negative order quantities indicate returns. Which products have been returned the most?
  • πŸ“Š Visualize: Create a plot visualizing the profits earned from UK customers weekly, monthly, and quarterly.
  • πŸ”Ž Analyze: Are order sizes from countries outside the United Kingdom significantly larger than orders from inside the United Kingdom?

Scenarios are broader questions to help you develop an end-to-end project for your portfolio:

You are working for an online retailer. Currently, the retailer sells over 4000 unique products. To take inventory of the items, your manager has asked you whether you can group the products into a small number of categories. The categories should be similar in terms of price and quantity sold and any other characteristics you can extract from the data.

You will need to prepare a report that is accessible to a broad audience. It should outline your motivation, steps, findings, and conclusions.

TotalPrice Calculated

# Assuming 'retail' DataFrame has columns 'Quantity' and 'UnitPrice'
retail['TotalPrice'] = retail['Quantity'] * retail['UnitPrice']
retail.head()

Most returned products value

returned = retail[retail["Quantity"] < 0]
returned
# returned_agg = returned.agg([np.sum])

Returns by Country

    returned.pivot_table(values=["TotalPrice"], index=["CustomerID", "Country"], aggfunc=np.sum)

Worth to check what is happening on the United Kingdom Market

Most Returns

To check:

  • Income in period
  • Share of return in total income per customer
  • Total return value per customer

Top ten best customers