
E-Commerce Data

This dataset consists of details of orders made in different countries from December 2010 until December 2011. The company is a UK-based online retailer that mainly sells unique all-occasion gifts. Many of its customers are wholesalers. Suggested questions to analyze are listed at the end of this template.

# Load packages
import numpy as np 
import pandas as pd 

Load your data

# Load data from the csv file
df = pd.read_csv('online_retail.csv', index_col=None)
print(f"Number of rows/records: {df.shape[0]}")
print(f"Number of columns/variables: {df.shape[1]}")
df.head()
Number of rows/records: 541909
Number of columns/variables: 8
   InvoiceNo  StockCode  Description                          Quantity  InvoiceDate   UnitPrice  CustomerID  Country
0  536365     85123A     WHITE HANGING HEART T-LIGHT HOLDER   6         12/1/10 8:26  2.55       17850.0     United Kingdom
1  536365     71053      WHITE METAL LANTERN                  6         12/1/10 8:26  3.39       17850.0     United Kingdom
2  536365     84406B     CREAM CUPID HEARTS COAT HANGER       8         12/1/10 8:26  2.75       17850.0     United Kingdom
3  536365     84029G     KNITTED UNION FLAG HOT WATER BOTTLE  6         12/1/10 8:26  3.39       17850.0     United Kingdom
4  536365     84029E     RED WOOLLY HOTTIE WHITE HEART.       6         12/1/10 8:26  3.39       17850.0     United Kingdom
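
The preview above shows InvoiceDate stored as text (for example `12/1/10 8:26`), and there is no revenue column yet. The sketch below is one possible way to prepare both, assuming the dates follow a month/day/two-digit-year pattern, which matches the December 2010 start of the data. The helper name `add_revenue_and_dates` and the `Revenue` column are illustrative and not part of the template; the two sample rows are copied from the preview.

```python
import pandas as pd

# Two rows copied from the preview above; in the template, pass `df` instead.
sample = pd.DataFrame({
    "InvoiceNo": ["536365", "536365"],
    "Quantity": [6, 8],
    "InvoiceDate": ["12/1/10 8:26", "12/1/10 8:26"],
    "UnitPrice": [2.55, 2.75],
})


def add_revenue_and_dates(frame):
    """Return a copy with parsed dates and a per-line Revenue column (hypothetical helper)."""
    out = frame.copy()
    # Assumption: dates are month/day/two-digit year followed by hour:minute.
    out["InvoiceDate"] = pd.to_datetime(out["InvoiceDate"], format="%m/%d/%y %H:%M")
    # Revenue per invoice line = quantity ordered * unit price in sterling.
    out["Revenue"] = out["Quantity"] * out["UnitPrice"]
    return out


prepared = add_revenue_and_dates(sample)
print(prepared.dtypes)
```

Working on a copy keeps the original frame untouched, so the cell can be re-run without parsing the dates twice.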

Understand your variables

# Understand your variables
variables = pd.DataFrame(columns=['Variable','Number of unique values','Values'])

for i, var in enumerate(df.columns):
    variables.loc[i] = [var, df[var].nunique(), df[var].unique().tolist()]

    
# Join summary data frame with a CSV file explaining the different variables
var_dict = pd.read_csv('variable_explanation.csv', index_col=0)
variables.set_index('Variable').join(var_dict)
Variable     Number of unique values  Values                                              Explanation
InvoiceNo    25900                    [536365, 536366, 536367, 536368, 536369, 53637...   A 6-digit integral number uniquely assigned to...
StockCode    4070                     [85123A, 71053, 84406B, 84029G, 84029E, 22752,...   A 5-digit integral number uniquely assigned to...
Description  4223                     [WHITE HANGING HEART T-LIGHT HOLDER, WHITE MET...   Product (item) name
Quantity     722                      [6, 8, 2, 32, 3, 4, 24, 12, 48, 18, 20, 36, 80...   The quantities of each product (item) per tran...
InvoiceDate  23260                    [12/1/10 8:26, 12/1/10 8:28, 12/1/10 8:34, 12/...   The day and time when each transaction was gen...
UnitPrice    1630                     [2.55, 3.39, 2.75, 7.65, 4.25, 1.85, 1.69, 2.1...   Product price per unit in sterling (pound)
CustomerID   4372                     [17850.0, 13047.0, 12583.0, 13748.0, 15100.0, ...   A 5-digit integral number uniquely assigned to...
Country      38                       [United Kingdom, France, Australia, Netherland...   The name of the country where each customer re...
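
CustomerID appears as a float (`17850.0`) even though it is described as an integral number, which in pandas often hints at missing values in that column. This is an inference, not something the template states. A minimal sketch for checking it is shown below; the helper `missing_summary` and the three toy rows are made up for illustration, and in the template you would pass `df` instead.

```python
import numpy as np
import pandas as pd

# Toy rows for illustration only; the missing entries are invented.
sample = pd.DataFrame({
    "CustomerID": [17850.0, np.nan, 13047.0],
    "Description": ["WHITE METAL LANTERN", None, "CREAM CUPID HEARTS COAT HANGER"],
    "Quantity": [6, 2, 8],
})


def missing_summary(frame):
    """Per-column dtype, count of missing values, and percentage missing (hypothetical helper)."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "missing": frame.isna().sum(),
        "missing_pct": (frame.isna().mean() * 100).round(2),
    })


summary = missing_summary(sample)
print(summary)
```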

Answer interesting questions:

Now you get to explore this exciting dataset! Can't think of where to start? Try your hand at these questions:

  • Find out which country has the biggest customers in terms of total orders or total money spent.
  • Find the smallest share of customers that drives the largest share of the company's revenue (see the 80/20 rule).
  • Plot the number of orders placed or the amount of money paid as a function of time.
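
As a possible starting point for the three questions above, the sketch below defines one small helper per question. The function names, the `Revenue` column, the 0.8 target, and the sample rows are all assumptions made for illustration; the rows are invented and not taken from the dataset. It also assumes the month/day/year date pattern seen in the data preview.

```python
import pandas as pd

# Made-up rows for illustration only; in the template, use `df` instead.
orders = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "A3", "A4"],
    "CustomerID": [1.0, 1.0, 2.0, 3.0, 1.0],
    "Country": ["United Kingdom", "United Kingdom", "France", "France", "United Kingdom"],
    "InvoiceDate": ["12/1/10 8:26", "12/1/10 8:26", "1/5/11 9:00", "1/20/11 10:30", "2/2/11 14:15"],
    "Quantity": [10, 5, 2, 1, 20],
    "UnitPrice": [2.0, 4.0, 5.0, 10.0, 3.0],
})
# Assumption: dates are month/day/two-digit year followed by hour:minute.
orders["InvoiceDate"] = pd.to_datetime(orders["InvoiceDate"], format="%m/%d/%y %H:%M")
orders["Revenue"] = orders["Quantity"] * orders["UnitPrice"]


def country_totals(frame):
    """Distinct invoices and total revenue per country, largest revenue first."""
    return (
        frame.groupby("Country")
        .agg(orders=("InvoiceNo", "nunique"), revenue=("Revenue", "sum"))
        .sort_values("revenue", ascending=False)
    )


def top_customer_share(frame, target=0.8):
    """Smallest fraction of customers whose combined revenue reaches `target` of the total."""
    # groupby drops rows with a missing CustomerID by default.
    per_customer = frame.groupby("CustomerID")["Revenue"].sum().sort_values(ascending=False)
    cumulative = per_customer.cumsum() / per_customer.sum()
    # Customers still below the target, plus the one that crosses it.
    n_needed = min(int((cumulative < target).sum()) + 1, len(per_customer))
    return n_needed / len(per_customer)


def monthly_totals(frame):
    """Distinct invoices and revenue per calendar month."""
    return (
        frame.set_index("InvoiceDate")
        .resample("MS")
        .agg({"InvoiceNo": "nunique", "Revenue": "sum"})
    )


by_country = country_totals(orders)
share = top_customer_share(orders)
monthly = monthly_totals(orders)
print(by_country, share, monthly, sep="\n\n")
```

With matplotlib installed, `monthly["Revenue"].plot()` would draw the revenue curve over time. Counting distinct `InvoiceNo` values rather than rows is a deliberate choice here, since one invoice spans several product lines.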
# Start coding