E-Commerce Data
This dataset consists of orders made in different countries from December 2010 to December 2011. The company is a UK-based online retailer that mainly sells unique all-occasion gifts. Many of its customers are wholesalers.
import pandas as pd
df=pd.read_csv("online_retail.csv")
dfData Dictionary
| Variable | Explanation |
|---|---|
| InvoiceNo | A 6-digit integral number uniquely assigned to each transaction. If this code starts with letter 'c' it indicates a cancellation. |
| StockCode | A 5-digit integral number uniquely assigned to each distinct product. |
| Description | Product (item) name |
| Quantity | The quantities of each product (item) per transaction |
| InvoiceDate | The day and time when each transaction was generated |
| UnitPrice | Product price per unit in sterling (pound) |
| CustomerID | A 5-digit integral number uniquely assigned to each customer |
| Country | The name of the country where each customer resides |
Source of dataset.
Citation: Daqing Chen, Sai Liang Sain, and Kun Guo, Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining, Journal of Database Marketing and Customer Strategy Management, Vol. 19, No. 3, pp. 197-208, 2012 (Published online before print: 27 August 2012. doi: 10.1057/dbm.2012.17).
Object and Motivation
My main objetives are to to develop a Cohort Analysis an a Cluster Segmentation of Clients according to a RFM model. RFM stands for Recency-Frequency-Monetary.
- Recency: How much time has elapsed since a customer’s last activity or transaction with the brand? Activity is usually a purchase, although variations are sometimes used, e.g., the last visit to a website or use of a mobile app. In most cases, the more recently a customer has interacted or transacted with a brand, the more likely that customer will be responsive to communications from the brand.
- Frequency: How often has a customer transacted or interacted with the brand during a particular period of time? Clearly, customers with frequent activities are more engaged, and probably more loyal, than customers who rarely do so. And one-time-only customers are in a class of their own.
- Monetary: Also referred to as “monetary value,” this factor reflects how much a customer has spent with the brand during a particular period of time. Big spenders should usually be treated differently than customers who spend little. Looking at monetary divided by frequency indicates the average purchase amount – an important secondary factor to consider when segmenting customers.
Exploratory Data Analysis and Cleaning Process
Basic Imports
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import missingno as msno
%matplotlib inlineData types
df.dtypesStatistics Description
df.describe()Observation: There are negative quantities and negative prices. This quantities are mostly asociated with cancelled orders
df.describe(include = 'object')