Market Basket Analysis
Given a dataset of customer transactions, where each transaction is a set of items, Market Basket Analysis (MBA) finds groups of items that are frequently purchased together. It helps identify complementary products that are not otherwise similar. The outcome of MBA is a recommendation of the type: "Item A is often purchased together with item B, consider cross-selling ..."
Market Basket Analysis can be used to:
- Build a movie/song recommendation engine
- Build a live recommendation algorithm on an e-commerce store
- Cross-sell or upsell products in a supermarket
%%capture
!pip install mlxtend

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from mlxtend.frequent_patterns import association_rules, apriori
from mlxtend.preprocessing import TransactionEncoder
from pandas.plotting import parallel_coordinates
1. Load your data
# Upload your data as a CSV file.
df = pd.read_csv('example.csv')
df.head()
2. Set parameters
# Set parameters to use for the analysis.
MIN_SUPPORT = 0.001    # Minimum support an itemset needs in order to be kept
MAX_LEN = 3            # Maximum number of items in a frequent itemset
METRIC = "lift"        # Metric for association rule creation
MIN_THRESHOLD = 1      # Threshold on METRIC for association rule creation
3. Derive Rules
Create a table with antecedents, their consequents and all important metrics
# Get all the transactions as a list of sorted item lists
transactions = list(df['Transaction'].apply(lambda x: sorted(x.split(','))))

# Instantiate the transaction encoder and one-hot encode the transactions
encoder = TransactionEncoder().fit(transactions)
onehot = encoder.transform(transactions)

# Convert the one-hot encoded data to a DataFrame
onehot = pd.DataFrame(onehot, columns=encoder.columns_)

# Compute frequent itemsets using the Apriori algorithm
frequent_itemsets = apriori(onehot, min_support=MIN_SUPPORT, max_len=MAX_LEN, use_colnames=True)

# Derive association rules from the frequent itemsets
rules = association_rules(frequent_itemsets, metric=METRIC, min_threshold=MIN_THRESHOLD)
rules.head()
Columns of the resulting rules table: antecedents, consequents, antecedent support, consequent support, support, confidence, lift, leverage, conviction.
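To make the Apriori step above less of a black box, here is a minimal sketch of level-wise frequent itemset search in plain Python. It is not mlxtend's implementation; the function name `frequent_itemsets_sketch` and the toy baskets are made up for illustration, and the sketch assumes a small, non-empty list of transactions.

```python
from itertools import combinations


def frequent_itemsets_sketch(transactions, min_support, max_len):
    """Return {itemset: support} for every itemset meeting min_support.

    Hypothetical helper for illustration only; assumes at least one transaction.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: keep the single items that are frequent on their own.
    current = {}
    for item in set().union(*baskets):
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    result = dict(current)

    k = 2
    while current and k <= max_len:
        # Join step: merge frequent (k-1)-itemsets into candidates of size k.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        next_level = {}
        for c in candidates:
            # Prune step: every (k-1)-subset of a candidate must itself be frequent.
            if all(frozenset(sub) in current for sub in combinations(c, k - 1)):
                s = support(c)
                if s >= min_support:
                    next_level[c] = s
        result.update(next_level)
        current = next_level
        k += 1
    return result


# Toy baskets (hypothetical data, not taken from example.csv).
toy = [
    {"beer", "sausage", "mustard"},
    {"beer", "sausage", "mustard"},
    {"beer", "sausage"},
    {"bread", "milk"},
    {"bread", "mustard"},
]
found = frequent_itemsets_sketch(toy, min_support=0.4, max_len=3)
```

The pruning step is what keeps the search tractable: an itemset can only be frequent if all of its subsets are, so infrequent items such as "milk" never appear in larger candidates.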
4. Visualize as Heatmap
Visually identify the most promising antecedents and consequents to analyze.
# General strategy:
# 1. Generate the rules
# 2. Convert antecedents and consequents into strings
# 3. Convert rules into matrix format
rules['lhs items'] = rules['antecedents'].apply(lambda x: len(x))

# Replace frozen sets with strings (sorted so the labels are deterministic)
rules['antecedents_'] = rules['antecedents'].apply(lambda a: ','.join(sorted(a)))
rules['consequents_'] = rules['consequents'].apply(lambda a: ','.join(sorted(a)))

# Transform the DataFrame of rules into a matrix using the lift metric
pivot = rules[rules['lhs items'] > 0].pivot(index='antecedents_', columns='consequents_', values='lift')

# Generate a heatmap with annotations on
sns.heatmap(pivot, annot=True)
plt.yticks(rotation=0)
plt.xticks(rotation=90)
plt.show()
5. Visualize as Parallel Coordinates Plot
Visualize interdependencies of items through connecting lines.
# Generate frequent itemsets
frequent_itemsets = apriori(onehot, min_support=0.10, use_colnames=True, max_len=2)

# Generate association rules
rules = association_rules(frequent_itemsets, metric='support', min_threshold=0.00)
# Function to convert rules to coordinates.
# With max_len=2, each antecedent and consequent holds exactly one item,
# so we take that single item as a plain string for plotting.
def rules_to_coordinates(rules):
    rules['antecedent'] = rules['antecedents'].apply(lambda antecedent: list(antecedent)[0])
    rules['consequent'] = rules['consequents'].apply(lambda consequent: list(consequent)[0])
    rules['rule'] = rules.index
    return rules[['antecedent', 'consequent', 'rule']]

# Generate frequent itemsets
frequent_itemsets = apriori(onehot, min_support=0.01, use_colnames=True, max_len=2)

# Generate association rules
rules = association_rules(frequent_itemsets, metric='lift', min_threshold=1.00)

# Generate coordinates
coords = rules_to_coordinates(rules)

# Generate parallel coordinates plot
parallel_coordinates(coords, 'rule');
- Apriori Algorithm
Given a set of transactions, association rule mining finds rules that predict the occurrence of an item based on the occurrences of other items in the transaction. For example, we can extract information on purchasing behavior such as "If someone buys beer and sausage, they are likely to buy mustard as well." Let's define the main association rule metrics:
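Support: It measures how often an itemset X is purchased and is given by the formula: support(X) = (number of transactions containing X) / (total number of transactions).
Confidence: It measures how often items in Y appear in transactions that contain X and is given by the formula: confidence(X -> Y) = support(X ∪ Y) / support(X).
Lift: It is the value that tells us how much more likely item Y is to be bought together with item X than on its own: lift(X -> Y) = confidence(X -> Y) / support(Y). Values greater than one indicate that the items are likely to be purchased together. When lift > 1, the rule predicts Y better than guessing from Y's overall frequency. When lift < 1, the rule does worse than that guess. A lift of exactly 1 means X and Y appear to be independent.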
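The three metrics described above (support, confidence, and lift) can be computed by hand on a small example. The sketch below uses plain Python and a made-up set of five baskets; the helper names are hypothetical and only serve to mirror the definitions.

```python
# Toy baskets (hypothetical data, not taken from example.csv).
baskets = [
    {"beer", "sausage", "mustard"},
    {"beer", "sausage", "mustard"},
    {"beer", "sausage"},
    {"bread", "milk"},
    {"bread", "mustard"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for b in baskets if itemset <= b) / len(baskets)


def confidence(x, y, baskets):
    """How often Y appears in baskets that contain X."""
    return support(set(x) | set(y), baskets) / support(x, baskets)


def lift(x, y, baskets):
    """Confidence of X -> Y relative to how common Y is overall."""
    return confidence(x, y, baskets) / support(y, baskets)


x, y = {"beer", "sausage"}, {"mustard"}
rule_support = support(x | y, baskets)   # 2 of 5 baskets contain all three items
rule_confidence = confidence(x, y, baskets)
rule_lift = lift(x, y, baskets)
print(rule_support, round(rule_confidence, 3), round(rule_lift, 3))
```

In this toy data, mustard appears in 3 of 5 baskets overall but in 2 of the 3 baskets that contain beer and sausage, so the lift comes out slightly above 1.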
- Our transactions are lists of comma-separated items.
- We need to get our data into a one-hot encoded format.
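As an illustration of the two points above, here is a minimal plain-Python sketch of turning comma-separated transactions into a one-hot table. The notebook itself uses mlxtend's TransactionEncoder for this step; the function `one_hot_encode` and the sample rows below are hypothetical and only show the idea.

```python
def one_hot_encode(raw_transactions):
    """Turn comma-separated strings into (columns, rows) of booleans.

    Hypothetical helper: each row has one True/False flag per known item.
    """
    # Split each transaction into a set of items, ignoring stray whitespace.
    baskets = [
        {item.strip() for item in row.split(",") if item.strip()}
        for row in raw_transactions
    ]
    # One column per distinct item, in sorted order.
    columns = sorted(set().union(*baskets))
    # One row per transaction: True where the basket contains the item.
    rows = [[item in basket for item in columns] for basket in baskets]
    return columns, rows


# Sample rows (made up for this sketch).
raw = ["milk,bread", "bread, butter", "milk,bread,butter"]
columns, rows = one_hot_encode(raw)
for row in rows:
    print(dict(zip(columns, row)))
```

Each row now has the same width regardless of basket size, which is the shape the frequent itemset search expects as input.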