Skip to content

Point of Sale(POS) Data from a Supermarket: Transactions and Cashier Operations

Retailers/Businesses need to understand their checkout data to plan effectively and potentially increase profit.

We explore questions like

  • Do people spend more with card or cash?
  • How many items do people buy at different times?
  • Are self-service checkouts quicker than those with a cashier?
  • What is the trend in amount across days?
  • Should stores be open on Sundays?
  • and more

Data from transaction data from a Polish Supermarket

We will use SQL for queries and Plotly for visualization.

Importing libraries that we may use.

import plotly.express as px
import numpy as np

Task 1: Exploring the data

A good first step in exploring a database is to select the first few rows from each table.

Spinner
DataFrameas
df
variable
SELECT *
FROM 'pos_operator_logs.csv'
LIMIT 10
Spinner
DataFrameas
df
variable
SELECT *
FROM 'pos_transactions.csv'
LIMIT 10

Task 2: Do more people make transactions by card or by cash?

Here we look to understand if people prefer to make transactions from their card or cash.

Spinner
DataFrameas
df1
variable
SELECT 
SUM(t_cash::numeric) AS trans_cash,
SUM(t_card::numeric) AS trans_card
FROM 'pos_transactions.csv'

More transactions were done using cash than with card. Let's visualise this using pie plot

px.pie(df1,values= df1.iloc[0],names=['Trans_cash','Trans_card'],
       title='RATIO OF TRANSACTION OF CASH AND CARD',
      color_discrete_sequence=px.colors.sequential.Jet)

Task 3: Do people spend more per transaction when using cash or card?

Similarly, knowing how much people spend per transaction using different payment methods is helpful for deciding on security arrangments, and marketing to encourage shoppers to use different payment methods.

Spinner
DataFrameas
df2
variable
SELECT 
AVG(CASE WHEN t_cash == 'True' THEN amount END) as avg_amount_cash,
AVG(CASE WHEN t_card == 'True' THEN amount END) AS avg_amount_card
FROM 'pos_transactions.csv'