Skip to content

UPI Transactions

Source: https://www.kaggle.com/datasets/skullagos5246/upi-transactions-2024-dataset

Overview:

This dataset simulates realistic Unified Payments Interface (UPI) transactions across various merchant categories in India. It is designed to reflect real-world digital payment patterns, including temporal, demographic, and behavioural features relevant for fraud detection, consumer analytics, and business intelligence.

Problem Statement:

  1. Display 5 sample transactions.
  2. Display the first 20 transactions.
  3. Fetch the transactions worth less than 100 INR. Display only the amount and merchant category.
  4. How many transactions have happened in total?
  5. How many transactions have happened in Karnataka?
  6. How many transactions happened in July?
  7. What is the average value of all transactions?
  8. Display the names of all the banks that initiated the transaction.
  9. How many unique types of transactions have been used?
  10. Identify all the fraudulent transactions on weekends after midnight and before sunrise. Assume sunrise is at 6 a.m.
  11. Identify successful transactions initiated by senior citizens. Assume a senior citizen's age starts at 55.
  12. How many failed grocery or shopping transactions have happened between midnight and sunrise on weekdays? Assume sunrise is at 5 a.m.
  13. Identify all transactions involving recharge or bill payments at the SBI bank.
  14. Which device type has had the most transactions?
  15. Fetch the total transaction amount per sender bank.
import pandas as pd
upi_transactions_2024 = pd.read_csv('upi_transactions_2024.csv')
upi_transactions_2024.head()

upi_transactions_2024.columns = upi_transactions_2024.columns.str.replace(' ', '_').str.replace(r'[()]+', '', regex=True)
upi_transactions_2024.head()

upi_transactions_2024.to_csv("cleaned_upi_transactions_2024.csv", index=False)
import pandas as pd
cleaned_upi_transactions_2024 = pd.read_csv('cleaned_upi_transactions_2024.csv')
cleaned_upi_transactions_2024.head()
Spinner
DataFrameas
df
variable
SELECT *
FROM 'cleaned_upi_transactions_2024.csv'
LIMIT 5
Spinner
DataFrameas
df1
variable
-- 1.Display 5 sample transactions.
SELECT *
FROM 'cleaned_upi_transactions_2024.csv'
LIMIT 5;
Spinner
DataFrameas
df2
variable
-- 2.Display the first 20 transactions.
SELECT *
FROM 'cleaned_upi_transactions_2024.csv'
ORDER BY timestamp
LIMIT 20;
Spinner
DataFrameas
df3
variable
--3.Fetch the transactions worth less than 100 INR. Display only the amount and merchant category.
SELECT amount_INR, merchant_category
FROM 'cleaned_upi_transactions_2024.csv'
WHERE amount_INR < 100;
Spinner
DataFrameas
df5
variable
-- 4.How many transactions have happened in total?
SELECT COUNT(*) AS Total_Trns
FROM 'cleaned_upi_transactions_2024.csv';
Spinner
DataFrameas
df6
variable
-- 5.How many transactions have happened in Karnataka?
SELECT COUNT(TRANSACTION_ID) AS karnataka_trns
FROM 'cleaned_upi_transactions_2024.csv'
WHERE SENDER_STATE = 'Karnataka';
Spinner
DataFrameas
df7
variable
--6. How many transactions happened in July? 
--count of aggregation function and timestamp
SELECT COUNT(*) 
FROM 'cleaned_upi_transactions_2024.csv'
WHERE MONTH(timestamp)= 07;
Spinner
DataFrameas
df8
variable
--7. What is the average value of all transactions?
SELECT AVG(amount_INR)
FROM 'cleaned_upi_transactions_2024.csv';
Spinner
DataFrameas
df4
variable
--8. Display the name of all banks that initiated the transaction?
SELECT DISTINCT sender_bank
FROM 'cleaned_upi_transactions_2024.csv';
Spinner
DataFrameas
df9
variable
-- 9. How many unique types of transactions have been used?
SELECT COUNT(DISTINCT TRANSACTION_TYPE)
FROM 'cleaned_upi_transactions_2024.csv';
Spinner
DataFrameas
df15
variable
-- 10. Identify all the fraudulent transactions on weekends after midnight and before sunrise. Assume sunrise is at 6 a.m.
SELECT *
FROM 'cleaned_upi_transactions_2024.csv'
WHERE fraud_flag = '1'
AND is_weekend = '1'
AND hour_of_day BETWEEN '0' AND '5';