Skip to content
UPI Transactions (Snowflake)
UPI Transactions
Source: https://www.kaggle.com/datasets/skullagos5246/upi-transactions-2024-dataset
Overview:
This dataset simulates realistic Unified Payments Interface (UPI) transactions across various merchant categories in India. It is designed to reflect real-world digital payment patterns, including temporal, demographic, and behavioural features relevant for fraud detection, consumer analytics, and business intelligence.
Problem Statement:
- Display 5 sample transactions.
- Display the first 20 transactions.
- Fetch the transactions worth less than 100 INR. Display only the amount and merchant category.
- How many transactions have happened in total?
- How many transactions have happened in Karnataka?
- How many transactions happened in July?
- What is the average value of all transactions?
- Display the names of all the banks that initiated the transaction.
- How many unique types of transactions have been used?
- Identify all the fraudulent transactions on weekends after midnight and before sunrise. Assume sunrise is at 6 a.m.
- Identify successful transactions initiated by senior citizens. Assume a senior citizen's age starts at 55.
- How many failed grocery or shopping transactions have happened between midnight and sunrise on weekdays? Assume sunrise is at 5 a.m.
- Identify all transactions involving recharge or bill payments at the SBI bank.
- Which device type has had the most transactions?
- Fetch the total transaction amount per sender bank.
import pandas as pd
upi_transactions_2024 = pd.read_csv('upi_transactions_2024.csv')
upi_transactions_2024.head()
upi_transactions_2024.columns = upi_transactions_2024.columns.str.replace(' ', '_').str.replace(r'[()]+', '', regex=True)
upi_transactions_2024.head()
upi_transactions_2024.to_csv("cleaned_upi_transactions_2024.csv", index=False)
import pandas as pd
cleaned_upi_transactions_2024 = pd.read_csv('cleaned_upi_transactions_2024.csv')
cleaned_upi_transactions_2024.head()DataFrameas
df
variable
SELECT *
FROM 'cleaned_upi_transactions_2024.csv'
LIMIT 5DataFrameas
df1
variable
-- 1.Display 5 sample transactions.
SELECT *
FROM 'cleaned_upi_transactions_2024.csv'
LIMIT 5;
DataFrameas
df2
variable
-- 2.Display the first 20 transactions.
SELECT *
FROM 'cleaned_upi_transactions_2024.csv'
ORDER BY timestamp
LIMIT 20;
DataFrameas
df3
variable
--3.Fetch the transactions worth less than 100 INR. Display only the amount and merchant category.
SELECT amount_INR, merchant_category
FROM 'cleaned_upi_transactions_2024.csv'
WHERE amount_INR < 100;
DataFrameas
df5
variable
-- 4.How many transactions have happened in total?
SELECT COUNT(*) AS Total_Trns
FROM 'cleaned_upi_transactions_2024.csv';DataFrameas
df6
variable
-- 5.How many transactions have happened in Karnataka?
SELECT COUNT(TRANSACTION_ID) AS karnataka_trns
FROM 'cleaned_upi_transactions_2024.csv'
WHERE SENDER_STATE = 'Karnataka';DataFrameas
df7
variable
--6. How many transactions happened in July?
--count of aggregation function and timestamp
SELECT COUNT(*)
FROM 'cleaned_upi_transactions_2024.csv'
WHERE MONTH(timestamp)= 07;DataFrameas
df8
variable
--7. What is the average value of all transactions?
SELECT AVG(amount_INR)
FROM 'cleaned_upi_transactions_2024.csv';DataFrameas
df4
variable
--8. Display the name of all banks that initiated the transaction?
SELECT DISTINCT sender_bank
FROM 'cleaned_upi_transactions_2024.csv';DataFrameas
df9
variable
-- 9. How many unique types of transactions have been used?
SELECT COUNT(DISTINCT TRANSACTION_TYPE)
FROM 'cleaned_upi_transactions_2024.csv';DataFrameas
df15
variable
-- 10. Identify all the fraudulent transactions on weekends after midnight and before sunrise. Assume sunrise is at 6 a.m.
SELECT *
FROM 'cleaned_upi_transactions_2024.csv'
WHERE fraud_flag = '1'
AND is_weekend = '1'
AND hour_of_day BETWEEN '0' AND '5';