Credit Card Fraud
This dataset consists of credit card transactions in the western United States. It includes information about each transaction including customer details, the merchant and category of purchase, and whether or not the transaction was a fraud.
Source of dataset. The data was partially cleaned and adapted by DataCamp.
Credit Card Fraud Analysis Report
Motivation
A new credit card company operating in the western United States markets its card as secure. The company hired me as a data scientist to detect fraud in its credit card transactions. This report describes how accurately we can predict whether each transaction in the dataset is fraudulent.
Analysis Steps
After loading and analysing the dataset, we followed these steps:
Data Exploration:
We conducted an exploratory analysis to understand which product categories and transaction amounts are more prone to fraud.
Data Dictionary
trans_date_trans_time | Transaction DateTime |
---|---|
merchant | Merchant Name |
category | Category of Merchant |
amt | Amount of Transaction |
city | City of Credit Card Holder |
state | State of Credit Card Holder |
lat | Latitude Location of Purchase |
long | Longitude Location of Purchase |
city_pop | Credit Card Holder's City Population |
job | Job of Credit Card Holder |
dob | Date of Birth of Credit Card Holder |
trans_num | Transaction Number |
merch_lat | Latitude Location of Merchant |
merch_long | Longitude Location of Merchant |
is_fraud | Whether Transaction is Fraud (1) or Not (0) |
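import pandas as pd
# Load the transactions
df = pd.read_csv('credit_card_fraud.csv')
# Preview the first rows, the column types, and the last rows
df.head(5)
df.info()
df.tail()
# Inspect the row index
df_index = df.index
print(df_index)
# Summary statistics, transposed so that each column becomes a row
df.describe().T
# Check whether any value is missing, then count missing values per column
df.isnull().values.any()
df.isnull().sum()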
Product Categories and Transaction Amounts:
Certain product categories and transaction amounts appear riskier than others, judging by the fraud labels in the dataset.
# Summarise the amounts of fraudulent transactions within each category
fraudulent_transactions = df[df['is_fraud'] == 1]
fraudulent_purchase_summary = fraudulent_transactions.groupby('category')['amt'].describe()
print(fraudulent_purchase_summary)
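The summary above covers only fraudulent transactions, so it does not show how often fraud occurs within each category. A minimal sketch of that comparison follows, assuming `df` has the `category`, `amt`, and `is_fraud` columns from the data dictionary. The function names `fraud_rate_by_category` and `amount_summary_by_fraud_status` are illustrative and not part of the original analysis.

```python
import pandas as pd


def fraud_rate_by_category(df: pd.DataFrame) -> pd.DataFrame:
    """Transaction count, fraud count and fraud rate for each category."""
    summary = (
        df.groupby('category')['is_fraud']
          # is_fraud is 0/1, so its sum is the fraud count and its mean the fraud rate
          .agg(transactions='count', frauds='sum', fraud_rate='mean')
    )
    # Highest fraud rate first
    return summary.sort_values('fraud_rate', ascending=False)


def amount_summary_by_fraud_status(df: pd.DataFrame) -> pd.DataFrame:
    """Descriptive statistics of amt for legitimate (0) and fraudulent (1) rows."""
    return df.groupby('is_fraud')['amt'].describe()
```

Calling `fraud_rate_by_category(df)` and `amount_summary_by_fraud_status(df)` on the loaded data gives one table per question: which categories have the highest share of fraud, and how transaction amounts differ between fraudulent and legitimate transactions.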
Geographical Analysis:
We plotted a bar chart of the fraud rate in each state to compare states geographically. Fraud rates vary by state, and some states have noticeably higher rates than others.
import matplotlib.pyplot as plt
# df is the DataFrame already loaded above from credit_card_fraud.csv
# Fraud rate per state: fraudulent transactions divided by all transactions
fraudulent_state_counts = df[df['is_fraud'] == 1]['state'].value_counts()
total_state_counts = df['state'].value_counts()
fraud_rate_by_state = (fraudulent_state_counts / total_state_counts).fillna(0)
# Rank states by fraud rate, highest first
sorted_states = fraud_rate_by_state.sort_values(ascending=False)
# Visualise the states with the highest fraud rates
top_states = sorted_states.head(10)
top_states.plot(kind='bar', color='coral')
plt.title('Top 10 States by Fraud Rate')
plt.xlabel('State')
plt.ylabel('Fraud Rate')
plt.show()
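A raw fraud rate can be misleading for states with very few transactions, where a handful of fraudulent records is enough to produce a high rate. The sketch below reports the transaction count next to each rate and can drop sparsely represented states. The function name `fraud_rate_by_state` and the `min_transactions` threshold are hypothetical; the cutoff should be chosen to suit the data.

```python
import pandas as pd


def fraud_rate_by_state(df: pd.DataFrame, min_transactions: int = 1) -> pd.DataFrame:
    """Fraud rate per state, keeping only states with enough transactions."""
    summary = (
        df.groupby('state')['is_fraud']
          .agg(transactions='count', frauds='sum', fraud_rate='mean')
    )
    # Drop states whose rate rests on too few transactions (hypothetical cutoff)
    summary = summary[summary['transactions'] >= min_transactions]
    return summary.sort_values('fraud_rate', ascending=False)
```

For example, `fraud_rate_by_state(df, min_transactions=100).head(10)` would list the ten highest-rate states among those with at least 100 transactions, where 100 is only a placeholder value.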
Age Analysis:
We checked whether older customers are more prone to credit card fraud. The analysis suggests that age can be an indicator of fraud: older customers may be at greater risk than younger customers.
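One way to run such a check is to derive each customer's age at the time of the transaction and compare fraud rates across age groups. The sketch below assumes the timestamp column is named `trans_date_trans_time` and that `dob` parses as a date. The function name and the age bins are illustrative choices, not necessarily those used in the original analysis.

```python
import pandas as pd


def fraud_rate_by_age_group(
    df: pd.DataFrame,
    time_col: str = 'trans_date_trans_time',  # assumed column name
) -> pd.DataFrame:
    """Transaction count and fraud rate per customer age group."""
    trans_time = pd.to_datetime(df[time_col])
    dob = pd.to_datetime(df['dob'])
    # Approximate age in years at the time of the transaction
    age = (trans_time - dob).dt.days / 365.25
    # Illustrative bins; each interval includes its lower bound only
    bins = [0, 30, 45, 60, 120]
    labels = ['<30', '30-44', '45-59', '60+']
    age_group = pd.cut(age, bins=bins, labels=labels, right=False)
    return (
        df.assign(age_group=age_group)
          .groupby('age_group', observed=False)['is_fraud']
          .agg(transactions='count', fraud_rate='mean')
    )
```

A fraud rate that rises from the youngest to the oldest group would be consistent with the claim above. The transaction counts show how much data supports each rate.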