
Credit Card Fraud

This dataset consists of credit card transactions in the western United States. It includes information about each transaction including customer details, the merchant and category of purchase, and whether or not the transaction was a fraud.

Not sure where to begin? Scroll to the bottom to find challenges!

Source of dataset. The data was partially cleaned and adapted by DataCamp.

Credit Card Fraud Analysis Report

Motivation

A new credit card company operating in the western United States markets its card as secure. The company hired me as a data scientist to detect fraud in credit card transactions. This report describes how accurately we can predict whether each transaction in the data set is fraudulent.

Analysis Steps

After loading and analysing the dataset, we followed these steps:

Data Exploration:

We conducted an exploratory analysis to understand which product categories and transaction amounts are more prone to fraud.

Data Dictionary

trans_date_trans_time: Transaction DateTime
merchant: Merchant Name
category: Category of Merchant
amt: Amount of Transaction
city: City of Credit Card Holder
state: State of Credit Card Holder
lat: Latitude Location of Purchase
long: Longitude Location of Purchase
city_pop: Credit Card Holder's City Population
job: Job of Credit Card Holder
dob: Date of Birth of Credit Card Holder
trans_num: Transaction Number
merch_lat: Latitude Location of Merchant
merch_long: Longitude Location of Merchant
is_fraud: Whether Transaction is Fraud (1) or Not (0)

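import pandas as pd

# Load the transactions
df = pd.read_csv('credit_card_fraud.csv')

# Preview the first rows, column types, and last rows
print(df.head(5))
df.info()
print(df.tail())

# Row index and summary statistics for the numeric columns
print(df.index)
print(df.describe().T)

# Check for missing values, overall and per column
print(df.isnull().values.any())
print(df.isnull().sum())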

Product Categories and Transaction Amounts:

Certain product categories and transaction amounts appear riskier than others. To see where fraud concentrates, we summarised the amounts of fraudulent transactions within each category.

# Summarise the amounts of fraudulent transactions within each category
fraudulent_transactions = df[df['is_fraud'] == 1]
fraudulent_purchase_summary = fraudulent_transactions.groupby('category')['amt'].describe()
print(fraudulent_purchase_summary)
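
The summary above describes fraudulent transactions only, so it cannot show which categories or amounts are riskier relative to legitimate activity. A minimal sketch of that comparison follows. It assumes the `category`, `amt` and `is_fraud` columns from the data dictionary. The function names and the amount band edges are hypothetical choices made for illustration, not values taken from the analysis.

```python
import pandas as pd


def fraud_rate_by_category(df: pd.DataFrame) -> pd.DataFrame:
    """Transaction count, fraud count and fraud rate per merchant category."""
    return (
        df.groupby("category")["is_fraud"]
        .agg(transactions="count", frauds="sum", fraud_rate="mean")
        .sort_values("fraud_rate", ascending=False)
    )


def fraud_rate_by_amount_band(
    df: pd.DataFrame,
    edges=(0, 50, 200, 1000, float("inf")),  # illustrative band edges (assumption)
) -> pd.DataFrame:
    """Transaction count and fraud rate per transaction amount band."""
    # Assign each transaction to an amount interval such as (50, 200]
    bands = pd.cut(df["amt"], bins=list(edges), include_lowest=True)
    return df.groupby(bands, observed=True)["is_fraud"].agg(
        transactions="count", fraud_rate="mean"
    )
```

With the data loaded as above, `print(fraud_rate_by_category(df))` and `print(fraud_rate_by_amount_band(df))` would list the rates. Because `is_fraud` is coded 0/1, its mean within a group is the share of fraudulent transactions in that group.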

Geographical Analysis:

We computed the fraud rate for each state and plotted the ten states with the highest rates as a bar chart. Fraud rates differ from state to state.

import pandas as pd
import matplotlib.pyplot as plt

# Load data
df = pd.read_csv('credit_card_fraud.csv')

# Fraud rate per state: fraudulent transactions divided by all transactions
# (states with no fraudulent transactions get a rate of 0)
fraudulent_state_counts = df[df['is_fraud'] == 1]['state'].value_counts()
total_state_counts = df['state'].value_counts()
fraud_rate_by_state = (fraudulent_state_counts / total_state_counts).fillna(0)

# Ranking states according to fraud rates
sorted_states = fraud_rate_by_state.sort_values(ascending=False)

# Visualise the states with the highest fraud rates
top_states = sorted_states.head(10)
top_states.plot(kind='bar', color='coral')
plt.title('Top 10 States by Fraud Rate')
plt.xlabel('State')
plt.ylabel('Fraud Rate')
plt.show()

Age Analysis:

We checked whether older customers are more prone to credit card fraud. The analysis suggests that age can be an indicator of fraud: older customers may be more at risk than younger ones.
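
One way to check the age pattern is to compare fraud rates across age bands. The sketch below is illustrative and not necessarily the method used for this report. It assumes the timestamp column is named `trans_date_trans_time`, that `dob` parses as a date, and that age is measured at the time of the transaction. The function name and the band edges are hypothetical.

```python
import pandas as pd


def fraud_rate_by_age_band(
    df: pd.DataFrame,
    edges=(0, 30, 45, 60, 120),  # illustrative age band edges (assumption)
) -> pd.DataFrame:
    """Transaction count and fraud rate per cardholder age band."""
    trans_time = pd.to_datetime(df["trans_date_trans_time"])
    dob = pd.to_datetime(df["dob"])

    # Approximate age in whole years at the time of the transaction
    age = (trans_time - dob).dt.days // 365.25

    # Left-closed bands: [0, 30), [30, 45), [45, 60), [60, 120)
    bands = pd.cut(age, bins=list(edges), right=False).rename("age_band")

    return df.groupby(bands, observed=True)["is_fraud"].agg(
        transactions="count", fraud_rate="mean"
    )
```

With the data loaded as above, `print(fraud_rate_by_age_band(df))` would show one row per age band. A fraud rate that rises with the band would support the claim that older customers are more at risk. Bands containing few transactions should be read with caution.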