Credit Card Fraud

This dataset consists of credit card transactions in the western United States. It includes information about each transaction including customer details, the merchant and category of purchase, and whether or not the transaction was a fraud.


Source of dataset. The data was partially cleaned and adapted by DataCamp.

Credit Card Fraud Analysis Report

Motivation

A new credit card company operating in the western United States markets its card as secure. The company hired me as a data scientist to detect fraudulent credit card transactions. This report describes how accurately we can predict whether each transaction in the dataset is fraudulent.

Analysis Steps

After loading and analysing the dataset, we followed these steps:

Data Exploration:

We conducted an exploratory analysis to understand which product categories and transaction amounts are more prone to fraud.

Data Dictionary

trans_date_trans_time	Transaction DateTime
merchant	Merchant Name
category	Category of Merchant
amt	Amount of Transaction
city	City of Credit Card Holder
state	State of Credit Card Holder
lat	Latitude Location of Purchase
long	Longitude Location of Purchase
city_pop	Credit Card Holder's City Population
job	Job of Credit Card Holder
dob	Date of Birth of Credit Card Holder
trans_num	Transaction Number
merch_lat	Latitude Location of Merchant
merch_long	Longitude Location of Merchant
is_fraud	Whether Transaction is Fraud (1) or Not (0)

import pandas as pd 

df = pd.read_csv('credit_card_fraud.csv')

df.head(5)
df.info()
df.tail()

df_index = df.index
print(df_index)

df.describe().T
df.isnull().values.any()
df.isnull().sum()

Product Categories and Transaction Amounts:

Certain product categories and transaction amounts appear riskier than others, judging by the fraud status recorded in the dataset.

# Summarise the amounts of fraudulent transactions for each category
fraudulent_transactions = df[df['is_fraud'] == 1]
fraudulent_purchase_summary = fraudulent_transactions.groupby('category')['amt'].describe()
print(fraudulent_purchase_summary)
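
The summary above covers fraudulent transactions only, so on its own it cannot show which categories or amounts are riskier relative to legitimate activity. A minimal sketch of a fraud rate comparison by category and by amount band follows. The helper name `fraud_rate_by`, the amount band edges, and the small sample frame are assumptions made for illustration, not values from the dataset; in the report the helper would be applied to `df` instead.

```python
import pandas as pd

def fraud_rate_by(frame, key):
    """Return transaction count, fraud count and fraud rate per group (hypothetical helper)."""
    summary = frame.groupby(key, observed=True)['is_fraud'].agg(
        transactions='count', frauds='sum', fraud_rate='mean'
    )
    return summary.sort_values('fraud_rate', ascending=False)

# Illustrative sample only; category names and amounts are made up
sample = pd.DataFrame({
    'category': ['grocery', 'grocery', 'travel', 'travel', 'online_shopping', 'online_shopping'],
    'amt': [12.5, 310.0, 45.0, 8.0, 950.0, 20.0],
    'is_fraud': [0, 1, 0, 0, 1, 0],
})

# Share of fraudulent transactions within each category
by_category = fraud_rate_by(sample, 'category')

# Assumed amount bands; the edges should be tuned to the real distribution of amt
amount_band = pd.cut(
    sample['amt'], bins=[0, 50, 250, float('inf')], labels=['0-50', '50-250', '250+']
).rename('amount_band')
by_amount = fraud_rate_by(sample, amount_band)

print(by_category)
print(by_amount)
```

Reporting the transaction count next to the rate makes it easier to spot groups whose rate rests on only a handful of transactions.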

Geographical Analysis

We plotted a bar chart of fraud rates by state to compare them geographically. Some states have higher fraud rates than others.

import pandas as pd
import matplotlib.pyplot as plt

# Load data
df = pd.read_csv('credit_card_fraud.csv')

# Calculate the fraud rate for each state (fraudulent transactions / all transactions)
fraudulent_state_counts = df[df['is_fraud'] == 1]['state'].value_counts()
total_state_counts = df['state'].value_counts()
fraud_rate_by_state = (fraudulent_state_counts / total_state_counts).fillna(0)

# Ranking states according to fraud rates
sorted_states = fraud_rate_by_state.sort_values(ascending=False)

# Visualise the states with the highest fraud rates
top_states = sorted_states.head(10)
top_states.plot(kind='bar', color='coral')
plt.title('Top 10 States by Fraud Rate')
plt.xlabel('State')
plt.ylabel('Fraud Rate')
plt.show()
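
A state with very few transactions can top a fraud rate ranking on the strength of one or two fraudulent records. As a hedged sketch, the ranking could be restricted to states with a minimum number of transactions. The helper name `state_fraud_rates`, the `min_transactions` parameter, and the sample rows are assumptions for illustration only.

```python
import pandas as pd

def state_fraud_rates(frame, min_transactions=1):
    """Fraud rate per state, keeping only states with enough transactions (hypothetical helper)."""
    stats = frame.groupby('state')['is_fraud'].agg(
        transactions='count', fraud_rate='mean'
    )
    # Drop states whose rate is based on too few transactions
    stats = stats[stats['transactions'] >= min_transactions]
    return stats.sort_values('fraud_rate', ascending=False)

# Illustrative sample only, not taken from the dataset
sample = pd.DataFrame({
    'state': ['CA', 'CA', 'CA', 'CA', 'OR', 'WA', 'WA'],
    'is_fraud': [0, 1, 0, 0, 1, 0, 0],
})

rates = state_fraud_rates(sample, min_transactions=2)
print(rates)
```

A suitable threshold depends on how many transactions each state has in the real data, so the value used here is only a placeholder.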

Age Analysis:

We analysed whether older customers are more prone to credit card fraud. The results suggest that age can be an indicator of fraud: older customers may be at greater risk than younger customers.
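
One way such an age comparison could be carried out is sketched below: derive each customer's age at the time of the transaction from `dob`, group ages into bands, and compare fraud rates across bands. This is a sketch under assumptions rather than the exact method behind the report. It assumes the timestamp column is named `trans_date_trans_time`; the helper name `add_age_group`, the band edges, and the sample rows are hypothetical.

```python
import pandas as pd

def add_age_group(frame):
    """Add 'age' (whole years at transaction time) and 'age_group' columns (hypothetical helper)."""
    out = frame.copy()
    trans_time = pd.to_datetime(out['trans_date_trans_time'])
    dob = pd.to_datetime(out['dob'])
    # True where the birthday has already occurred in the transaction year
    had_birthday = (trans_time.dt.month > dob.dt.month) | (
        (trans_time.dt.month == dob.dt.month) & (trans_time.dt.day >= dob.dt.day)
    )
    out['age'] = trans_time.dt.year - dob.dt.year - (~had_birthday).astype(int)
    # Assumed age bands; intervals are closed on the left, e.g. 30 <= age < 45
    out['age_group'] = pd.cut(
        out['age'], bins=[0, 30, 45, 60, 120], right=False,
        labels=['<30', '30-44', '45-59', '60+'],
    )
    return out

# Illustrative sample only, not taken from the dataset
sample = pd.DataFrame({
    'trans_date_trans_time': ['2019-06-15 10:00:00', '2019-06-15 11:30:00',
                              '2020-01-10 09:15:00', '2020-01-10 18:45:00'],
    'dob': ['1995-03-01', '1980-09-20', '1950-01-10', '1955-12-31'],
    'is_fraud': [0, 0, 1, 0],
})

aged = add_age_group(sample)
fraud_rate_by_age = aged.groupby('age_group', observed=True)['is_fraud'].mean()
print(fraud_rate_by_age)
```

Comparing rates per band, rather than raw fraud counts, keeps the result from simply reflecting which age groups make the most transactions.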

