Credit Card Fraud

This dataset consists of credit card transactions in the western United States. It includes information about each transaction including customer details, the merchant and category of purchase, and whether or not the transaction was a fraud.

In this project we will perform exploratory data analysis and data visualization to better understand the data, and then create a model to predict instances of credit card fraud.

Source of dataset. The data was partially cleaned and adapted by DataCamp.

Data Dictionary

trans_date_trans_time	Transaction DateTime
merchant	Merchant Name
category	Category of Merchant
amt	Amount of Transaction
city	City of Credit Card Holder
state	State of Credit Card Holder
lat	Latitude Location of Purchase
long	Longitude Location of Purchase
city_pop	Credit Card Holder's City Population
job	Job of Credit Card Holder
dob	Date of Birth of Credit Card Holder
trans_num	Transaction Number
merch_lat	Latitude Location of Merchant
merch_long	Longitude Location of Merchant
is_fraud	Whether Transaction is Fraud (1) or Not (0)

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_style('darkgrid')

ccf = pd.read_csv('credit_card_fraud.csv') 
ccf.head(5)

Clean Data

In this section we will check for missing values and duplicates and deal with them accordingly.

print(ccf.isna().sum())
print('Number of duplicates: ' + str(ccf.duplicated().sum()))

Our data has no missing values and no duplicates.
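This dataset needed no cleaning, but if the checks above had found problems, they could be handled along the following lines. This is a minimal sketch on a made-up toy frame: the values are invented for illustration, and dropping rows (rather than imputing) is an assumption, not a step from the original analysis.

```python
import pandas as pd

# Toy frame standing in for ccf; all values are made up for illustration.
toy = pd.DataFrame({
    'trans_num': ['a1', 'a2', 'a2', 'a3'],
    'amt': [10.0, 25.5, 25.5, None],
    'is_fraud': [0, 1, 1, 0],
})

# Same checks as above: missing values per column and fully duplicated rows.
print(toy.isna().sum())
print('Number of duplicates: ' + str(toy.duplicated().sum()))

# Drop exact duplicate rows, keeping the first occurrence.
toy_clean = toy.drop_duplicates()

# Drop rows that are missing the transaction amount.
toy_clean = toy_clean.dropna(subset=['amt'])

print(toy_clean)
```

Whether to drop or impute missing values depends on how many rows are affected and which columns they fall in, so the choice above is only one option.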

Add age and hour columns

Credit card fraud may correlate with the age of the card holder and with the hour of the day at which the transaction takes place, so we will add columns for these two features.

ccf['trans_date_trans_time'] = pd.to_datetime(ccf['trans_date_trans_time'])
ccf['dob'] = pd.to_datetime(ccf['dob'])
ccf['age_at_trans'] = ((ccf['trans_date_trans_time'] - ccf['dob']).dt.days / 365.25).astype(int)
ccf['hour_of_trans'] = ccf['trans_date_trans_time'].dt.hour

print(ccf['trans_date_trans_time'].dtype)
print(ccf['dob'].dtype)
print(ccf['age_at_trans'].head())
print(ccf['hour_of_trans'].unique())

The output shows the dates were converted to the correct type and the age and hour columns were successfully created.
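Dividing the day count by 365.25 is an approximation, and it can come out one year low when a transaction falls on or just after the card holder's birthday. As a hedged alternative, the sketch below computes a calendar-based age on a small made-up frame and compares it with the approximation. The column name age_exact and the sample dates are assumptions introduced for illustration only.

```python
import pandas as pd

# Toy frame with made-up dates; the second row is a transaction on a birthday.
toy = pd.DataFrame({
    'trans_date_trans_time': pd.to_datetime(['2019-01-01 00:00:44',
                                             '2019-01-01 10:15:00']),
    'dob': pd.to_datetime(['1988-03-09', '1990-01-01']),
})

t = toy['trans_date_trans_time']
b = toy['dob']

# True when the birthday has already occurred in the transaction's year.
had_birthday = (t.dt.month > b.dt.month) | (
    (t.dt.month == b.dt.month) & (t.dt.day >= b.dt.day)
)

# Calendar age: year difference, minus one if the birthday is still to come.
toy['age_exact'] = t.dt.year - b.dt.year - (~had_birthday).astype(int)

# The approximation used above, for comparison.
toy['age_approx'] = ((t - b).dt.days / 365.25).astype(int)

toy['hour_of_trans'] = t.dt.hour

print(toy[['age_exact', 'age_approx', 'hour_of_trans']])
```

For exploratory plots a one-year discrepancy in a few rows is unlikely to matter, so the simpler approximation is a reasonable choice; the calendar version is only worth the extra lines if exact ages are needed.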

Exploring the data

First, we filter the data to keep only the fraudulent transactions and count them by merchant category.

frauds = ccf[ccf['is_fraud'] == 1]
frauds_by_cat = frauds['category'].value_counts()

Fraud by merchant category:

Fraud may occur at different rates across merchant categories. We can plot the number of fraudulent transactions in each category.

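# Pass x and y explicitly so each category gets its own bar regardless of seaborn version
sns.barplot(x=frauds_by_cat.index, y=frauds_by_cat.values)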
plt.title('Number of Frauds vs. Merchant Category')
plt.xticks(rotation=90)
plt.xlabel('Merchant Category')
plt.ylabel('Number of Credit Card Frauds')
plt.show()
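The plot above shows counts, which also reflect how many transactions each category has overall. To compare rates, as the text suggests, the fraud count can be divided by the total number of transactions per category. The following is a minimal sketch on a made-up toy frame; the category values and the summary column names (n_frauds, n_trans, fraud_rate) are assumptions chosen for illustration.

```python
import pandas as pd

# Toy frame standing in for ccf; categories and labels are made up.
toy = pd.DataFrame({
    'category': ['grocery_pos', 'grocery_pos', 'grocery_pos', 'grocery_pos',
                 'travel', 'travel'],
    'is_fraud': [1, 0, 0, 0, 1, 0],
})

# Because is_fraud is 0/1, its sum is the fraud count and its mean is the fraud rate.
summary = (
    toy.groupby('category')['is_fraud']
       .agg(n_frauds='sum', n_trans='count', fraud_rate='mean')
       .sort_values('fraud_rate', ascending=False)
)

print(summary)
```

In this toy example both categories have one fraud each, yet their fraud rates differ, which is the kind of distinction a count-only plot can hide.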

