Credit Card Fraud
This dataset consists of credit card transactions in the western United States. Each record includes customer details, the merchant and category of purchase, and whether or not the transaction was fraudulent.
I will use this dataset to build a model to predict whether a credit card transaction was fraudulent or not.
Below is the dataset:
import pandas as pd

# Load the transactions into a DataFrame and preview it
df = pd.read_csv('credit_card_fraud.csv')
df
Data Dictionary
Column | Description |
---|---|
trans_date_trans_time | Transaction DateTime |
merchant | Merchant Name |
category | Category of Merchant |
amt | Amount of Transaction |
city | City of Credit Card Holder |
state | State of Credit Card Holder |
lat | Latitude Location of Purchase |
long | Longitude Location of Purchase |
city_pop | Credit Card Holder's City Population |
job | Job of Credit Card Holder |
dob | Date of Birth of Credit Card Holder |
trans_num | Transaction Number |
merch_lat | Latitude Location of Merchant |
merch_long | Longitude Location of Merchant |
is_fraud | Whether Transaction is Fraud (1) or Not (0) |
Source of dataset. The data was partially cleaned and adapted by DataCamp.
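Two of these columns, trans_date_trans_time and dob, typically arrive as plain text. As a minimal sketch (assuming the raw file stores them in a format pandas can infer), they can be converted to proper datetimes, which makes time-based exploration easier later:

# Parse the date columns (assumes pandas can infer the formats)
df['trans_date_trans_time'] = pd.to_datetime(df['trans_date_trans_time'])
df['dob'] = pd.to_datetime(df['dob'])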
Creating a DataFrame with only fraudulent transactions
# Keep only the fraudulent transactions
f = df[df['is_fraud'] == 1]
f

# Compare the size of the full dataset to the fraud-only subset
print(df.shape)
print(f.shape)
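The two shapes also give a quick read on class balance. As a small sketch, the overall fraud rate can be computed directly, since is_fraud is a 0/1 column:

# Fraction of all transactions that are fraudulent
fraud_rate = df['is_fraud'].mean()
print(f'Fraud rate: {fraud_rate:.2%}')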
Introduction
As the outcome is a classification problem with only two options (fraud or not fraud), classification models will be built and tested. In this instance I will use a Decision Tree Classifier.
Before building the predictive model, I will explore the data, looking for patterns, correlations, and trends. Visualizations will be used to better understand the makeup of the data.
The data will also need to be prepared for the model: all categorical fields will need to be converted to numeric, some models require feature scaling, and any missing values will need to be handled.
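As a rough sketch of what that preparation might look like with pandas (the specific columns to drop or encode are assumptions that would be revisited after the exploration below):

# Illustrative preparation sketch -- column choices are assumptions, not final
X = df.drop(columns=['is_fraud', 'trans_num', 'trans_date_trans_time', 'dob'])
X = pd.get_dummies(X, columns=['merchant', 'category', 'city', 'state', 'job'])
X = X.fillna(X.median(numeric_only=True))  # simple imputation for any missing values
y = df['is_fraud']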
After the exploration and preparation, the model will be built and optimized using training data. Then the model will be tested on held-out data it has never seen.
After it is created, a visualization of the decision tree will be shown.
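For concreteness, here is a minimal sketch of that end-to-end flow with scikit-learn, assuming X and y from a preparation step like the one sketched above; the hyperparameters shown are placeholder starting points, not tuned values:

from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Hold out a stratified test set the model never sees during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

tree = DecisionTreeClassifier(max_depth=5, random_state=42)  # assumed starting depth
tree.fit(X_train, y_train)
print('Test accuracy:', tree.score(X_test, y_test))

# Visualize the fitted tree
plot_tree(tree, feature_names=list(X.columns), class_names=['not fraud', 'fraud'], filled=True)
plt.show()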
Importing Basic Libraries
import pandas as pd
import numpy as np
Data Exploration
In this section I will explore the data, looking at basic statistics, the size, shape and makeup of the dataset, distributions, and correlations. Some of these will be text-based explorations while others will be visuals.
The purpose of this section is to get a better feel for the data: to see whether any patterns or trends are immediately apparent, and to establish initial impressions against which the final results can be compared.
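A few typical starting points for the text-based side of this exploration (a generic sketch, not tied to any findings yet):

df.info()                      # column types and missing values
df.describe()                  # summary statistics for numeric columns
df['is_fraud'].value_counts()  # class balance of the target
df.select_dtypes('number').corr()['is_fraud'].sort_values()  # correlations with the target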
Distributions
I will start by exploring and visualizing the distributions of variables in the dataset.
# Plotting and mapping libraries
from matplotlib import pyplot as plt
from matplotlib.ticker import PercentFormatter
import folium
Distribution of Fraud
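As one possible way to chart this, using the PercentFormatter imported above (the styling choices are assumptions; labels assume is_fraud is 0/1):

# Share of fraudulent vs. legitimate transactions
shares = df['is_fraud'].value_counts(normalize=True).reindex([0, 1])
fig, ax = plt.subplots()
ax.bar(['Not Fraud', 'Fraud'], shares.values)
ax.yaxis.set_major_formatter(PercentFormatter(xmax=1))  # display proportions as percentages
ax.set_title('Distribution of Fraud')
plt.show()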