Credit Card Fraud

    This dataset consists of credit card transactions in the western United States. It includes information about each transaction including customer details, the merchant and category of purchase, and whether or not the transaction was a fraud.

    Not sure where to begin? Scroll to the bottom to find challenges!

    import pandas as pd 
    ccf = pd.read_csv('credit_card_fraud.csv') 
    ccf.head(5)

    Data Dictionary

    Column                  Description
    trans_date_trans_time   Transaction DateTime
    merchant                Merchant Name
    category                Category of Merchant
    amt                     Amount of Transaction
    city                    City of Credit Card Holder
    state                   State of Credit Card Holder
    lat                     Latitude Location of Purchase
    long                    Longitude Location of Purchase
    city_pop                Credit Card Holder's City Population
    job                     Job of Credit Card Holder
    dob                     Date of Birth of Credit Card Holder
    trans_num               Transaction Number
    merch_lat               Latitude Location of Merchant
    merch_long              Longitude Location of Merchant
    is_fraud                Whether Transaction is Fraud (1) or Not (0)

    Source of dataset. The data was partially cleaned and adapted by DataCamp.

    Scenarios are broader questions to help you develop an end-to-end project for your portfolio:

    A new credit card company has just entered the market in the western United States. The company is promoting itself as one of the safest credit cards to use. They have hired you as their data scientist in charge of identifying instances of fraud. The executive who hired you has provided you with data on credit card transactions, including whether or not each transaction was fraudulent.

    The executive wants to know how accurately you can predict fraud using this data. She has stressed that the model should err on the side of caution: flagging a legitimate transaction as fraudulent, just to be safe, is not a big problem. In your report, you will need to describe how well your model performs and how it adheres to these criteria.

    You will need to prepare a report that is accessible to a broad audience. It will need to outline your motivation, analysis steps, findings, and conclusions.
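    One way to make the "err on the side of caution" criterion concrete is to report recall (the share of actual fraud that gets flagged) alongside precision (the share of flags that are actual fraud). The sketch below is only an illustration: the helper precision_recall and the label lists are hypothetical and do not come from the dataset or from any model in this notebook.

```python
# Hypothetical helper, not part of the original notebook.
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 means fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud that was flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud that was missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels, for illustration only
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 1, 1, 1, 0, 1]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

    Under the executive's criterion, a model like this toy one (every fraud caught, at the cost of some false alarms) would likely be preferred over one with fewer false alarms but missed fraud.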

    Check the Data Set

    Import the required packages.

    # For data wrangling
    import numpy as np
    import pandas as pd
    
    # For visualization
    import matplotlib.pyplot as plt
    import seaborn as sns
    import plotly.graph_objects as go
    %matplotlib inline
    
    # Show all rows and columns when displaying DataFrames
    pd.options.display.max_rows = None
    pd.options.display.max_columns = None

    Check the data types and look for null values at a glance.

    ccf.info()

    There are no null values in the dataset. Going by the data dictionary, 7 of the 15 fields are numeric (amt, lat, long, city_pop, merch_lat, merch_long, is_fraud) and the other 8 are categorical, date, or identifier fields.
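    The null and data type summary above can also be checked programmatically rather than read off the info() output. The following is a minimal sketch on a tiny made-up frame standing in for ccf; the frame name sample and its values are assumptions for illustration, and the same calls should work on the real DataFrame.

```python
import pandas as pd

# Tiny made-up frame standing in for `ccf`; values are illustrative only
sample = pd.DataFrame({
    "merchant": ["Shop A", "Shop B", "Shop C"],
    "category": ["grocery", "fuel", "online"],
    "amt": [12.50, 40.00, 7.25],
    "is_fraud": [0, 0, 1],
})

# Number of missing values per column (all zeros means no nulls)
null_counts = sample.isnull().sum()

# Split the columns into numeric and non-numeric by dtype
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
other_cols = sample.select_dtypes(exclude="number").columns.tolist()

print(null_counts)
print(f"{len(numeric_cols)} numeric: {numeric_cols}")
print(f"{len(other_cols)} non-numeric: {other_cols}")
```

    Replacing sample with ccf gives the counts for the full dataset.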

    # Basic statistics
    # Set the display format to avoid scientific notation
    pd.options.display.float_format = '{:.2f}'.format
    
    # Summary statistics, transposed so that each column becomes a row
    summary_statistics = ccf.describe()
    summary_statistics.T

    Using describe(), we can see the range and spread of each numeric column.

    Dissecting the datetime column into separate fields

    # Convert 'trans_date_trans_time' to datetime format
    ccf['trans_date_trans_time'] = pd.to_datetime(ccf['trans_date_trans_time'])
    
    # Create separate columns for date and time
    ccf['trans_date'] = ccf['trans_date_trans_time'].dt.date
    ccf['trans_time'] = ccf['trans_date_trans_time'].dt.time
    
    # Extract year, month, day, hour, minute, and second into separate columns
    ccf['year'] = ccf['trans_date_trans_time'].dt.year
    ccf['month'] = ccf['trans_date_trans_time'].dt.month
    ccf['day'] = ccf['trans_date_trans_time'].dt.day
    ccf['hour'] = ccf['trans_date_trans_time'].dt.hour
    ccf['minute'] = ccf['trans_date_trans_time'].dt.minute
    ccf['second'] = ccf['trans_date_trans_time'].dt.second
    
    # Convert 'trans_date' to string format YYYY-MM-DD
    ccf['trans_date'] = ccf['trans_date'].astype(str)
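
    To see what these derived columns contain, the same extraction can be tried on a couple of timestamps. This is a sketch under assumptions: the frame demo and its two timestamps are made up for illustration, and the loop over the .dt accessor is just a compact alternative to the line-by-line assignments above.

```python
import pandas as pd

# Two made-up timestamps, for illustration only
demo = pd.DataFrame({
    "trans_date_trans_time": ["2019-01-01 00:00:44", "2020-12-31 23:59:05"]
})
demo["trans_date_trans_time"] = pd.to_datetime(demo["trans_date_trans_time"])

# Pull each component out through the .dt accessor
for part in ["year", "month", "day", "hour", "minute", "second"]:
    demo[part] = getattr(demo["trans_date_trans_time"].dt, part)

# strftime produces the YYYY-MM-DD string in a single step
demo["trans_date"] = demo["trans_date_trans_time"].dt.strftime("%Y-%m-%d")

print(demo.drop(columns="trans_date_trans_time").to_string(index=False))
```

    Using strftime avoids the intermediate column of Python date objects, though the two-step version in the notebook gives the same strings.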