1st Time Payments Analysis

    About 40% of our transactions are cashless, meaning we charge clients and pay the majority of the charge to the driver every week (withholding our 15-25% commission).

    We always pay the driver and take the risk for collecting the funds from the client.

    Luckily, we have basic filters that block users from making multiple fraudulent transactions and that find groups of fraudulent users sharing the same devices or cards. The next step is to stop people who are making a fraudulent transaction for the first time.

    Outcome and task description

    Based on the sample data, please come up with the top 2-5 developments that should be done to reduce the percentage of failed payments (“is_successful_payment” equal to 0).

    Keeping in mind our developer team is small (3-4 people), describe why you picked exactly those.

    If some parameters are missing from the list below, you can presume we collect all reasonable data that we possibly can, while using the platform as a rider or driver.

    Sample data

    Here is a dump of 1st time credit card orders (worldwide): ZIP, CSV.

    The data includes some metadata on users who make their 1st finished order with a credit card as the payment method, as well as metadata on the transaction itself.

    Field legend:

    • created – time when the 1st time order request was created.
    • device_name – name of the device used to make order
    • device_os_version – version of the device OS
    • country – 2 char country code
    • city_id – internal system city ID (not relevant which one is which)
    • lat – latitude of the pickup spot for the order
    • lng – longitude of the pickup spot for the order
    • real_destination_lat – latitude of the destination for the order
    • real_destination_lng – longitude of the destination for the order
    • user_id – internal user ID
    • order_id – internal order ID
    • order_try_id – internal order try ID (order tries happen before client and driver are matched to an order)
    • distance – driver distance to the client pickup location, in meters
    • ride_distance – trip distance in meters
    • price – price charged to client, can be lower than “ride_price” if client had a discount, currencies vary and are undefined
    • ride_price – calculated price of the final trip, currencies vary and are undefined
    • price_review_status – “Price review” is when we send “ride_price” to be audited by a human to check for system errors. 99% of orders are final and should already have “ok” set. Some may still be in a pending state; most likely you can discard those.
    • price_review_reason – automatic or manual reason for the price review to be requested.
    • is_successful_payment – 1 means the order was charged successfully, 0 means the charge failed (including after all attempts to re-charge)
    • name – card details, irrelevant.
    • card_bin – details on card BIN.
    • failed_attempts – number of failed order attempts before this 1st finished order.
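
    As a minimal sketch of how the legend translates into a first measurement, the snippet below keeps only orders whose price review is final and computes the share of failed payments, both overall and per number of earlier failed attempts. The column names come from the legend; the helper names (`keep_final_orders`, `failure_rate`), the tiny sample frame, and the "pending" status value are made up for illustration and are not part of the original analysis.

```python
import pandas as pd


def keep_final_orders(frame: pd.DataFrame) -> pd.DataFrame:
    """Keep only orders whose price review is final ("ok"), as the legend suggests."""
    return frame[frame["price_review_status"] == "ok"]


def failure_rate(frame: pd.DataFrame) -> float:
    """Share of orders whose payment failed (is_successful_payment == 0)."""
    return float((frame["is_successful_payment"] == 0).mean())


# Tiny made-up sample for illustration only (NOT taken from the real dump).
# The "pending" status value is an assumption about how non-final rows look.
sample = pd.DataFrame({
    "price_review_status": ["ok", "ok", "pending", "ok"],
    "is_successful_payment": [1, 0, 0, 1],
    "failed_attempts": [0, 2, 1, 0],
})

final_orders = keep_final_orders(sample)
overall_rate = failure_rate(final_orders)

# Failure rate per number of earlier failed attempts.
# Because the flag is 0/1, the failure rate is 1 minus the mean of the flag.
rate_by_attempts = 1 - final_orders.groupby("failed_attempts")["is_successful_payment"].mean()

print(f"overall failure rate: {overall_rate:.2%}")
print(rate_by_attempts)
```

    The same two helpers could be applied to the full `df` once it is loaded; grouping by other legend fields (for example `country` or `card_bin`) follows the same pattern.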
    SELECT * FROM '1st_adyen_rides-success-and-fail.csv'
    limit 5;

    Loading Libraries

    ## REQUIRED LIBRARIES
    # For data wrangling 
    import numpy as np
    import pandas as pd
    
    # For visualization
    import matplotlib.pyplot as plt
    %matplotlib inline
    import seaborn as sns
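    
    # Show all rows and columns when displaying DataFrames
    pd.options.display.max_rows = None
    pd.options.display.max_columns = None
    
    # For modeling
    from pycaret.classification import *
    
    # Uncomment to install missing packages:
    # !pip install pandas scikit-learn xgboost
    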
    

    Read Dataset

    # Read the data frame
    df = pd.read_csv('1st_adyen_rides-success-and-fail.csv')
    
    df.shape

    There are 22 columns and 304K+ rows in the data set.

    df.head()

    DataFrame Information

    df.info()

    There are 7 categorical variables and 15 numerical variables
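
    One way to arrive at counts like these, assuming every non-numeric column is treated as categorical, is `select_dtypes`. The helper name `count_column_kinds` and the small sample frame are hypothetical and only illustrate the idea.

```python
import pandas as pd


def count_column_kinds(frame: pd.DataFrame) -> tuple[int, int]:
    """Return (numeric_count, non_numeric_count) for the columns of a DataFrame."""
    numeric_cols = frame.select_dtypes(include="number").columns
    other_cols = frame.select_dtypes(exclude="number").columns
    return len(numeric_cols), len(other_cols)


# Made-up sample using a few column names from the legend (illustration only).
sample = pd.DataFrame({
    "country": ["ee", "lv"],
    "device_name": ["phone_a", "phone_b"],
    "price": [3.5, 4.0],
    "failed_attempts": [0, 1],
})

n_numeric, n_categorical = count_column_kinds(sample)
print(f"{n_categorical} categorical and {n_numeric} numerical variables")
```

    Note that ID-like columns such as `user_id` or `city_id` are stored as numbers but behave like categories, so a dtype-based count is only a starting point.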

    Summary Statistics

    # Basic statistics
    # Set display format to avoid scientific notation
    pd.options.display.float_format = '{:.2f}'.format
    
    # Compute summary statistics and transpose them so each column becomes a row
    summary_statistics = df.describe()
    summary_statistics.T

    Missing Values Check