1st Time Payments Analysis
About 40% of our transactions are cashless, meaning we charge clients and pay the majority of each charge to the driver every week (withholding our 15-25% commission).
We always pay the driver and take on the risk of collecting the funds from the client.
Luckily, we have basic filters that block users from making multiple fraudulent transactions and that find groups of fraudulent users sharing the same devices or cards. The next step is to stop people who are making a fraudulent transaction for the first time.
Outcome and task description
Based on the sample data, please come up with the top 2-5 developments that should be done to reduce the percentage of failed payments (orders where “is_successful_payment” is 0).
Keeping in mind our developer team is small (3-4 people), describe why you picked exactly those.
If some parameters are missing from the list below, you can presume we collect all reasonable data that we possibly can, while using the platform as a rider or driver.
Sample data
Here is a dump of 1st time credit card orders (worldwide): ZIP, CSV.
Data includes some metadata on users who make their 1st finished order with a credit card as the payment method, as well as metadata on the transaction itself.
Field legend:
- created – time when the 1st time order request was created.
- device_name – name of the device used to make the order
- device_os_version – version of the device OS
- country – 2-character country code
- city_id – internal system city ID (not relevant which one is which)
- lat – latitude of the pickup spot for the order
- lng – longitude of the pickup spot for the order
- real_destination_lat – latitude of the destination for the order
- real_destination_lng – longitude of the destination for the order
- user_id – internal user ID
- order_id – internal order ID
- order_try_id – internal order try ID (order tries happen before client and driver are matched to an order)
- distance – driver distance to the client pickup location, in meters
- ride_distance – trip distance in meters
- price – price charged to the client; can be lower than “ride_price” if the client had a discount. Currencies vary and are undefined
- ride_price – calculated price of the final trip; currencies vary and are undefined
- price_review_status – “Price review” is when we send “ride_price” to be audited by a human to check for system errors. 99% of orders are final and should already have “ok” set. Some may still be in pending states; most likely you can discard those.
- price_review_reason – automatic or manual reason for the price review to be requested.
- is_successful_payment – 1 means the order was charged successfully, 0 means it has failed (including after all attempts to re-charge)
- name – card details, irrelevant.
- card_bin – details on card BIN.
- failed_attempts – number of failed order attempts before this 1st finished order.
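Two of the fields above translate directly into preparation steps: `price_review_status` marks which orders are final, and `is_successful_payment` defines the failure rate the task asks about. A minimal sketch on a tiny made-up frame follows. The legend names only the "ok" status, so the "pending" label and the toy rows are illustrative assumptions, and `failed_payment_rate` is a hypothetical helper name:

```python
import pandas as pd

# Toy rows for illustration only; the real data comes from the CSV dump.
toy = pd.DataFrame({
    "price_review_status": ["ok", "ok", "pending", "ok"],  # "pending" is an assumed label
    "is_successful_payment": [1, 0, 1, 1],
})

def failed_payment_rate(frame: pd.DataFrame) -> float:
    """Share of finalized orders (status "ok") whose payment failed (flag == 0)."""
    final = frame[frame["price_review_status"] == "ok"]
    return float((final["is_successful_payment"] == 0).mean())

rate = failed_payment_rate(toy)
print(f"Failed payment rate among finalized orders: {rate:.2%}")
```

The same helper could be applied to the full frame once the CSV has been loaded.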
SELECT * FROM '1st_adyen_rides-success-and-fail.csv'
LIMIT 5;
Loading Libraries
## REQUIRED LIBRARIES
# For data wrangling
import numpy as np
import pandas as pd
# For visualization
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
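# Show every row and column when displaying DataFrames
pd.options.display.max_rows = None
pd.options.display.max_columns = None
# PyCaret's classification helpers (setup, compare_models, ...)
from pycaret.classification import *
# Uncomment to install extra dependencies:
# !pip install pandas scikit-learn xgboost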
Read Dataset
# Read the data frame
df = pd.read_csv('1st_adyen_rides-success-and-fail.csv')
df.shape
There are 22 columns and 304K+ rows in the data set.
df.head()
DataFrame Information
df.info()
There are 7 categorical variables and 15 numerical variables
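The split between categorical and numerical columns can also be derived from the dtypes instead of being read off `df.info()` by eye. A minimal sketch on a made-up frame, assuming that every non-numeric column counts as categorical (the toy values are illustrative; in the notebook, `df` would take the place of `toy`):

```python
import pandas as pd

# Toy frame for illustration only; column names are borrowed from the field legend.
toy = pd.DataFrame({
    "country": ["ee", "lv", "ee"],
    "device_name": ["iPhone", "Pixel", "iPhone"],
    "price": [4.5, 7.0, 3.2],
    "failed_attempts": [0, 2, 1],
})

# Numeric dtypes are treated as numerical, everything else as categorical.
numerical_cols = toy.select_dtypes(include="number").columns.tolist()
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()

print(len(categorical_cols), "categorical:", categorical_cols)
print(len(numerical_cols), "numerical:", numerical_cols)
```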
Summary Statistics
# Basic statistics
# Set display format to avoid scientific notation
pd.options.display.float_format = '{:.2f}'.format
# Compute summary statistics for the numerical columns
summary_statistics = df.describe()
summary_statistics.T
#print(df.describe())
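By default, `describe()` summarizes only the numerical columns of a mixed frame, so categorical fields such as `country` or `device_name` are not covered by the statistics above. A hedged sketch of the complementary summary on a made-up frame (the toy values are assumptions; in the notebook, `df` would replace `toy`):

```python
import pandas as pd

# Toy frame for illustration only.
toy = pd.DataFrame({
    "country": ["ee", "lv", "ee"],
    "device_name": ["iPhone", "Pixel", "iPhone"],
    "price": [4.5, 7.0, 3.2],
})

# Excluding numeric dtypes leaves the categorical columns, which are
# summarized by count, number of unique values, most frequent value (top)
# and its frequency (freq).
cat_summary = toy.describe(exclude=["number"])
print(cat_summary.T)
```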
Missing Values Check