AMLSim Machine Learning Project

IBM AMLSim

IBM AMLSim: The AMLSim project provides a multi-agent simulator that generates synthetic banking transaction data together with a set of known money laundering patterns, mainly for testing machine learning models and graph algorithms.
The datasets used in this notebook are an example generated with IBM AMLSim.

Datasets:

There are three datasets: accounts, alerts and transactions.

Accounts dataset: Contains the information about all the bank accounts whose transactions are monitored.
Alerts dataset: Contains the transactions which triggered an alert according to AML guidelines.
Transactions dataset: Contains the list of all the transactions with information about sender and receiver accounts.
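The three tables can be linked: transactions reference accounts through their sender and receiver IDs, and alerts point back to individual transactions. The sketch below shows one way to do that with pandas merges on toy data. It assumes the column names SENDER_ACCOUNT_ID, RECEIVER_ACCOUNT_ID, TX_ID and TX_AMOUNT, which do not appear elsewhere in this notebook, so check them against the actual CSV headers.

```python
import pandas as pd

# Toy tables. Column names other than ACCOUNT_ID, INIT_BALANCE and IS_FRAUD
# (which appear later in this notebook) are assumptions about the CSV layout.
accounts = pd.DataFrame({
    "ACCOUNT_ID": [1, 2, 3],
    "INIT_BALANCE": [100.0, 200.0, 300.0],
    "IS_FRAUD": [False, True, False],
})
transactions = pd.DataFrame({
    "TX_ID": [10, 11, 12],
    "SENDER_ACCOUNT_ID": [1, 2, 3],
    "RECEIVER_ACCOUNT_ID": [2, 3, 1],
    "TX_AMOUNT": [50.0, 75.0, 20.0],
})
alerts = pd.DataFrame({"ALERT_ID": [7], "TX_ID": [11]})

# Attach sender attributes; add_prefix turns ACCOUNT_ID into SENDER_ACCOUNT_ID,
# so it lines up with the transactions' sender column for the merge.
enriched = transactions.merge(
    accounts.add_prefix("SENDER_"), on="SENDER_ACCOUNT_ID", how="left"
)
# Mark transactions that also appear in the alerts table.
enriched["IN_ALERTS"] = enriched["TX_ID"].isin(alerts["TX_ID"])
print(enriched[["TX_ID", "SENDER_ACCOUNT_ID", "SENDER_IS_FRAUD", "IN_ALERTS"]])
```

The same pattern with `add_prefix("RECEIVER_")` and `on="RECEIVER_ACCOUNT_ID"` would attach the receiver's attributes.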

Acknowledgements:

Do check out the AMLSim project and generate your own datasets for AML purposes.
Link: https://github.com/IBM/AMLSim

License:

IBM/AMLSim is licensed under the Apache License 2.0, a permissive license whose main conditions require preservation of copyright and license notices. Contributors provide an express grant of patent rights. Licensed works, modifications, and larger works may be distributed under different terms and without source code. Link: https://github.com/IBM/AMLSim/blob/master/LICENSE

import pandas as pd
import numpy as np

def load_multiple_csv(name_list, path_list):
    """Read each CSV in path_list and return a dict keyed by the matching name."""
    dfs = {}
    for name, path in zip(name_list, path_list):
        dfs[name] = pd.read_csv(path)
        print(name, dfs[name].shape)
    return dfs


df_names = ["acc_df", "alerts_df", "trans_df"]
path_list = ["accounts.csv", "alerts.csv", "transactions.csv"]

dfs = load_multiple_csv(df_names, path_list)
# Unpack into the variables used in the rest of the notebook.
acc_df, alerts_df, trans_df = (dfs[name] for name in df_names)

acc_df.sample(5).transpose()

acc_df.index

acc_df.info()

acc_df.describe()

def change_cols_dtype(df, to_int_col_list, to_str_col_list, to_category_col_list):
    """Convert the listed columns of df in place, then print the new schema."""
    for col in to_int_col_list:
        # Casting floats to int truncates the fractional part (e.g. cents).
        df[col] = df[col].astype(int)
    for col in to_str_col_list:
        df[col] = df[col].astype(str)
    for col in to_category_col_list:
        df[col] = df[col].astype('category')
    df.info()

to_int_col_list = ["INIT_BALANCE"]
to_str_col_list = ["ACCOUNT_ID"]
to_category_col_list = ["COUNTRY", "ACCOUNT_TYPE", "TX_BEHAVIOR_ID", "IS_FRAUD"]

change_cols_dtype(acc_df, to_int_col_list, to_str_col_list, to_category_col_list)
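As an alternative to converting types after loading, `pd.read_csv` accepts a `dtype` mapping so the conversion happens while the file is parsed, which avoids a second pass over the data. A minimal sketch on a small in-memory CSV; the rows are made up and only the column names come from accounts.csv above.

```python
import io
import pandas as pd

# Made-up rows standing in for accounts.csv.
csv_text = """ACCOUNT_ID,INIT_BALANCE,COUNTRY,IS_FRAUD
1,184.44,US,False
2,175.80,US,True
"""

# Declare types while parsing instead of calling astype() afterwards.
# IS_FRAUD is left out: "True"/"False" text is already parsed as bool.
accounts = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"ACCOUNT_ID": str, "COUNTRY": "category"},
)
print(accounts.dtypes)
```

With the real file, the same `dtype` mapping could be passed to `pd.read_csv("accounts.csv", ...)`.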

import matplotlib.pyplot as plt
import seaborn as sns

sns.countplot(data=acc_df,x='IS_FRAUD')
plt.show()
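The count plot shows the class balance visually; the same information can be printed as counts and shares. A small sketch with made-up labels; `class_balance` is a hypothetical helper, and in the notebook its input would be `acc_df["IS_FRAUD"]`.

```python
import pandas as pd

def class_balance(labels: pd.Series) -> pd.DataFrame:
    """Return the count and share of each label value (hypothetical helper)."""
    counts = labels.value_counts()
    shares = labels.value_counts(normalize=True)
    # Both Series share the same index (the label values), so they align.
    return pd.DataFrame({"count": counts, "share": shares})

# Made-up labels: 2 fraudulent accounts out of 10.
labels = pd.Series([False] * 8 + [True] * 2, name="IS_FRAUD")
print(class_balance(labels))
```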

print("Number of Unique values:")
for col in acc_df.select_dtypes(['object','bool','category']):
    print(col,"-", acc_df[col].nunique())

trans_df["IS_FRAUD"].value_counts()
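If IS_FRAUD turns out to be heavily imbalanced, a purely random train/test split can leave very few positive rows in one part. A minimal sketch of a stratified split using only pandas (`DataFrameGroupBy.sample`, available from pandas 1.1); `stratified_split` is a hypothetical helper and the toy data is made up.

```python
import pandas as pd

def stratified_split(df, label_col, train_frac=0.8, seed=0):
    """Sample train_frac of each label group for training; the rest is test.

    Assumes df has a unique index, since test rows are found by dropping
    the sampled train index labels.
    """
    train = df.groupby(label_col).sample(frac=train_frac, random_state=seed)
    test = df.drop(train.index)
    return train, test

# Toy transactions: 15 legitimate, 5 fraudulent.
tx = pd.DataFrame({"TX_AMOUNT": range(20), "IS_FRAUD": [False] * 15 + [True] * 5})
train, test = stratified_split(tx, "IS_FRAUD")
print(train["IS_FRAUD"].value_counts())
print(test["IS_FRAUD"].value_counts())
```

Each label group keeps about 80% of its rows in training, so the fraud share stays close to the original in both parts.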

alerts_df.sample(5).transpose()

alerts_df.index
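Since AMLSim injects known laundering patterns, a per-pattern summary of the alerts is a natural next step. A sketch, assuming alerts.csv has ALERT_ID, ALERT_TYPE and TX_AMOUNT columns (verify with `alerts_df.columns`); the rows and pattern names below are made up.

```python
import pandas as pd

# Toy alerts table; ALERT_ID, ALERT_TYPE and TX_AMOUNT are assumed column names.
alerts = pd.DataFrame({
    "ALERT_ID": [1, 1, 1, 2, 2],
    "ALERT_TYPE": ["fan_out", "fan_out", "fan_out", "cycle", "cycle"],
    "TX_AMOUNT": [100.0, 50.0, 25.0, 300.0, 300.0],
})

# Named aggregation: one row per pattern type.
summary = alerts.groupby("ALERT_TYPE").agg(
    n_alerts=("ALERT_ID", "nunique"),       # distinct alerts of this type
    n_transactions=("ALERT_ID", "count"),   # transactions flagged under it
    total_amount=("TX_AMOUNT", "sum"),
)
print(summary)
```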
