Skip to content
Project: Designing a Bank Marketing Database
  • AI Chat
  • Code
  • Report
    • _In this task, we are working with a dataset named "bank_marketing.csv," which contains information related to a bank's marketing campaigns.My goal is to perform several data preprocessing and transformation steps using Python and SQL, and then save the processed data into three separate CSV files: client.csv, campaign.csv, and economics.csv.
    • Additionally, we will create SQL table creation scripts for each of these CSV files.

    Step-by-Step Explanation:

    Python

    1. Reading the data from the "bank_marketing.csv" file into a Pandas DataFrame.
    2. Split the Data into Three DataFrames: client, campaign, and economics.
    3. Perform data cleaning tasks such as replacing "."; convert values in the columns to binary (1 or 0) based on conditions; change "nonexistent" to NumPy's null values in the column.
    4. Create Datetime Column, where the year is fixed as 2022, and the month and day values are taken from the "month" and "day" columns.
    5. Remove any redundant data or columns that might have been used to create new columns.
    6. Save Data to CSV Files:We save the three DataFrames (client, campaign, and economics) to CSV files without an index as client.csv, campaign.csv, and economics.csv, respectively. SQL
    7. Create SQL Table Scripts for each of the CSV files: client_table, campaign_table, and economics_table.
    8. Load Data into SQL Tables from the CSV files into their respective SQL tables using the \copy command in PostgreSQL.
    import pandas as pd
    import numpy as np
    
    #Reading the original dataset
    df = pd.read_csv('bank_marketing.csv')
    print(df.columns)
    #Split the original data into 3 DataFrames 
    client = df [['client_id', 'age', 'job', 'marital', 'education', 'credit_default',
                  'housing', 'loan']]
    campaign = df[['client_id','campaign','duration','pdays', 'previous','poutcome','y','day',                      'month']]
    economics = df[['client_id','emp_var_rate','cons_price_idx','euribor3m', 'nr_employed']]
    # Display the 3 new DataFrames
    print(client.head())
    print(campaign.head())
    print(economics.head())
    #Rename columns of client and economics
    client.rename(columns={"client_id": "id"}, inplace=True)
    
    economics.rename(columns={'euribor3m':'euribor_three_months', 'nr_employed':'number_employed'}, inplace=True)
    
    print(client.columns)
    print(economics.columns)
    #Rename columns of campaign dataframe
    
    campaign= campaign.rename(columns={'campaign': 'number_contacts',
                                        'duration': 'contact_duration',
                                        'previous': 'previous_campaign_contacts',
                                        'poutcome': 'previous_outcome',
                                        'y': 'campaign_outcome'})
    print(campaign.columns)
    #Cleaning client dataframe
    # Replace "." with "_" in the "education" column
    client['education'] = client['education'].str.replace('.', '_')
    
    # Replace "unknown" with null values in the "education" column
    client['education'] = client['education'].replace('unknown', np.NaN)
    
    print(client['education'])
    #There was value issue while I changed the binary value for the campaign_outcome column. Therefore,I delete the column and add it again from the original dataset (df).
    
    #Add the remove column to campaign dataframe
    campaign['campaign_outcome'] = df['y']
    campaign.head()
    #Cleaning campaign dataframe
    # Convert 'campaign_outcome' to binary values (1 or 0) using map()
    campaign['campaign_outcome'] = campaign['campaign_outcome'].map({'yes': 1, 'no': 0})
    
    # Print the updated 'campaign_outcome' column
    print(campaign['campaign_outcome'])
    
    # Replace 'nonexistent' with null values
    campaign['previous_outcome'] = campaign['previous_outcome'].replace('nonexistent', np.NaN)
    print(campaign['previous_outcome'].head())
    #Add a column called campaign_id in campaign dataframe, where all rows have a value of 1
    campaign['campaign_id'] = '1'
    #Check the campaign dataframe
    campaign.head()
    #Create 'datetime' column (last_contact_date) of campaign dataframe
    
    #Capitalize the values of motnh column
    campaign['month'] = campaign['month'].str.capitalize()
    
    #Add new column called 'year'
    campaign['year']= '2022'
    
    #Convert 'day' column to string
    campaign['day'] = campaign['day'].astype(str)
    
    #Add all this columns together
    campaign['date'] = campaign['year']+ '-'+ campaign['month']+ '-' + campaign['day']
    print(campaign['date'])
    
    #Convert to 'last_contact_date' column
    campaign['last_contact_date'] = pd.to_datetime(campaign['date'], format="%Y-%b-%d")
    print(campaign['last_contact_date'])
    
    #Columns to drop
    columns_to_drop = ['month', 'day', 'year']
    campaign.drop(columns=columns_to_drop, inplace=True)
    campaign.head()
    #Saving data
    client.to_csv('client.csv', index= False)
    economics.to_csv('economics.csv', index= False)
    campaign.to_csv('campaign.csv', index= False)