Skip to content
Project: Consolidating Employee Data
  • AI Chat
  • Code
  • Report
  • You just got hired as the first and only data practitioner at a small business experiencing exponential growth. The company needs more structured processes, guidelines, and standards. Your first mission is to structure the human resources data. The data is currently scattered across teams and files and comes in various formats: Excel files, CSVs, JSON files...

    You'll work with the following data in the datasets folder:

    • Office addresses are currently saved in office_addresses.csv. If the value for office is NaN, then the employee is remote.
    • Employee addresses are saved on the first tab of employee_information.xlsx.
    • Employee emergency contacts are saved on the second tab of employee_information.xlsx; this tab is called emergency_contacts. However, this sheet was edited at some point, and the headers were removed! The HR manager let you know that they should be: employee_id, last_name, first_name, emergency_contact, emergency_contact_number, and relationship.
    • Employee roles, teams, and salaries have been exported from the company's human resources management system into a JSON file titled employee_roles.json. Here are the first few lines of that file:
    {"A2R5H9": { "title": "CEO", "monthly_salary": "$4500", "team": "Leadership" }, ... }
    import pandas as pd
    # Start coding here... 
    #read and desplay Office adresses and its shape
    office_adress= pd.read_csv("datasets/office_addresses.csv")
    display (office_adress)
    print (office_adress.shape)
    # Read and display employee_information and its shape
    employee_info= pd.read_excel ('datasets/employee_information.xlsx', sheet_name=0)
    employee_info= employee_info.set_index('employee_id')
    display  (employee_info)
    print (employee_info.columns)
    print (employee_info.shape)
    # Renaming the columns
    values = 'employee_id, last_name, first_name, emergency_contact, emergency_contact_number, relationship'
    values_list = values.split(', ')
    
    # Read and display Employee emergency contacts and its shape
    emergency_contact = pd.read_excel('datasets/employee_information.xlsx', sheet_name=1, header=None, names=values_list)
    emergency_contact= emergency_contact.set_index("employee_id")
    display (emergency_contact)
    print (emergency_contact.columns)
    print (emergency_contact.shape)
    # Load, tanspose and display Employee roles, teams, and salaries and its shape
    import pandas as pd
    employee_role = pd.read_json('datasets/employee_roles.json').transpose()
    employee_role.index.name ='employee_id'
    display (employee_role)
    print (employee_role.shape)
    employee_role.columns
    # Droping mutual colmns from emergency_contac
    emergency_contact=emergency_contact.drop(columns=['last_name', 'first_name'])
    
    # Joining employee_info with emergency_contact
    merged_info_contacts = employee_info.merge(emergency_contact, left_index=True, right_index=True)
    
    # Rename the specific columns 
    merged_info_contacts.rename(columns={
        'employee_last_name': 'last_name',
        'employee_first_name': 'first_name'
    }, inplace=True)
    display (merged_info_contacts)
    print (merged_info_contacts.shape)
    # Merge with employee_role
    info_emerg_role= merged_info_contacts.merge(employee_role, left_index=True, right_index=True)
    display (info_emerg_role)
    print (info_emerg_role.columns)
    # Rest the index of info_emerg_role
    info_emerg_role= info_emerg_role.reset_index()
    display (info_emerg_role)
    print (info_emerg_role.columns)
    print (office_adress.columns)
    employees_final= info_emerg_role.merge(office_adress, how='left', left_on='employee_country', right_on="office_country")
    employees_final= employees_final.fillna("Remote")
    employees_final.set_index('employee_id', inplace=True)
    print (employees_final.columns)
    # Specify the desired order of columns
    column_order = ['first_name', 'last_name', 'employee_country', 'employee_city', 
                    'employee_street', 'employee_street_number', 'emergency_contact',
                    'emergency_contact_number', 'relationship', 'monthly_salary', 
                    'team', 'title', 'office', 'office_country', 'office_city', 
                    'office_street', 'office_street_number']
    
    # Reorder the columns of the DataFrame
    employees_final = employees_final[column_order]
    
    display (employees_final)