Duplicate of Employee Network Analysis
  • Improving Company's Collaboration

    An analysis of six months of inter-employee communication data at a company, carried out with Python's NetworkX library. The datasets were checked for errors, cleared of duplicates, and merged into a single master dataset. The data covers communication among six departments (Sales, Operations, IT, Admin, Marketing and Engineering) and about 664 unique employee ids. The analysis built a network visualization of the messages each employee sent to other employees in the same or a different department, in order to observe how employees and departments collaborate and to identify where collaboration within or between departments could be improved.
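
As a rough illustration of the network described above, the sketch below builds a directed graph from a message table and tags each node with a department. The toy rows are invented for this example, and the choice of `nx.from_pandas_edgelist` with a `DiGraph` is an assumption about how such a graph could be built, not necessarily how this report built it.

```python
import pandas as pd
import networkx as nx

# Toy data, invented for illustration (not rows from the real dataset)
messages = pd.DataFrame({
    "sender": [1, 1, 2, 3],
    "receiver": [2, 3, 3, 1],
    "message_length": [10, 25, 7, 40],
})
employees = pd.DataFrame({
    "id": [1, 2, 3],
    "department": ["Sales", "IT", "Sales"],
})

# Directed graph: each edge points from the sender to the receiver
G = nx.from_pandas_edgelist(
    messages,
    source="sender",
    target="receiver",
    edge_attr="message_length",
    create_using=nx.DiGraph,
)

# Attach each employee's department as a node attribute
departments = employees.set_index("id")["department"].to_dict()
nx.set_node_attributes(G, departments, "department")

print(G.number_of_nodes(), G.number_of_edges())
```

A directed graph keeps the distinction between sending and receiving, which matters when the two activities are compared per department.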

    Further analysis identified the most active departments in sending and receiving messages. The Sales department was the most active on both counts, with approximate averages of 258 messages sent and 204 received over the six-month period, while the Marketing department was the least active, with approximate averages of 3 messages sent and 23 received. The results also showed that the employee with id 598, besides being among the top five most influential employees (a group that includes ids 128, 605 and 586), has the most connections. Although the Sales department is also the most influential department, HR should introduce further measures to improve collaboration in the IT, Marketing and Engineering departments.
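
The summary above does not say which measure was used for "influence", so the following is only a sketch under the assumption that degree centrality is an acceptable proxy. The graph is a toy example invented for illustration; the node ids are not real employee ids.

```python
import networkx as nx

# Toy directed message graph, invented for illustration
G = nx.DiGraph()
G.add_edges_from([(1, 2), (1, 3), (1, 4), (2, 1), (3, 1), (4, 2)])

# Degree centrality: (in-degree + out-degree) divided by (number of nodes - 1)
centrality = nx.degree_centrality(G)
top = sorted(centrality, key=centrality.get, reverse=True)[:2]

# Connections: number of distinct neighbours, ignoring edge direction
undirected = G.to_undirected()
connections = {node: undirected.degree(node) for node in undirected}
most_connected = max(connections, key=connections.get)

print(top, most_connected)
```

Other measures, such as betweenness or eigenvector centrality, can rank nodes differently, so the ranking depends on the measure chosen.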

    The trend plot of messages sent and received per department over the six-month period showed a steep decline as the months progressed. More messages were exchanged between departments in the 6th month than in any other month, while the 11th month had the fewest.
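
A monthly trend like the one described can be derived by grouping messages on the month of their timestamp. This is a minimal sketch; the dates below, including the year, are invented for illustration.

```python
import pandas as pd

# Toy timestamps, invented for illustration (the year is an assumption)
messages = pd.DataFrame({
    "sender": [1, 2, 1, 3, 2],
    "timestamp": pd.to_datetime([
        "2021-06-02", "2021-06-15", "2021-06-20", "2021-07-01", "2021-11-05",
    ]),
})

# Count messages per calendar month (6 = June, 11 = November)
monthly_counts = messages.groupby(messages["timestamp"].dt.month).size()

busiest_month = monthly_counts.idxmax()
print(monthly_counts)
print(busiest_month)
```

Passing `monthly_counts` to a line plot would give the kind of trend view mentioned above.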

    Below is a description of the dataset used for this study.

    Messages has information on the sender, receiver, and time.
    • "sender" - represents the employee id of the employee sending the message.
    • "receiver" - represents the employee id of the employee receiving the message.
    • "timestamp" - the date of the message.
    • "message_length" - the length in words of the message.
    Employees has information on each employee:
    • "id" - represents the employee id of the employee.
    • "department" - is the department within the company.
    • "location" - is the country where the employee lives.
    • "age" - is the age of the employee.
    # Import modules
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sb
    import networkx as nx
    import warnings
    
    %matplotlib inline
    sb.set_theme(style = 'whitegrid')
    warnings.filterwarnings('ignore')
    # Loading datasets
    messages = pd.read_csv('data/messages.csv', parse_dates = ['timestamp'])
    employees = pd.read_csv('data/employees.csv')

    Data Assessing

    Both datasets (messages and employees) will be assessed to determine whether they are clean.

    First, a copy of each original dataset is made and used for this step and all later ones, so that the original data remains available if it is needed.

    # Copying data
    messages_sent = messages.copy()
    employees_data = employees.copy()
    # Checking for missing values
    print(messages_sent.info())
    print(employees_data.info())

    The results above show that messages_sent and employees_data have no missing values, and that each column has an appropriate data type: timestamp is datetime64[ns], sender and receiver are both integers, and the remaining features are likewise suitably typed.
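
Besides reading the `info()` output, missing values and data types can be checked directly. The sketch below uses a small invented frame with the same column names as the messages data; `isna().sum()` is offered as one possible complementary check, not as the check used in this report.

```python
import pandas as pd

# Toy frame with the same columns as the messages data (values invented)
df = pd.DataFrame({
    "sender": [1, 2, 3],
    "receiver": [2, 3, 1],
    "timestamp": pd.to_datetime(["2021-06-02", "2021-06-03", "2021-06-04"]),
    "message_length": [10, 25, 7],
})

# Number of missing values in each column
missing_per_column = df.isna().sum()

print(missing_per_column)
print(df.dtypes)
```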

    One minor but important issue remains: the sender and receiver columns in messages_sent hold the employee ids of the sender and the receiver, not the employees themselves. The columns will therefore be renamed to sender_id and receiver_id so that each name communicates what the column actually contains.

    # Renaming columns
    messages_sent.rename(columns = {'sender':'sender_id', 'receiver':'receiver_id'}, inplace = True)
    messages_sent.head(2)
    employees_data.head(2)
    # Checking for duplicates
    messages_sent.duplicated().sum()
    employees_data.duplicated().sum()

    Exploring the duplicated values further

    # Subset duplicated values
    duplicate_values = messages_sent.duplicated(keep = False)
    messages_sent[duplicate_values]
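
To make the duplicate check concrete, the sketch below shows how `duplicated(keep=False)` flags every copy of a repeated row, and how `drop_duplicates()` could then keep one copy of each. The rows are invented for illustration, and dropping exact duplicates is an assumption about the clean-up step rather than a record of what was done here.

```python
import pandas as pd

# Toy data, invented for illustration; rows 0 and 1 are exact duplicates
messages_sent = pd.DataFrame({
    "sender_id": [1, 1, 2, 2],
    "receiver_id": [2, 2, 3, 1],
    "message_length": [10, 10, 7, 7],
})

# keep=False marks every copy of a duplicated row, not only the later ones
is_duplicate = messages_sent.duplicated(keep=False)

# The default keep='first' marks only the extra copies
n_extra_copies = int(messages_sent.duplicated().sum())

# Keep the first occurrence of each row and rebuild the index
deduplicated = messages_sent.drop_duplicates().reset_index(drop=True)

print(is_duplicate.tolist(), n_extra_copies, len(deduplicated))
```

Whether identical rows are true duplicates or legitimately repeated messages is a judgment call that depends on how the data was collected.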