Employee Network Analysis

Improving Company's Collaboration

An analysis of six months of a company's inter-employee communication data using Python and networkx. The data was assessed for errors, cleared of duplicates, and merged into a single master dataset. It covers communication between six departments (Sales, Operations, IT, Admin, Marketing and Engineering) and about 664 unique employee ids. The analysis involved creating a network visualization of the messages sent by each employee in a department to another employee in the same or a different department. The aim was to observe collaboration between employees and departments, and to identify where collaboration within a department or between departments could be improved.

Extensive analysis was performed to determine the most active department in sending and receiving messages. The Sales department was the most active in both, with an average of approximately 258 messages sent and 204 received over the six-month period. The Marketing department was the least active in both, with an average of approximately 3 messages sent and 23 received over the same period. The results also showed that the employee with id 598, besides being among the top five most influential employees (a group that also includes ids 128, 605 and 586), has the most connections. While Sales is also the most influential department, HR should implement further measures to improve collaboration in the IT, Marketing and Engineering departments.

The visualization of the messages sent and received per department over the six-month period showed a steep decline as the months progressed, as shown in the trend plot. More messages were shared between departments in the 6th month than in any other month, while the 11th month had the fewest.

Below is a description of the dataset used for this study.

The messages data has information on the sender, receiver, and time of each message:

"sender" - represents the employee id of the employee sending the message.
"receiver" - represents the employee id of the employee receiving the message.
"timestamp" - the date of the message.
"message_length" - the length in words of the message.

The employees data has information on each employee:

"id" - represents the employee id of the employee.
"department" - is the department within the company.
"location" - is the country where the employee lives.
"age" - is the age of the employee.
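To make the link between these two tables and the network concrete, here is a minimal sketch of how they could be turned into a directed graph. The toy values, the names `messages_toy`, `employees_toy` and `n_messages`, and the choice of message count as edge weight are all assumptions for illustration, not necessarily what the analysis itself does.

```python
import pandas as pd
import networkx as nx

# Toy stand-ins for the two tables described above (all values are made up)
messages_toy = pd.DataFrame({
    "sender": [1, 1, 2, 3],
    "receiver": [2, 3, 3, 1],
    "timestamp": pd.to_datetime(
        ["2021-06-02", "2021-06-03", "2021-07-10", "2021-08-15"]
    ),
    "message_length": [20, 35, 12, 50],
})
employees_toy = pd.DataFrame({
    "id": [1, 2, 3],
    "department": ["Sales", "IT", "Marketing"],
    "location": ["US", "France", "Germany"],
    "age": [30, 41, 27],
})

# Count messages per (sender, receiver) pair; used here as the edge weight
edges = (
    messages_toy.groupby(["sender", "receiver"])
    .size()
    .reset_index(name="n_messages")
)

# Directed graph: an edge points from the sender to the receiver
G = nx.from_pandas_edgelist(
    edges,
    source="sender",
    target="receiver",
    edge_attr="n_messages",
    create_using=nx.DiGraph,
)

# Attach each employee's department to the matching node
departments = employees_toy.set_index("id")["department"].to_dict()
nx.set_node_attributes(G, departments, name="department")
```

A directed graph keeps sending and receiving separate, which matches the distinction drawn above between the most active senders and the most active receivers.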

# Import modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb
import networkx as nx
import warnings

%matplotlib inline
sb.set_theme(style = 'whitegrid')
warnings.filterwarnings('ignore')

# Loading datasets
messages = pd.read_csv('data/messages.csv', parse_dates = ['timestamp'])
employees = pd.read_csv('data/employees.csv')

Data Assessing

Both datasets (messages and employees) are assessed to determine whether they are clean.

First, a copy of the original data is made for this process and for the later steps, so that the original data remains easily accessible when needed.

# Copying data
messages_sent = messages.copy()
employees_data = employees.copy()

Checking for missing values

# info() prints its summary directly and returns None, so no print() is needed
messages_sent.info()
employees_data.info()

Neither messages_sent nor employees_data has missing values, as the output above shows. The columns also have appropriate data types: timestamp is datetime64[ns], sender and receiver are both integers, and the other features in the datasets are typed appropriately as well.

One issue, minor but worth fixing, is that the sender and receiver columns in messages_sent hold the employee ids of the sender and receiver rather than the employees themselves. The columns are therefore renamed to sender_id and receiver_id to communicate what each feature (column) actually contains.
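The non-null counts reported by `info()` can be cross-checked with an explicit per-column count of missing values. The sketch below uses a made-up frame (`toy` is a hypothetical name) with one deliberately missing value to show what such a check would report.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing receiver, for illustration only
toy = pd.DataFrame({
    "sender": [1, 2, 3],
    "receiver": [2, np.nan, 1],
})

# Number of missing values in each column
missing_per_column = toy.isna().sum()
print(missing_per_column)

# A NaN forces an otherwise integer column to float64, so an unexpected
# float dtype is itself a hint that values are missing
print(toy.dtypes)
```

On the real data, the same `isna().sum()` call on messages_sent and employees_data should report zero for every column if the reading of `info()` above is right.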

# Renaming columns
messages_sent.rename(columns = {'sender':'sender_id', 'receiver':'receiver_id'}, inplace = True)
messages_sent.head(2)

employees_data.head(2)

Checking for duplicates

messages_sent.duplicated().sum()

employees_data.duplicated().sum()

Exploring the duplicated values further

# Subset duplicated values
# keep=False flags every copy of a duplicated row, not only the repeats
duplicate_values = messages_sent.duplicated(keep = False)
messages_sent[duplicate_values]
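The difference between the default `duplicated()` and `duplicated(keep = False)` is easy to miss, so here is a small sketch on made-up rows (`toy` and the other variable names are hypothetical). It also shows one common way to clear duplicates, `drop_duplicates()`; whether the analysis uses exactly this call is an assumption.

```python
import pandas as pd

# Made-up rows: the first two are exact copies of each other
toy = pd.DataFrame({
    "sender_id": [1, 1, 2, 2],
    "receiver_id": [2, 2, 3, 3],
    "message_length": [10, 10, 5, 7],
})

# Default keep='first': only the repeats are flagged, so this counts
# the rows that would be dropped
n_repeats = toy.duplicated().sum()

# keep=False: every copy is flagged, useful for inspecting the rows together
all_copies = toy[toy.duplicated(keep=False)]

# Keep the first occurrence of each row and renumber the index
deduped = toy.drop_duplicates().reset_index(drop=True)

print(n_repeats, len(all_copies), len(deduped))
```

The last two rows share a sender and receiver but differ in `message_length`, so they are not treated as duplicates; only rows identical in every column are.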
