Company's employee network analysis

Problem Statement

How can the company improve collaboration?

📖 Background

You work in the analytics department of a multinational company, and the head of HR wants your help mapping out the company's employee network using message data.

They plan to use the network map to understand interdepartmental dynamics better and explore how the company shares information. The ultimate goal of this project is to think of ways to improve collaboration throughout the company.

💾 Let's understand our data

The company has six months of information on inter-employee communication. For privacy reasons, only sender, receiver, and message length information are available (source).

Messages has information on the sender, receiver, time, and length of each message:
  • "sender" - represents the employee id of the employee sending the message.
  • "receiver" - represents the employee id of the employee receiving the message.
  • "timestamp" - the date of the message.
  • "message_length" - the length in words of the message.
Employees has information on each employee:
  • "id" - represents the employee id of the employee.
  • "department" - is the department within the company.
  • "location" - is the country where the employee lives.
  • "age" - is the age of the employee.

Acknowledgments: Pietro Panzarasa, Tore Opsahl, and Kathleen M. Carley. "Patterns and dynamics of users' behavior and interaction: Network analysis of an online community." Journal of the American Society for Information Science and Technology 60.5 (2009): 911-932.

Let's prepare our data

Importing data into pandas dataframes
import pandas as pd

messages = pd.read_csv('data/messages.csv', parse_dates=['timestamp'])
messages
employees = pd.read_csv('data/employees.csv')
employees

Let's check both the "messages" and "employees" dataframes for duplicates

print(messages.duplicated().any())
print(employees.duplicated().any())

There are duplicated records in the "messages" dataframe. Let's drop them with "drop_duplicates", which keeps the first occurrence of each record by default

messages.drop_duplicates(inplace=True)
print(messages.duplicated().any())

Let's check for null values in both dataframes

display(employees.isnull().any())
display(messages.isnull().any())

We are ready to process our data :)

Let's merge the two dataframes to get sender and receiver details
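One way this merge could look is sketched below. It is a minimal sketch rather than the notebook's original cell: the helper name add_employee_details and the "sender_" / "receiver_" column prefixes are assumptions made for illustration. It joins "employees" onto "messages" twice, once on "sender" and once on "receiver", using left joins so that no message is dropped.

```python
import pandas as pd


def add_employee_details(messages: pd.DataFrame, employees: pd.DataFrame) -> pd.DataFrame:
    """Attach sender and receiver attributes to every message (hypothetical helper)."""
    # Prefix the employee columns so sender and receiver attributes stay distinguishable,
    # e.g. "department" becomes "sender_department" or "receiver_department".
    senders = employees.add_prefix('sender_')
    receivers = employees.add_prefix('receiver_')

    # Left joins keep every message; validate raises if an employee id is not unique.
    merged = messages.merge(senders, how='left', left_on='sender',
                            right_on='sender_id', validate='many_to_one')
    merged = merged.merge(receivers, how='left', left_on='receiver',
                          right_on='receiver_id', validate='many_to_one')

    # The prefixed id columns duplicate "sender" and "receiver", so drop them.
    return merged.drop(columns=['sender_id', 'receiver_id'])
```

With the dataframes loaded above, calling add_employee_details(messages, employees) would return one row per message with the department, location, and age of both the sender and the receiver.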
