
How Can The Company Improve Collaboration?

1. Introduction

1.1 Outline & Objective

We work in the analytics department of a multinational company, and the head of HR wants our help mapping out the company's employee network using message data.

They plan to use the network map to better understand interdepartmental dynamics and explore how the company shares information. The ultimate goal of this project is to find ways to improve collaboration throughout the company.

1.2 The data

The company has six months of information on inter-employee communication. For privacy reasons, only the sender, receiver, timestamp, and message length of each message are available (source).

The Messages table has information on the sender, receiver, time, and length of each message:
  • "sender" - represents the employee id of the employee sending the message.
  • "receiver" - represents the employee id of the employee receiving the message.
  • "timestamp" - the date of the message.
  • "message_length" - the length in words of the message.
The Employees table has information on each employee:
  • "id" - represents the employee id of the employee.
  • "department" - is the department within the company.
  • "location" - is the country where the employee lives.
  • "age" - is the age of the employee.

1.3 Data Structure

Before starting the analysis, it is important to understand how the data is structured.

There are 2 tables - Messages and Employees.
  • Each employee can have more than one message. The relationship between the Employees and Messages is one-to-many.

  • In the Employees table the "id" column acts as a primary key. The Employees table is linked to the Messages table via "sender" and "receiver" columns which are foreign keys in the Messages table.
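The key relationship described above can be checked directly. The following is a minimal sketch on made-up toy tables (the values are hypothetical, not the company data): it verifies that "id" is unique and that every "sender" and "receiver" matches an employee id.

```r
library(dplyr)

# Toy stand-ins for the two tables (hypothetical values for illustration)
employees <- tibble(
  id = c(1, 2, 3),
  department = c("Sales", "Admin", "IT")
)
messages <- tibble(
  sender = c(1, 1, 2, 3),
  receiver = c(2, 3, 1, 1),
  message_length = c(10, 25, 7, 40)
)

# A primary key must not contain duplicates
id_is_unique <- !any(duplicated(employees$id))

# Foreign keys: rows of messages whose sender/receiver has no match in employees
orphan_senders <- anti_join(messages, employees, by = c("sender" = "id"))
orphan_receivers <- anti_join(messages, employees, by = c("receiver" = "id"))
```

If both anti-joins return zero rows, every message can be linked to an employee on both sides.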

Here is a graphical representation of the above:
ER Diagram

1.4 Methods

In the first stage, preliminary data preparation was carried out, including:

  • checking for missing values and duplicates in the dataset
  • summary statistics (number of rows and columns, variable types, finding age range, dispersion of data, min and max values, etc.)
  • aggregating (such as grouping variables, joining tables and mutating variables)
  • merging the datasets into a single dataset for the next step

2. Analysis

2.1 Which departments are the most/least active?

In order to find which departments are the most/least active, 2 tables are created: sender and receiver. These tables hold the sender and receiver id as well as their corresponding department. A small depiction is as follows:

Sender & Receiver

Supposing each row is a "message" or interaction between sender and receiver, to measure activity between departments we count all rows or "messages" that correspond to each department.
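As a rough illustration of this counting step, here is a small sketch on hypothetical toy data (the ids, departments, and the column names "messages_sent" and "messages_received" are assumptions made for the example):

```r
library(dplyr)

# Hypothetical toy data, not the company dataset
employees <- tibble(
  id = c(1, 2, 3, 4),
  department = c("Sales", "Sales", "Admin", "IT")
)
messages <- tibble(
  sender = c(1, 1, 2, 3, 3, 4),
  receiver = c(3, 2, 4, 1, 2, 1)
)

# Each row is one message: attach the sender's department, then count rows
sent_by_dept <- messages %>%
  left_join(employees, by = c("sender" = "id")) %>%
  count(department, name = "messages_sent")

# Same idea from the receiver's side
received_by_dept <- messages %>%
  left_join(employees, by = c("receiver" = "id")) %>%
  count(department, name = "messages_received")
```

The two resulting tables are the kind of input a bar plot of messages per department would need.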

The bar plots below show the total number of messages by department.
  • The first Bar plot represents the total messages of all "senders" by department (sender - the id of the employee sending the message).
  • The second Bar plot represents the total messages of all "receivers" by department (receiver - the id of the employee receiving the message).
Total Number of Messages by Department

Conclusion:

  • The most active department is Sales, with a total of 1551 messages sent and 1229 received.
  • The least active department is Marketing, with a total of 16 messages sent and 140 received.

Previously, we assumed that the highest number of messages reflects the most active department. That assumption would be accurate if every department had the same number of employees. That is not the case.

A better option for determining activity is to use a ratio, since each department has a different number of employees. The ratio metric is the average number of messages per employee in each department, and it shows which department's employees send more messages on average. This metric is at least as good a representation of department activity as the raw totals, if not a better one.
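A minimal sketch of such a ratio, on hypothetical toy data, might look like this. Starting from the employee counts keeps departments that sent no messages in the result (with an average of 0); the column names are assumptions made for the example.

```r
library(dplyr)

# Hypothetical toy data, not the company dataset
employees <- tibble(
  id = c(1, 2, 3, 4),
  department = c("Sales", "Sales", "Admin", "Marketing")
)
messages <- tibble(
  sender = c(1, 1, 2, 3),
  receiver = c(3, 4, 3, 1)
)

# Number of employees per department
dept_size <- count(employees, department, name = "employee_count")

# Number of messages sent per department
sent_by_dept <- messages %>%
  left_join(employees, by = c("sender" = "id")) %>%
  count(department, name = "messages_sent")

# Average messages sent per employee; departments without messages get 0
avg_sent <- dept_size %>%
  left_join(sent_by_dept, by = "department") %>%
  mutate(
    messages_sent = coalesce(messages_sent, 0L),
    avg_sent_per_employee = messages_sent / employee_count
  )
```

The same pattern applies to received messages by joining on "receiver" instead of "sender".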

Average Number of Employee Messages by Department

Conclusion: The Sales department is the most active department:

  • each employee sends 30 messages on average
  • each employee receives 24 messages on average

The least active departments are Marketing, IT and Engineering:

  • each employee in these departments sends 0 messages on average and receives about 2.
Average Employee Message Length by Department

Conclusion:

The shortest messages, on average, come from the Engineering department.

  • Shorter messages suggest less casual exchange, which supports the view that Engineering is the least active department.

2.2 Which employee has the most connections?

The plot below shows the employees with the highest number of unique connections. We emphasize the word "unique" because we are looking for an employee who interacts with many different people at work, not someone who repeatedly messages the same few people.

Employee With The Most Connections

Conclusion:

The employee with id "598" has the most unique connections (77), although the employee with id "144" is close behind (75).
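One possible way to count unique connections is sketched below on hypothetical toy data. It assumes that a connection counts in either direction (sending or receiving), which may differ from the definition used for the plot above.

```r
library(dplyr)

# Hypothetical toy messages: employee 1 writes to employee 2 twice
messages <- tibble(
  sender = c(1, 1, 1, 2, 3),
  receiver = c(2, 2, 3, 1, 4)
)

# List every (employee, partner) pair in both directions,
# drop repeats, then count distinct partners per employee
connections <- bind_rows(
  messages %>% transmute(employee = sender, partner = receiver),
  messages %>% transmute(employee = receiver, partner = sender)
) %>%
  distinct() %>%
  count(employee, name = "unique_connections", sort = TRUE)
```

Repeated messages between the same two people collapse into a single connection, which is the point of counting "unique" contacts.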

2.3 Identify the most influential employees and departments.

Employee

One way of finding the most influential employee would be with a Network diagram. We can use a network diagram as a graphical representation of how employees communicate with each other. An employee is a node/vertex and each edge of the network indicates a message/interaction.

There are numerous ways of identifying which nodes/vertices may be the most important or influential. Perhaps the most straightforward measure of vertex importance is the degree of a vertex. The out-degree of a vertex is the number of other individuals to which it has an outgoing edge. The in-degree is the number of edges received from other individuals. Using both the in-degree and the out-degree, we find that the 2 most influential employees are:

Most Influential Employees (based on in/out-Degree)
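The in-degree and out-degree described above can be computed with igraph. This is a minimal sketch on hypothetical toy edges; it assumes repeated messages between the same pair are collapsed first, so that the degree counts individuals rather than messages.

```r
library(igraph)

# Hypothetical toy messages (sender -> receiver); not the company data
edges <- data.frame(
  sender = c(1, 1, 1, 2, 3),
  receiver = c(2, 2, 3, 1, 4)
)

# Drop repeated sender/receiver pairs, then build a directed graph
g <- graph_from_data_frame(unique(edges), directed = TRUE)

# Out-degree: number of distinct people each employee sends to
out_deg <- degree(g, mode = "out")

# In-degree: number of distinct people each employee receives from
in_deg <- degree(g, mode = "in")
```

Sorting either vector in decreasing order gives a ranking of employees by that measure.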

Betweenness

A slightly more interesting index of vertex importance is betweenness. This measures how frequently a vertex lies on the shortest paths between any two vertices in the network. It is equivalent to how critical each vertex is to the flow of information through a network. Individuals with high betweenness are key bridges between different parts of a network. Individuals with low betweenness are not that significant to the overall connectedness of the network.
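To make the idea of a "bridge" concrete, here is a small sketch on a hypothetical toy network with two tightly connected groups, {A, B, C} and {D, E, F}, that are joined only through the link between C and D.

```r
library(igraph)

# Hypothetical toy network: two triangles connected by a single C - D edge
edges <- data.frame(
  from = c("A", "A", "B", "C", "D", "D", "E"),
  to   = c("B", "C", "C", "D", "E", "F", "F")
)
g <- graph_from_data_frame(edges, directed = FALSE)

# Betweenness: how many shortest paths between other vertices pass through each vertex
btw <- betweenness(g, directed = FALSE)
```

In this toy network C and D score highest, because every shortest path between the two groups has to pass through them, while the other vertices score 0.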

Most Influential Employee (based on betweenness method)

Important Information:

Betweenness is useful for analyzing communication dynamics, but should be used with care. A high betweenness score could indicate that someone holds authority over disparate clusters in a network, or just that they are on the periphery of both clusters. The fact that employee "509" is an administrator could indicate that betweenness is not a perfect fit in our case.

Department

Regarding the most influential department, we operate with the same philosophy of Network Graphs, but with slight changes: Nodes represent departments, instead of employee ids.

We then create a weight column that records the number of messages sent between each pair of nodes. In other words, the "weight" counts how many times a pair of nodes occurs. The bigger the weight, the higher the number of connections between the pair (sender department and receiver department). We assume that a high number of connections between two departments also means a high degree of interconnectivity, which translates to a bigger "influence".
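A weight column of this kind could be built as sketched below, using hypothetical toy data. The column names "sender_dept" and "receiver_dept" are assumptions made for the example.

```r
library(dplyr)

# Hypothetical toy data, not the company dataset
employees <- tibble(
  id = c(1, 2, 3, 4),
  department = c("Sales", "Sales", "Admin", "IT")
)
messages <- tibble(
  sender = c(1, 2, 1, 3, 3),
  receiver = c(3, 3, 4, 1, 2)
)

# Replace employee ids with departments on both sides of each message,
# then count how often each (sender department, receiver department) pair occurs
dept_edges <- messages %>%
  left_join(select(employees, id, sender_dept = department),
            by = c("sender" = "id")) %>%
  left_join(select(employees, id, receiver_dept = department),
            by = c("receiver" = "id")) %>%
  count(sender_dept, receiver_dept, name = "weight")
```

The resulting edge list has one row per department pair and direction, and can be used as a weighted edge list for a department-level network graph.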

We create a Network Graph that showcases the weight of each department or the "influence" of each department:

Network Graph

The network graph shows that the lines/connections between Admin, Sales and Operations are much denser than those of the other departments. The higher number of connections makes each of these nodes critical, as much of the flow of information originates from them.

Conclusion:

The most influential departments are:

  • Admin
  • Sales
  • Operations
The next graph is an arc diagram.

Here, we lay out the nodes (departments) in a horizontal line and draw the edges (connections) as arcs.

Unlike the Network Graph, this graph indicates directionality of the edges. The edges above the horizontal line move from left to right, while the edges below the line move from right to left.

Conclusion:

The Sales, Operations and Admin departments communicate reciprocally, whereas Engineering, IT and Marketing do not. The lines from the Sales department to Engineering, IT and Marketing are denser than the lines in the opposite direction. This means that employees in the Sales department send more messages to these departments than they receive from them.

Most of the information seems to derive from the Sales department.

2.4 Using the network analysis, in which departments would you recommend the HR team focus to boost collaboration?

Inspecting the Network graph, we notice a very weak connection between the departments: IT, Marketing and Engineering. This could be due to multiple reasons. It might be a demographic issue. For instance, there could be an age or experience gap between the employees, leading to poor communication. Perhaps location is a barrier to forming good connections.

To have a more detailed look at the employees within the departments, we create a:

  • Box Plot
  • Violin plot
  • Bar Plot

for the purpose of visualizing the distribution of employee ages among the departments.

Box Plot & Violin Plot

From the boxplot we can immediately see:

  • In all departments, the average employee age is between 40 and 45

We use a violin plot in conjunction with the box plot because it shows nuances in the distribution that aren't perceptible in a box plot.

The violin plot shows:

  • A low count of young employees in the Marketing Department (based on how narrow the lower part of the pink area is)
Bar Plot

Finally, we use a bar plot for the purpose of visualizing the distribution of employee locations among the departments.

The bar plot shows that most employees are located in the US, while Brazil and the UK are the locations with the fewest employees.

3. Executive Summary

4. Proposals For the HR Team

Reward Most Influential Employees and Departments

The most influential employees/departments are critical to the flow of information within the company, boosting its dynamics. A good practice would be to reward them in a way you see fit: perhaps a promotion, a bonus, an employee/department of the month badge or even a group trip. This could have a ripple effect, not only on the employee/department (by boosting their output even more), but also on the less influential employees/departments by encouraging them to raise their own standards. This, in turn, could improve collaboration throughout the company.

Focus on Boosting Collaboration and Output of Marketing and Engineering Department

These two departments are consistently outperformed by the others in terms of communication. They have the fewest employees, but that doesn't justify their lack of message activity. HR's main focus should be on:

  • Helping the Marketing department to be more engaged in collaboration with Sales, Operations and IT.
  • Encouraging the Marketing department to send more messages to the IT department.
  • Raising the Engineering department's message activity, which is very low compared to the other departments. Its employees should focus on collaborating more and on adding more information to their messages.

Hiring younger Employees in Marketing Department

Young people have skills, enthusiasm and innovative ideas to bring to the workplace, helping your business to stay fresh and up to date. With support, a young employee can help your workplace to flourish, providing new skills and building a workforce for the future. This is especially important for the Marketing department considering that the majority of users on social media are Gen Z and millennials. Young minds are tech-savvy and know exactly what is trendy in the influencer realm. Plus, young talents have different perspectives from growing up in very different times. Hiring them is a good opportunity to revive the workplace and boost communication.

Employee bonding

Employee bonding is when coworkers connect, grow their relationships and become better collaborators in the workplace. Employee bonding strategies can lead to happier and more productive employees, which is important to creating a positive work culture and strong, effective teams. Some examples are:

  • Team-building games
  • Game tournaments
  • Charity events
  • After-work meetups
  • Team meals
  • Bring-your-pet-to-work day

5. Appendix

5.1 Importing Libraries

# Install packages that are not yet available (only needed once)
install.packages("ggraph")    # plotting graphs
install.packages("tidygraph")

# Import relevant libraries
library(tidyverse) # metapackage of all tidyverse packages
library(igraph)    # working with graphs
library(ggraph)    # to generate network graphs
library(tidygraph) # tidy interface for network analysis

5.2 Exploratory Analysis

# Read files
employees <- readr::read_csv('data/employees.csv', show_col_types = FALSE)
messages <- readr::read_csv('data/messages.csv', show_col_types = FALSE)
# Taking a quick look
glimpse(employees)
glimpse(messages)
# Number of missing values in each column of both tables
sapply(employees, function(x) sum(is.na(x)))
sapply(messages, function(x) sum(is.na(x)))
# Check for duplicates
sum(duplicated(employees))
# Summary of employees table

summary(employees)
# Count the number of employees in each department
emp_count <- employees %>% count(department)
emp_count

So far in the analysis, there are no duplicate or missing values. The data is clean. Based on the summary() function, we see that ages in the workplace range from 22 to 59.

5.3 Preparing for 1st Question


# Joining messages and employees tables on "sender" and "id"
sender <- messages %>%
  left_join(employees, by = c('sender' = 'id'))

# Joining messages and employees tables on "receiver" and "id"
receiver <- messages %>%
  left_join(employees, by = c('receiver' = 'id'))