Exploring variables that reduce employee turnover

Can you help reduce employee turnover?

📖 Background

You work for the human capital department of a large corporation. The Board is worried about the relatively high turnover, and your team must look into ways to reduce the number of employees leaving the company.

The team needs to better understand the situation: which employees are more likely to leave, and why. Once it is clear which variables affect employee churn, you can present your findings along with your ideas on how to address the problem.

💾 The data

The department has assembled data on almost 10,000 employees. The team used information from exit interviews, performance reviews, and employee records.

"department" - the department the employee belongs to.
"promoted" - 1 if the employee was promoted in the previous 24 months, 0 otherwise.
"review" - the composite score the employee received in their last evaluation.
"projects" - how many projects the employee is involved in.
"salary" - for confidentiality reasons, salary comes in three tiers: low, medium, high.
"tenure" - how many years the employee has been at the company.
"satisfaction" - a measure of employee satisfaction from surveys.
"avg_hrs_month" - the average hours the employee worked in a month.
"left" - "yes" if the employee ended up leaving, "no" otherwise.
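
The column descriptions above can be turned into a quick sanity check before any analysis. The snippet below is only a sketch: `EXPECTED_COLUMNS`, `check_schema`, and the one-row `sample` frame are hypothetical names and data introduced for illustration, and the allowed values are taken directly from the descriptions above.

```python
import pandas as pd

# Column names as listed in the data description above.
EXPECTED_COLUMNS = [
    "department", "promoted", "review", "projects", "salary",
    "tenure", "satisfaction", "avg_hrs_month", "left",
]

def check_schema(frame: pd.DataFrame) -> list:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    missing = [col for col in EXPECTED_COLUMNS if col not in frame.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    # "promoted" is described as 1 or 0.
    if "promoted" in frame.columns and not frame["promoted"].isin([0, 1]).all():
        problems.append("promoted has values other than 0/1")
    # "salary" is described as three tiers.
    if "salary" in frame.columns and not frame["salary"].isin(["low", "medium", "high"]).all():
        problems.append("salary has values other than low/medium/high")
    # "left" is described as "yes" or "no".
    if "left" in frame.columns and not frame["left"].isin(["yes", "no"]).all():
        problems.append("left has values other than yes/no")
    return problems

# Hypothetical one-row sample, not taken from the real dataset.
sample = pd.DataFrame([{
    "department": "sales", "promoted": 0, "review": 0.6, "projects": 3,
    "salary": "low", "tenure": 5.0, "satisfaction": 0.5,
    "avg_hrs_month": 180.0, "left": "no",
}])
print(check_schema(sample))
```

On the real data, the same function could be called as check_schema(df) once the CSV has been loaded.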

import pandas as pd
df = pd.read_csv('./data/employee_churn_data.csv')
df.head()

df.info()

# list the categories of department
df.department.unique()

# list the categories of salary
df.salary.unique()

# load plotting packages (pandas is already imported above)
import matplotlib.pyplot as plt
import seaborn as sns

# employee turnover (left)
turnover = round(df.left.value_counts(normalize=True)*100, 2)
turnover

sns.catplot(x='left', data=df, kind='count')
plt.ylabel('Number of employees')
plt.title('Overall employee turnover')
plt.show()

Overall, 29.18% of the employees ended up leaving while 70.82% stayed.

1) Department with the highest/lowest employee turnover

# count the employee records per department
# note: count() includes every row, both left == 'yes' and left == 'no'
department_turnover = df.groupby('department')[['left']].count()
department_turnover['turnover_rate'] = round(department_turnover['left'] / 
                                             department_turnover['left'].sum() * 100, 2)
department_turnover = department_turnover.sort_values('left', ascending=False)
department_turnover

# Visualize the turnover rate per department
sns.catplot(y=department_turnover.index, x='turnover_rate', data=department_turnover, kind='bar')
plt.title('Employee turnover per department')
plt.show()

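The sales department has the largest share (19.74%) and IT has the smallest (3.73%). Note that the code above counts every record in each department, not only the employees with left equal to "yes", so these percentages are each department's share of all employee records rather than the share of its employees who left.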
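
The table above divides each department's record count by the total number of records. A per-department turnover rate would instead divide the number of leavers in a department by that department's headcount. The following is a minimal sketch of that calculation, not the original analysis: `toy` is made-up data standing in for the real CSV, and `turnover_by_department` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical toy data standing in for employee_churn_data.csv.
toy = pd.DataFrame({
    "department": ["sales", "sales", "sales", "sales", "IT", "IT"],
    "left": ["yes", "no", "no", "no", "yes", "no"],
})

def turnover_by_department(frame: pd.DataFrame) -> pd.DataFrame:
    """Return headcount, leavers, and turnover rate (%) for each department."""
    # True where the employee left, False otherwise.
    left_flag = frame["left"].eq("yes")
    summary = (
        frame.assign(left_flag=left_flag)
        .groupby("department")["left_flag"]
        .agg(headcount="count", leavers="sum")
    )
    # Leavers divided by the department's own headcount.
    summary["turnover_rate"] = (summary["leavers"] / summary["headcount"] * 100).round(2)
    return summary.sort_values("turnover_rate", ascending=False)

rates = turnover_by_department(toy)
print(rates)
```

Applied to the real data, turnover_by_department(df) would give a rate that is comparable across departments of different sizes; the ranking may differ from the headcount shares reported above.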

2) Investigation of variables that seem to be better predictors of employee departure or turnover

I will use a seaborn heatmap to check the correlation between the variables. The non-numeric variables will first be converted to categorical variables (ordinal or nominal).

department is a nominal variable since its 10 categories have no intrinsic order
salary is an ordinal variable since its 3 categories can be ordered (low < medium < high)
left is a binary variable, since "no" is equivalent to "False" or 0 and "yes" to "True" or 1.

Thereafter, the categorical variables will be encoded with integer values, which will be used to determine their correlation with the other variables in the seaborn heatmap.
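
The encoding step described above could look roughly like the sketch below. It runs on a small made-up frame (`toy`) rather than the real data, and the mapping names (`salary_order`) are assumptions for illustration. Because department is nominal, its integer codes are arbitrary labels, so any correlation involving that column should be read with caution.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the real dataset.
toy = pd.DataFrame({
    "department": ["sales", "IT", "sales", "admin"],
    "salary": ["low", "high", "medium", "low"],
    "satisfaction": [0.2, 0.9, 0.5, 0.3],
    "left": ["yes", "no", "no", "yes"],
})

encoded = toy.copy()

# Ordinal: keep the natural order low < medium < high.
salary_order = {"low": 0, "medium": 1, "high": 2}
encoded["salary"] = encoded["salary"].map(salary_order)

# Binary: "no" becomes 0 and "yes" becomes 1.
encoded["left"] = encoded["left"].map({"no": 0, "yes": 1})

# Nominal: integer codes are arbitrary labels with no meaningful order.
encoded["department"] = encoded["department"].astype("category").cat.codes

# Pairwise Pearson correlation between all (now numeric) columns.
corr = encoded.corr()
print(corr.round(2))
```

The resulting matrix can then be passed to seaborn, for example with sns.heatmap(corr, annot=True), to draw the heatmap.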