
Modeling to help reduce employee turnover

1. Introduction

The turnover rate is the percentage of employees who left the business during a given period (often a calendar year):

Turnover rate = (number of employees who left ÷ total number of employees, including those who left) × 100.

Also called "staff turnover", it is a key indicator for companies. As a rule of thumb, the lower the turnover rate, the healthier the organization.
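
As a quick worked example of the formula above, here is a minimal sketch. The function name `turnover_rate` and the head counts are invented for illustration and do not come from the data set.

```python
def turnover_rate(employees_left: int, total_employees: int) -> float:
    """Return the turnover rate as a percentage.

    total_employees counts everyone employed during the period,
    including the employees who left.
    """
    if total_employees <= 0:
        raise ValueError("total_employees must be positive")
    if not 0 <= employees_left <= total_employees:
        raise ValueError("employees_left must be between 0 and total_employees")
    return 100 * employees_left / total_employees


# Hypothetical numbers: 25 leavers out of 200 employees in the period.
rate = turnover_rate(employees_left=25, total_employees=200)
print(f"Turnover rate: {rate:.1f}%")
```

With these made-up numbers, 25 ÷ 200 × 100 gives a turnover rate of 12.5%.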

1.1. The Data

The department has assembled data on almost 10,000 employees. The team used information from exit interviews, performance reviews, and employee records.

  • "department" - the department the employee belongs to.
  • "promoted" - 1 if the employee was promoted in the previous 24 months, 0 otherwise.
  • "review" - the composite score the employee received in their last evaluation.
  • "projects" - how many projects the employee is involved in.
  • "salary" - for confidentiality reasons, salary comes in three tiers: low, medium, high.
  • "tenure" - how many years the employee has been at the company.
  • "satisfaction" - a measure of employee satisfaction from surveys.
  • "bonus" - 1 if the employee received a bonus in the previous 24 months, 0 otherwise.
  • "avg_hrs_month" - the average hours the employee worked in a month.
  • "left" - "yes" if the employee ended up leaving, "no" otherwise.
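
Two of the columns described above carry more structure than their raw strings suggest: the salary tiers have a natural order, and "left" is a yes/no flag. The sketch below shows one possible way to encode both with pandas. The toy values and the names `tier_order`, `above_low`, and `left_flag` are assumptions made for illustration, not part of the original analysis.

```python
import pandas as pd

# Assumed ordering of the three salary tiers: low < medium < high.
tier_order = pd.CategoricalDtype(["low", "medium", "high"], ordered=True)

# Toy values, invented for illustration.
salary = pd.Series(["medium", "low", "high", "low"]).astype(tier_order)

# Ordered categoricals expose integer codes and support comparisons.
print(salary.cat.codes.tolist())
above_low = salary > "low"
print(above_low.tolist())

# Map the yes/no target to 1/0 so it can be averaged or modeled later.
left = pd.Series(["yes", "no", "no", "yes"])
left_flag = left.map({"yes": 1, "no": 0})
print(left_flag.tolist())
```

An ordered categorical keeps the tier labels readable while still allowing sorting and comparisons, which a plain string column would not do meaningfully.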

2. Load

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# load file
df = pd.read_csv('./data/employee_churn_data.csv')

# first 5 rows
df.head()
# last 5 rows
df.tail()

3. EDA

df.info()
# copy raw data
df_copy = df.copy()
# change dtypes
df_copy['promoted'] = df_copy['promoted'].astype('object')
df_copy['bonus'] = df_copy['bonus'].astype('object')
df_copy['salary'] = df_copy['salary'].astype('category')
# dtypes
print(df_copy.dtypes)

print('\n')

# number of rows and columns
print('Rows and columns: ' + str(df_copy.shape))

print('\n')

# check missing values
print(df_copy.isnull().sum())

print('\n')

# describe
print(df_copy.describe())
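
By default, `describe()` summarizes only the numeric columns, so columns cast to `object` or `category` are left out of the output above. A minimal sketch of how the non-numeric columns could be summarized as well is shown here. The toy frame and its values are invented for illustration and are not taken from the real data set.

```python
import numpy as np
import pandas as pd

# Toy rows, invented for illustration only.
toy = pd.DataFrame({
    "salary": pd.Categorical(["low", "medium", "high", "low", "medium", "low"]),
    "promoted": pd.Series([0, 0, 1, 0, 0, 0], dtype="object"),
    "review": [0.58, 0.75, 0.72, 0.68, 0.67, 0.61],
    "left": ["no", "no", "yes", "no", "yes", "no"],
})

# Excluding numeric dtypes leaves only the categorical-like columns,
# which are summarized by count, unique, top and freq.
summary = toy.describe(exclude=[np.number])
print(summary)
```

The same call on `df_copy` would report, for each non-numeric column, the number of distinct values and the most frequent one.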

4. Target column

# target column
df_copy['left'].value_counts()
# target proportion
df_copy['left'].value_counts(normalize=True)
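
The share of "yes" values in the target column is the turnover rate defined in the introduction. As a sketch of how that rate could also be broken down by group, the example below computes it overall and per department. The toy frame and the names `toy`, `left_flag`, `overall`, and `by_dept` are assumptions for illustration, not results from the real data.

```python
import pandas as pd

# Toy rows, invented for illustration only.
toy = pd.DataFrame({
    "department": ["sales", "sales", "sales", "sales", "IT", "IT"],
    "left": ["yes", "no", "no", "no", "yes", "no"],
})

# Boolean flag: True where the employee left.
left_flag = toy["left"].eq("yes")

# The mean of a boolean flag is the proportion of True values.
overall = left_flag.mean() * 100
by_dept = left_flag.groupby(toy["department"]).mean().mul(100)

print(f"Overall turnover: {overall:.1f}%")
print(by_dept)
```

Applied to `df_copy`, the same pattern would show whether turnover is concentrated in particular departments.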