Can you help reduce employee turnover?

📖 Background

You work for the human capital department of a large corporation. The Board is worried about the relatively high turnover, and your team must look into ways to reduce the number of employees leaving the company.

The team needs to better understand the situation: which employees are more likely to leave, and why. Once it is clear which variables affect employee churn, you can present your findings along with your ideas on how to address the problem.

💾 The data

The department has assembled data on almost 10,000 employees. The team used information from exit interviews, performance reviews, and employee records.

  • "department" - the department the employee belongs to.
  • "promoted" - 1 if the employee was promoted in the previous 24 months, 0 otherwise.
  • "review" - the composite score the employee received in their last evaluation.
  • "projects" - how many projects the employee is involved in.
  • "salary" - for confidentiality reasons, salary comes in three tiers: low, medium, high.
  • "tenure" - how many years the employee has been at the company.
  • "satisfaction" - a measure of employee satisfaction from surveys.
  • "bonus" - 1 if the employee received a bonus in the previous 24 months, 0 otherwise.
  • "avg_hrs_month" - the average hours the employee worked in a month.
  • "left" - "yes" if the employee ended up leaving, "no" otherwise.
import pandas as pd
df = pd.read_csv('./data/employee_churn_data.csv')
df.head()
df.describe()
df.info()

The dataset has three object variables (department, salary, and left), three integer variables (promoted, projects, and bonus), and four float variables (review, tenure, satisfaction, and avg_hrs_month).

The object and integer variables are discrete, while the float variables are continuous.
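The grouping by data type can also be derived programmatically instead of being read off the df.info() output. The sketch below is only an illustration: it uses a hypothetical three-row sample with made-up values and the same column names as the dataset, and it assumes the text columns are the only non-numeric ones.

```python
import pandas as pd

# Hypothetical three-row sample with the same column names as the dataset.
# The values are made up for illustration only.
sample = pd.DataFrame({
    "department": ["sales", "IT", "retail"],
    "promoted": [0, 1, 0],
    "review": [0.58, 0.75, 0.72],
    "projects": [3, 3, 4],
    "salary": ["low", "medium", "high"],
    "tenure": [5.0, 6.0, 6.0],
    "satisfaction": [0.63, 0.44, 0.45],
    "bonus": [0, 0, 1],
    "avg_hrs_month": [180.9, 182.7, 184.4],
    "left": ["no", "no", "yes"],
})

# Group the column names by dtype family.
text_cols = sample.select_dtypes(exclude="number").columns.tolist()
int_cols = sample.select_dtypes(include="integer").columns.tolist()
float_cols = sample.select_dtypes(include="floating").columns.tolist()

print("text:   ", text_cols)
print("integer:", int_cols)
print("float:  ", float_cols)
```

Running the same three select_dtypes calls on df should reproduce the grouping described above.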

# checking for missing values
df.isna().sum()

There are no missing values in the dataset.

# check for duplicate rows by comparing the shape before and after dropping them
print(df.shape)
df = df.drop_duplicates()
df.shape

There are no duplicate rows in the dataset.
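Comparing shapes works, but counting the duplicates directly states the result in one number. The following is a minimal sketch on a hypothetical three-row frame (made-up values, one deliberate duplicate). Because the data has no employee ID column, two identical rows could in principle be two different employees, so it can be worth counting before dropping.

```python
import pandas as pd

# Hypothetical toy records; the third row repeats the first on purpose.
records = pd.DataFrame({
    "department": ["sales", "IT", "sales"],
    "tenure": [5.0, 6.0, 5.0],
    "left": ["no", "yes", "no"],
})

# duplicated() flags every row that repeats an earlier row exactly.
n_duplicates = int(records.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each repeated row.
deduplicated = records.drop_duplicates()

print(n_duplicates)
print(deduplicated.shape)
```

On the real data, df.duplicated().sum() returning 0 would confirm the statement above.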

Check the number of employees in each department

# checking for all the departments in the company
df['department'].unique()
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

dept_count = df['department'].value_counts()
print(dept_count)

# plot the number of employees in each department
fig = plt.figure(figsize=(10,6))
# pass x and y as keywords; recent seaborn versions reject positional data arguments
sns.barplot(x=dept_count.index, y=dept_count.values)
plt.title("Count of Employees in different departments")
plt.xlabel("Departments")
plt.ylabel("Count")
plt.xticks(rotation=90)
plt.show()

The sales department has the most employees in the company. Most employees in the organization work in the sales, retail, operations, and engineering departments.

The IT, logistics, finance, and admin departments have the lowest number of employees.
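To put numbers on "most" and "lowest", the counts can be turned into percentage shares. This is a small sketch on a hypothetical list of department labels, not the real counts. value_counts(normalize=True) returns fractions that sum to 1.

```python
import pandas as pd

# Hypothetical department labels standing in for df['department'].
departments = pd.Series(["sales", "sales", "retail", "IT"], name="department")

# normalize=True returns each department's fraction of all employees;
# multiply by 100 and round to get readable percentages.
dept_share = departments.value_counts(normalize=True).mul(100).round(1)

print(dept_share)
```

Applied to the dataset, df['department'].value_counts(normalize=True) would give the share of each department directly.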

# check the number of employees in the different departments that left the company
fig = plt.figure(figsize=(15,10))
plt.grid(True)
sns.countplot(x='department',
              data=df,
              hue='left',
              hue_order=["no", "yes"],
              palette=["green", "red"])
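Because the departments differ a lot in size, raw counts of leavers favour the large departments. One possible complement to the count plot, sketched here on hypothetical toy data, is the share of employees who left within each department. It assumes "left" holds the strings "yes" and "no", as described in the data section.

```python
import pandas as pd

# Hypothetical toy data: four sales employees and two IT employees.
toy = pd.DataFrame({
    "department": ["sales", "sales", "sales", "sales", "IT", "IT"],
    "left": ["yes", "no", "no", "no", "yes", "no"],
})

# Turn "yes"/"no" into True/False, then average within each department:
# the mean of a boolean column is the fraction of True values.
turnover_rate = (
    toy["left"].eq("yes")
    .groupby(toy["department"])
    .mean()
    .sort_values(ascending=False)
)

print(turnover_rate)
```

In this toy example sales has as many leavers as IT but a lower rate, which is the kind of difference a count plot alone can hide.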