Employee Turnover -- Storytelling, Insights & Recommendations

# Import Image directly so the name `display` stays free for the display() function used later
from IPython.display import Image
Image("./Employee-Turnover.jpg")

Can you help reduce employee turnover?

📖 Background

You work for the human capital department of a large corporation. The Board is worried about the relatively high turnover, and your team must look into ways to reduce the number of employees leaving the company.

The team needs to better understand the situation: which employees are more likely to leave, and why. Once it is clear which variables affect employee churn, you can present your findings along with your ideas on how to attack the problem.

💾 The data

The department has assembled data on almost 10,000 employees. The team used information from exit interviews, performance reviews, and employee records.

"department" - the department the employee belongs to.
"promoted" - 1 if the employee was promoted in the previous 24 months, 0 otherwise.
"review" - the composite score the employee received in their last evaluation.
"projects" - how many projects the employee is involved in.
"salary" - for confidentiality reasons, salary comes in three tiers: low, medium, high.
"tenure" - how many years the employee has been at the company.
"satisfaction" - a measure of employee satisfaction from surveys.
"avg_hrs_month" - the average hours the employee worked in a month.
"left" - "yes" if the employee ended up leaving, "no" otherwise.

import pandas as pd
df = pd.read_csv('./data/employee_churn_data.csv')
df.head()

💪 Competition challenge

Create a report that covers the following:

Which department has the highest employee turnover? Which one has the lowest?
Investigate which variables seem to be better predictors of employee departure.
What recommendations would you make regarding ways to reduce employee turnover?

🧑‍⚖️ Judging criteria

CATEGORY	WEIGHTING	DETAILS
Recommendations	35%	Clarity of recommendations - how clear and well presented the recommendation is. Quality of recommendations - are appropriate analytical techniques used & are the conclusions valid? Number of relevant insights found for the target audience.
Storytelling	35%	How well the data and insights are connected to the recommendation. How the narrative and whole report connects together. Balancing making the report in-depth enough but also concise.
Visualizations	20%	Appropriateness of visualization used. Clarity of insight from visualization.
Votes	10%	Up voting - most upvoted entries get the most points.

✅ Checklist before publishing into the competition

Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
Remove redundant cells like the judging criteria, so the workbook is focused on your story.
Make sure the workbook reads well and explains how you found your insights.
Check that all the cells run without error.

⌛️ Time is ticking. Good luck!

0. Importing libraries and defining auxiliary functions

import plotly.io as pio
import plotly.express as px

pio.templates.default = "plotly_white"

# IPython.display is the public import path; importing display from IPython.core.display is deprecated
from IPython.display import display, HTML

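def display_side_by_side(df, cols: list, tablespacing=5):
    """Display per-column turnover tables side by side to save vertical space
    Input:
        df: pandas.DataFrame with 'left' and 'employee_ID' columns
        cols: list of column names; one table is rendered per column
        tablespacing: number of non-breaking spaces between tables
    """
    output = ""
    for col in cols:
        # Count distinct employees per (col value, left) pair
        dfx = df.pivot_table(columns='left', index=col, values='employee_ID', aggfunc='nunique').fillna(0)
        # Convert counts to row-wise shares, sorted by the share of employees who left
        shares = dfx.div(dfx.sum(axis=1), axis=0).sort_values(by='yes', ascending=False).reset_index()
        # Styler.hide_index() was removed in pandas 2.0; hide(axis="index") needs pandas >= 1.4
        styler = (shares.style.background_gradient(cmap='Blues')
                  .format({k: '{:,.2%}'.format for k in df.left.unique()})
                  .hide(axis="index")
                  .set_table_attributes("style='display:inline'")
                  .set_caption(col))
        output += styler._repr_html_()
        output += tablespacing * "\xa0"
    display(HTML(output))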
    
def plotter(df, cols):
    """Plot one histogram per column in cols, colored by 'left', with a marginal box plot"""
    for col in cols:
        fig = px.histogram(df, x=col, color="left", marginal="box",
                           hover_data=df.columns, nbins=2000)
        fig.show()

1. Prepping

1.1. Separating categorical and numeric variables

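Columns with fewer than 20 distinct values are treated as categorical and the rest as numeric, which makes it easier to count employees per group in the analysis section.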

n_unique = df.nunique()
num_cols = df.columns[n_unique >= 20].to_list()
cat_cols = df.columns[n_unique < 20].drop('left').to_list()
print(f"{num_cols=}\n{cat_cols=}")
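As a minimal sketch of how this cardinality-based split behaves, the snippet below applies the same rule to a small made-up frame. The values are invented for illustration and are not from the dataset, `toy` and `threshold` are hypothetical names, and the threshold is lowered from 20 to 4 only so the six-row example stays small.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "department": ["sales", "IT", "sales", "IT", "sales", "IT"],
    "review": [0.51, 0.62, 0.73, 0.84, 0.95, 0.46],
    "left": ["no", "yes", "no", "no", "yes", "no"],
})

threshold = 4  # the notebook uses 20; lowered here for the six-row toy frame
n_unique = toy.nunique()

# High-cardinality columns are treated as numeric
toy_num_cols = toy.columns[n_unique >= threshold].to_list()
# Low-cardinality columns are treated as categorical, excluding the target 'left'
toy_cat_cols = toy.columns[n_unique < threshold].drop("left").to_list()

print(toy_num_cols, toy_cat_cols)
```

Note that under this rule a numeric column with few distinct values, such as the 0/1 flag "promoted", lands in the categorical list.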

1.2. Generating Employee ID

df = df.reset_index().rename(columns={'index':'employee_ID'})
df.head()
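As a rough sketch of why the ID column helps, the snippet below repeats the counting step used by the helper in section 0: it counts distinct `employee_ID` values per group and turns the counts into row-wise shares. The frame is made up for illustration, and `toy`, `counts` and `shares` are hypothetical names.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "department": ["sales", "sales", "sales", "sales", "IT", "IT"],
    "left": ["yes", "no", "no", "no", "yes", "no"],
})

# Same step as in the notebook: turn the row index into an ID column
toy = toy.reset_index().rename(columns={"index": "employee_ID"})

# Count distinct employees per (department, left) pair
counts = toy.pivot_table(columns="left", index="department",
                         values="employee_ID", aggfunc="nunique").fillna(0)

# Row-wise shares: fraction of each department that left or stayed
shares = counts.div(counts.sum(axis=1), axis=0)
print(shares)
```

Because each row is one employee, counting unique IDs is equivalent to counting rows here; the ID column mainly gives `pivot_table` an explicit value column to aggregate.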

2. Analysis

2.1. Which department has the highest employee turnover? Which one has the lowest?
