Reducing employee churn
  • Can you help reduce employee turnover?

    📖 Background

    You work for the human capital department of a large corporation. The Board is worried about the relatively high turnover, and your team must look into ways to reduce the number of employees leaving the company.

    The team needs to better understand the situation: which employees are more likely to leave, and why. Once it is clear which variables affect employee churn, you can present your findings along with your ideas on how to tackle the problem.

    💾 The data

    The department has assembled data on almost 10,000 employees. The team used information from exit interviews, performance reviews, and employee records.

    • "department" - the department the employee belongs to.
    • "promoted" - 1 if the employee was promoted in the previous 24 months, 0 otherwise.
    • "review" - the composite score the employee received in their last evaluation.
    • "projects" - how many projects the employee is involved in.
    • "salary" - for confidentiality reasons, salary comes in three tiers: low, medium, high.
    • "tenure" - how many years the employee has been at the company.
    • "satisfaction" - a measure of employee satisfaction from surveys.
    • "bonus" - 1 if the employee received a bonus in the previous 24 months, 0 otherwise.
    • "avg_hrs_month" - the average hours the employee worked in a month.
    • "left" - "yes" if the employee ended up leaving, "no" otherwise.

    💪 Competition challenge

    Create a report that covers the following:

    1. Which department has the highest employee turnover? Which one has the lowest?
    2. Investigate which variables seem to be better predictors of employee departure.
    3. What recommendations would you make regarding ways to reduce employee turnover?

    Imports

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor
    
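    sns.set_style('whitegrid')

    def check_multicollinearity(X):
        """Return the variance inflation factor (VIF) of each column of X, largest first."""
        vif_data = pd.DataFrame()
        vif_data["feature"] = X.columns
        # VIF needs a purely numeric matrix, so cast any bool dummy columns to float
        values = X.astype(float).values
        vif_data["VIF"] = [variance_inflation_factor(values, i) for i in range(X.shape[1])]
        # sort_values returns a new frame, so the sorted result must be returned
        return vif_data.sort_values(by = "VIF", ascending = False)

    churn = pd.read_csv('./data/employee_churn_data.csv')
    churn.head()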

    Which department has the highest employee turnover? Which one has the lowest?

    # Headcount of employees who left and who stayed, per department
    plt.figure(figsize = (16, 6))
    df = churn.copy()
    df = pd.crosstab(df['department'], df['left']).reset_index()
    df = df.melt(id_vars = ['department'])
    df.columns = ['department', 'left', 'count']
    df = df.sort_values(by = 'count', ascending = False)

    sns.barplot(data = df, y = 'department', x = 'count', hue = 'left')
    plt.xticks(rotation = 90)
    plt.tight_layout()

    # Share of each department's employees who left (a proportion between 0 and 1)
    plt.figure(figsize = (16, 6))

    df = churn.copy()
    df = pd.crosstab(df['department'], df['left'], normalize='index').sort_values(by = 'yes', ascending = False)['yes']
    df = df.reset_index()
    df.columns = ['department', 'pct']
    sns.barplot(data = df, y = 'department', x = 'pct')
    plt.title("Employee turnover rate by department")
    plt.tight_layout()
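
    The charts answer the question visually; the same answer can also be read off as numbers. The sketch below uses a tiny made-up table (the departments and counts are illustrative only, not results from the real data) to show one way to compute the turnover rate per department and pick out the highest and lowest.

```python
import pandas as pd

# Hypothetical toy data with the same two columns as the real dataset
toy = pd.DataFrame({
    "department": ["sales", "sales", "sales", "sales",
                   "IT", "IT", "IT", "IT",
                   "finance", "finance"],
    "left": ["yes", "no", "no", "no",
             "yes", "yes", "yes", "no",
             "no", "no"],
})

# The mean of a boolean "left == yes" flag is the share of leavers per department
rate = (
    toy["left"].eq("yes")
    .groupby(toy["department"])
    .mean()
    .sort_values(ascending=False)
)

highest, lowest = rate.idxmax(), rate.idxmin()
print(rate)
print(f"highest: {highest}, lowest: {lowest}")
```

    Applied to the real data, the same pattern would use `churn` in place of `toy`. Comparing rates rather than raw counts matters because large departments can have many leavers while still having a low turnover rate.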

    Investigate which variables seem to be better predictors of employee departure.

    Logistic Regression Model Building

    Initial Model

    mapping = {
        'yes': 1,
        'no': 0
    }
    df = churn.copy()
    df['left'] = df['left'].map(mapping)
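    # One-hot encode the categorical columns, dropping one level of each as the baseline.
    # dtype=int keeps the dummy columns numeric (recent pandas versions default to bool).
    df = pd.get_dummies(df, drop_first=True, dtype=int)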
    df