'Unhappy hard-workers' and 'unhappy champions': how to make them happy?

    📖 Background

    You work for the human capital department of a large corporation. The Board is worried about the relatively high turnover, and your team must look into ways to reduce the number of employees leaving the company.

    The team needs to understand better the situation, which employees are more likely to leave, and why. Once it is clear what variables impact employee churn, you can present your findings along with your ideas on how to attack the problem.

    💾 The data

    The department has assembled data on almost 10,000 employees. The team used information from exit interviews, performance reviews, and employee records.

    • "department" - the department the employee belongs to.
    • "promoted" - 1 if the employee was promoted in the previous 24 months, 0 otherwise.
    • "review" - the composite score the employee received in their last evaluation.
    • "projects" - how many projects the employee is involved in.
    • "salary" - for confidentiality reasons, salary comes in three tiers: low, medium, high.
    • "tenure" - how many years the employee has been at the company.
    • "satisfaction" - a measure of employee satisfaction from surveys.
    • "avg_hrs_month" - the average hours the employee worked in a month.
    • "left" - "yes" if the employee ended up leaving, "no" otherwise.
    import pandas as pd
    import numpy as np
    
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    from sklearn.model_selection import train_test_split, RandomizedSearchCV
    
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.cluster import KMeans
    
    from sklearn.metrics import f1_score, roc_auc_score, recall_score, confusion_matrix
    
    from scipy.stats import boxcox
    from sklearn.preprocessing import MinMaxScaler
    
    from sklearn import tree
    import graphviz
    from mpl_toolkits.mplot3d import Axes3D

    EDA

    Let's start with preprocessing and exploratory data analysis.

    df = pd.read_csv('./data/employee_churn_data.csv')
    df.head()
    df.info()

    Which department has the highest employee turnover? Which one has the lowest?

    df.department.value_counts()
    round(pd.crosstab(df.department, df.left, normalize = 0)*100, 1).sort_values(by = 'yes', ascending = False).reset_index()

    Turnover is highest (above 30%) in the IT, logistics, retail, and marketing departments, and lowest (around 27%) in the finance department.

    Taking into account the sizes of the departments, the turnover is especially alarming in IT and logistics.
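To read turnover rates and department sizes side by side, both can go into one table. The sketch below is only an illustration: the `toy` frame and its values are invented, and the column names `headcount`, `leavers`, and `turnover_pct` are hypothetical. It assumes `left` still holds the original "yes"/"no" strings.

```python
import pandas as pd

# Toy data, invented for illustration only (not the real employee records).
toy = pd.DataFrame({
    "department": ["IT"] * 4 + ["sales"] * 6 + ["finance"] * 2,
    "left": ["yes", "yes", "no", "no",
             "yes", "no", "no", "no", "no", "no",
             "no", "no"],
})

# Headcount, number of leavers, and share of leavers per department.
summary = (
    toy.assign(left_flag=toy["left"].eq("yes"))
       .groupby("department")["left_flag"]
       .agg(headcount="size", leavers="sum", turnover_pct="mean")
)

# Express the share as a percentage and put the worst departments first.
summary["turnover_pct"] = (summary["turnover_pct"] * 100).round(1)
summary = summary.sort_values("turnover_pct", ascending=False)
print(summary)
```

Applied to the real `df`, the same aggregation would show at a glance whether a high rate comes from a small or a large department.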

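    # Encode the ordinal salary tiers and the binary target as integers
    df['salary'] = df['salary'].map({'low': 1, 'medium': 2, 'high': 3})
    df['left'] = df['left'].map({'no': 0, 'yes': 1})
    # 'department' is still a string column, so correlate numeric columns only (pandas >= 1.5)
    sns.heatmap(df.corr(numeric_only=True));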

    The heatmap shows a strong linear correlation between the 'tenure' and 'avg_hrs_month' features, as well as a noticeable correlation between the 'review' feature and the target variable.

    So 'review' may be an important predictor.
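A heatmap without annotations only gives a visual impression, so it can help to print the coefficients as well. The sketch below ranks features by the absolute value of their Pearson correlation with the target. The data is synthetic, and the relationships in it are baked in on purpose so that the ranking has something to show; they say nothing about the real dataset. The names `toy`, `corr`, and `target_corr` are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption: NOT the real employee data).
rng = np.random.default_rng(0)
n = 500
tenure = rng.integers(2, 13, size=n)
avg_hrs_month = 170 + 3 * tenure + rng.normal(0, 1, size=n)  # tied to tenure by construction
review = rng.uniform(0.3, 1.0, size=n)
left = (review + rng.normal(0, 0.1, size=n) > 0.7).astype(int)  # tied to review by construction

toy = pd.DataFrame({
    "department": rng.choice(["IT", "sales"], size=n),  # string column, skipped by corr
    "tenure": tenure,
    "avg_hrs_month": avg_hrs_month,
    "review": review,
    "left": left,
})

# Pearson correlation matrix over the numeric columns only.
corr = toy.corr(numeric_only=True)

# Rank features by the strength of their correlation with the target.
target_corr = corr["left"].drop("left").sort_values(key=abs, ascending=False)
print(target_corr)

# Pairwise coefficient for the two features that look collinear in the heatmap.
print(corr.loc["tenure", "avg_hrs_month"])
```

Passing `annot=True` to `sns.heatmap` is another way to get the same numbers directly on the plot.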

    sns.scatterplot(x='tenure', y='avg_hrs_month', hue='left', data=df);