knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(class.output = "code-background")
.code-background {
  background-color: lightgreen;
  border: 3px solid brown;
  font-weight: bold;
}

# One-time setup; run these manually rather than on every knit:
# install.packages("FactoMineR")
# install.packages("factoextra")
library(tidyverse)
library(car)
library(scales)
library(psych)
library(FactoMineR)
library(factoextra)
theme_set(theme_bw())
theme_update(plot.title = element_text(hjust = 0.5, size = 20),
             plot.subtitle = element_text(hjust = 0.5, size = 15),
             axis.text = element_text(size = 18),
             axis.title = element_text(size = 18),
             legend.position = "bottom")
df <- readr::read_csv('./data/employee_churn_data.csv')

head(df)

1. Which department has the highest employee turnover? Which one has the lowest?

df %>%
  mutate(left = ifelse(left == "yes", 1, 0)) %>% 
  group_by(department) %>% 
  summarize(turnover_rate = mean(left)) %>% 
  arrange(desc(turnover_rate)) %>% 
  ggplot(aes(fct_reorder(department, turnover_rate, .desc = T), turnover_rate)) +
  geom_segment(aes(col = department, x = fct_reorder(department, turnover_rate, .desc = T), xend = fct_reorder(department, turnover_rate, .desc = T), y = 0, yend = turnover_rate), show.legend = F) +
  geom_point(show.legend = F, pch = 21, size = 10 ,aes(fill = department)) +
  geom_text(col = "black", aes(x = fct_reorder(department, turnover_rate, .desc = T), y = turnover_rate, label = 100 * round(turnover_rate, 3))) +
  scale_color_brewer(palette = "Set3") +
  scale_fill_brewer(palette = "Set3") +
  scale_y_continuous(labels = label_percent()) +
  labs(x = "Department", y = "Turnover",
       title = "Turnover percentage per department",
       subtitle = "n = 9540") +
  coord_flip() +
  theme(legend.position = "none",
        plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5)) 
  • The IT department has the highest turnover percentage (30.9 %), followed by logistics (30.8 %), retail (30.6 %) and marketing (30.3 %)

  • Support (28.8 %), engineering (28.8 %), operations (28.6 %), sales (28.5 %) and administration (28.1 %) sit in the middle

  • The finance department has the lowest turnover percentage (26.9 %), 1.2 percentage points below administration

  • Looks like money, even if one is just working with it, does indeed buy happiness
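
The percentages above are read off the plot labels. As a cross-check, the same rates can be computed as a plain table. The following is a minimal sketch in base R on a toy data frame: the `toy` values are invented for illustration and are not taken from the employee data, and it assumes only that `left` is coded as "yes"/"no", as in the code above.

```r
# Toy stand-in for `df`; the values are invented purely for illustration.
toy <- data.frame(
  department = c("IT", "IT", "IT", "IT", "finance", "finance", "finance", "finance"),
  left       = c("yes", "yes", "no", "no", "yes", "no", "no", "no")
)

# Share of leavers per department: the mean of a logical vector
# is the proportion of TRUE values.
rates <- sapply(split(toy$left == "yes", toy$department), mean)

# Sort from highest to lowest turnover and print as percentages.
rates <- sort(rates, decreasing = TRUE)
print(round(100 * rates, 1))
```

Replacing `toy` with `df` should give the department ranking shown in the plot, assuming the column names match.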

2. Predictor variables for employee turnover

  • We can use an ordination method such as PCA (principal component analysis) to answer this question
  • PCA is designed for numeric variables though, and we have a lot of categorical variables here
  • If we dummy-code these categorical variables (without scaling or centering the resulting 0/1 columns), PCA can still be applied
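
To make the preprocessing idea concrete before applying it to the full data set, here is a minimal sketch on a toy data frame. The `toy` values are invented, and base R's `scale()` and `model.matrix()` are used as stand-ins for the `map_df()` and `psych::dummy.code()` calls used in this analysis, so the resulting column names will differ.

```r
# Toy data: one numeric and one categorical variable (values invented).
toy <- data.frame(
  review = c(0.5, 0.7, 0.9, 0.6),
  salary = c("low", "medium", "high", "low")
)

# Numeric variable: center to mean 0 and scale to standard deviation 1.
review_std <- as.numeric(scale(toy$review))

# Categorical variable: one 0/1 indicator column per level.
# The "- 1" drops the intercept so that every level gets its own column.
salary_dummies <- model.matrix(~ salary - 1, data = toy)

# Combine; the indicator columns are deliberately left as raw 0/1 values.
toy_trans <- cbind(review = review_std, salary_dummies)
print(toy_trans)
```

Each row has exactly one 1 among the indicator columns, while the numeric column is on a standardized scale.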

df_trans <- df %>% 
  select(review, projects, tenure, satisfaction, avg_hrs_month) %>% 
  map_df(~ .x - mean(.x)) %>% # center the numeric variables
  map_df(~ .x / sd(.x)) %>%   # scale the numeric variables to unit variance
  cbind(
    df %>% 
      select(department, promoted, salary, bonus, left) %>% 
      map(~ psych::dummy.code(.x)) %>% # one 0/1 indicator column per level
      as.data.frame()
  )

head(df_trans)

df_pca <- PCA(df_trans, graph = FALSE, scale.unit = FALSE)
# scale.unit = FALSE because the continuous variables were already centered and scaled above


plot.PCA(df_pca, choix = "var", alpha.var = "contrib") +
  theme_bw()
  • Arrows (variables) that point in the same direction are positively correlated with each other
  • Arrows (variables) that point in opposite directions are negatively correlated with each other
  • Arrows (variables) that are at a 90° angle to each other are not correlated
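
The reading rules above follow from a property of PCA on standardized variables: in the full component space, the cosine of the angle between two variable arrows equals their correlation. The sketch below illustrates this with base R's `prcomp()` on simulated data. Both are assumptions for illustration: the analysis itself uses `FactoMineR::PCA()`, and the `toy` variables are random numbers, not the employee data.

```r
set.seed(1)
x <- rnorm(200)

# Simulated variables with a known structure.
toy <- data.frame(
  a = x + rnorm(200, sd = 0.3),   # moves together with b
  b = x + rnorm(200, sd = 0.3),
  c = -x + rnorm(200, sd = 0.3),  # moves against a and b
  d = rnorm(200)                  # unrelated noise
)

pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Variable coordinates (the arrows): eigenvectors scaled by the
# standard deviation of each component.
coords <- sweep(pca$rotation, 2, pca$sdev, "*")

# Cosine of the angle between every pair of arrows, using all components.
unit    <- coords / sqrt(rowSums(coords^2))
cosines <- tcrossprod(unit)

print(round(cosines, 2))    # cosines between arrows
print(round(cor(toy), 2))   # plain correlation matrix for comparison
```

A two-dimensional plot shows only a projection of these arrows, so the angle rule is an approximation there and is most trustworthy for long arrows, which are well represented on the plotted plane.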

3. Recommendations to reduce employee turnover

  • We can infer:
    • Being dissatisfied with the job increases the chances of quitting (not very surprising)
    • Having high review results increases the chances of quitting
    • The chances of an employee quitting do not seem to be correlated with the following variables (although they played a role in constructing the PCA planes):
      • The average working hours per month
      • The person's tenure in that company
    • The other variables don't seem to have much effect on the likelihood of quitting
  • Recommendations:
    • It looks as if people who get high scores on their reviews are more likely to quit, because:
      • A) They now know their worth
      • B) It's probably easier to get a new job with a strong review score
      • Therefore, we recommend rewarding people who get good reviews with some sort of compensation (e.g. a raise)
    • Efforts to improve job satisfaction (new chairs, tables, maybe a ping-pong room, a PS5 in the break room, a movie room, ...), prioritizing the departments with the highest turnover rates (IT, logistics, retail, marketing)
      • Or even stock shares in the company like the cool kids in Silicon Valley do
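
The inferences at the start of this section were read off the PCA plot. One simple way to double-check them is to correlate each numeric variable directly with a 0/1 leaving indicator. The sketch below is a hedged illustration: `left_correlations()` is a hypothetical helper written for this example, and the `toy` values are invented, so the printed numbers say nothing about the real data.

```r
# Hypothetical helper: correlation of each numeric column in `vars`
# with the 0/1 leaving indicator, sorted from most positive to most negative.
left_correlations <- function(data, vars) {
  left_num <- as.integer(data$left == "yes")
  sort(sapply(data[vars], function(v) cor(v, left_num)), decreasing = TRUE)
}

# Toy data (values invented): leavers have high reviews and low satisfaction.
toy <- data.frame(
  review       = c(0.9, 0.8, 0.5, 0.4, 0.6, 0.7),
  satisfaction = c(0.2, 0.3, 0.8, 0.9, 0.7, 0.4),
  left         = c("yes", "yes", "no", "no", "no", "yes")
)

toy_cor <- left_correlations(toy, c("review", "satisfaction"))
print(round(toy_cor, 2))
```

On the real data, a call such as `left_correlations(df, c("review", "projects", "tenure", "satisfaction", "avg_hrs_month"))` should show whether the signs match what the arrows in the PCA plot suggest, assuming `left` is still coded as "yes"/"no".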