Workforce Analytics Initiative

1. Project Overview

This initiative explores workforce-related datasets from a simulated Canadian university. By examining data on human resources, employee performance, and absenteeism, it aims to uncover insights into workforce productivity, staffing trends, compensation structures, and retention patterns.

2. Key Business Questions

  1. Which departments exhibit the highest rates of absenteeism?
  2. How does employee performance correlate with tenure, age, and departmental affiliation?
  3. Is there a measurable relationship between high performance and higher compensation?

3. Set-up Environment

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime

4. Data Loading

employees = pd.read_csv("employees.csv", parse_dates=["hire_date"])
departments = pd.read_csv("departments.csv")
absences = pd.read_csv("absences.csv")
performance_reviews = pd.read_csv("performance_reviews.csv", parse_dates=["review_date"])

5. Initial Exploration

We’ll preview the first rows of each dataset to understand its structure and contents.

# employees
employees.head()
# departments
departments.head()
# absences
absences.head()
# performance_reviews
performance_reviews.head()
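
When several `head()` calls share one notebook cell, only the last expression is rendered, so it can help to loop over the datasets and print each preview explicitly. The sketch below is one way to do that. The helper name `preview_all` and the small stand-in frames (with their column names) are assumptions made so the example runs on its own; they are not taken from the project files.

```python
import pandas as pd


def preview_all(datasets, n=5):
    """Print the shape and first n rows of each DataFrame in a dict."""
    previews = {}
    for name, df in datasets.items():
        previews[name] = df.head(n)
        print(f"--- {name}: {df.shape[0]} rows x {df.shape[1]} columns ---")
        print(previews[name])
    return previews


# Tiny stand-in frames with hypothetical columns, used only so the sketch is runnable.
demo = {
    "employees": pd.DataFrame({"employee_id": [1, 2, 3], "department_id": [10, 10, 20]}),
    "departments": pd.DataFrame({"department_id": [10, 20], "name": ["HR", "Finance"]}),
}
previews = preview_all(demo, n=2)
```

With the real data, the same helper could be called with a dict such as `{"employees": employees, "departments": departments, "absences": absences, "performance_reviews": performance_reviews}`.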

6. Data Cleaning

We will check each dataset for null values and duplicates, and confirm that each column uses an appropriate data type.
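
As a rough illustration of these three checks, the sketch below bundles them into one reusable helper. The function name `quality_report` and the demo frame are hypothetical and stand in for the real datasets; the pandas calls themselves (`dtypes`, `isna`, `duplicated`) are standard.

```python
import pandas as pd


def quality_report(df):
    """Summarize dtypes, null counts, and fully duplicated rows for one DataFrame."""
    per_column = pd.DataFrame({
        "dtype": df.dtypes.astype(str),   # data type of each column
        "null_count": df.isna().sum(),    # missing values per column
    })
    # duplicated() flags rows that repeat an earlier row across all columns
    duplicate_rows = int(df.duplicated().sum())
    return per_column, duplicate_rows


# Hypothetical demo data: one repeated row and one missing hire date.
demo = pd.DataFrame({
    "employee_id": [1, 2, 2, 4],
    "hire_date": pd.to_datetime(["2020-01-15", "2021-06-01", "2021-06-01", None]),
})
per_column, duplicate_rows = quality_report(demo)
print(per_column)
print("duplicate rows:", duplicate_rows)
```

The same helper could be applied to each of the four datasets in turn, which keeps the per-dataset checks consistent.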

Employees

Check column types