Skip to content

Employee Database

Spinner
DataFrameas
df
variable
SELECT *
FROM salaries
LIMIT 5;

1. Average Employee Tenure by Position

This analysis calculates and visualizes the average employee tenure based on the last held position within the organization.

By identifying the average years of service for different roles, this visualization helps:

  • Assess employee retention across various positions.
  • Highlight which roles experience longer tenure (e.g., Senior Staff and Senior Engineer).
  • Identify roles with shorter retention periods to explore opportunities for improvement.

This information is valuable for understanding retention trends and creating strategies to improve employee longevity across all levels.

Spinner
DataFrameas
employee_tenures
variable
WITH past_employees AS (
	SELECT 
    	emp_no,
        Min(from_date) AS start_date,
        Max(to_date) AS end_date
    FROM salaries
    WHERE to_date != (SELECT MAX(to_date) FROM salaries)
    GROUP BY emp_no),
    
employee_tenures AS (
	SELECT 
		emp_no,
        start_date,
        end_date,
        DATEDIFF(end_date, start_date) / 365 AS employee_tenure
    FROM past_employees)

SELECT 
	title,
    AVG(employee_tenure) AS avg_employee_tenure
FROM employee_tenures
INNER JOIN titles
    ON employee_tenures.emp_no = titles.emp_no
    AND employee_tenures.end_date = titles.to_date
GROUP BY title
ORDER BY avg_employee_tenure
import pandas as pd
import plotly.express as px

fig = px.bar(
    employee_tenures,
    x="avg_employee_tenure",
    y="title",
    text=employee_tenures["avg_employee_tenure"].apply(lambda x: f"{x:.2f}")
)
x_max = employee_tenures["avg_employee_tenure"].max() + 2

fig.update_layout(
    title="Average Employee Tenure<br><sup>By Last Held Position</sup>",
    title_x=0.5,
    xaxis_title="Average Tenure",
    yaxis_title="Position",
    legend_title="Position",
    template="plotly_dark",
    xaxis = dict(range=[0, x_max]),
)

fig.update_traces(
    textposition="outside" 
)

fig.show()

Conclusion

The Average Employee Tenure graph reveals variations in tenure based on the last held position:

  • Senior Staff: 7.45 years (longest average tenure).
  • Senior Engineer: 7.34 years, indicating stability at senior levels.
  • Technique Leader: 4.84 years, moderate tenure compared to senior roles.
  • Engineer: 3.26 years, shorter tenure among mid-level positions.
  • Staff: 2.98 years and Assistant Engineer: 2.93 years represent the shortest tenures.

This distribution highlights a trend where higher-level roles have longer tenures, while lower and mid-tier positions experience shorter employee retention.

2. Average Salary by Employee (Top 10)

This analysis calculates and visualizes the average salary for the top 10 employees. Salaries are averaged across all time periods for each employee, and the results are displayed in descending order to identify employees with the highest average salaries.

Spinner
DataFrameas
df_salary
variable
SELECT emp_no, 
       AVG(salary) AS avg_salary
FROM salaries
GROUP BY emp_no
ORDER BY avg_salary DESC
LIMIT 10;
import pandas as pd
import plotly.express as px

df_salary["emp_no"] = df_salary["emp_no"].astype(str)

df_salary["avg_salary_k"] = df_salary["avg_salary"]

y_max = df_salary["avg_salary_k"].max() + 20000

fig = px.bar(
    df_salary, 
    x="emp_no", 
    y="avg_salary", 
    title="<b>Average Salary by Employee</b>",
    text=df_salary["avg_salary_k"].apply(lambda x: f"{x/1000:.2f}k"),
    labels={"emp_no": "Employee ID", "avg_salary_k": "Average Salary (k USD)"},
    template="plotly_dark"
)

fig.update_traces(
    textposition="outside"
)

fig.update_layout(
    title_x=0.5,
    font=dict(family="Arial, sans-serif", size=14, color="white"),
    xaxis=dict(title="Employee ID", tickangle=-45),
    yaxis=dict(
        title="Average Salary (k USD)", 
        tickformat="~s",
        range=[0, y_max]
    ),
    margin=dict(l=40, r=40, t=80, b=80)
)

fig.show()

Conclusion

  • The salary difference between the top employee and the 10th employee is only 6.8k USD, indicating relatively minor variations.
  • This highlights a narrow range among the top earners, suggesting consistency in compensation for the highest-paid employees.

The graph provides insight into the top salary distribution, showcasing a uniform compensation strategy among the top earners.

3. Salary Distribution Analysis

This analysis visualizes the distribution of employee salaries. Salaries are grouped into specific ranges. The goal is to understand the spread and concentration of salaries across the organization.

Spinner
DataFrameas
df_salary_time
variable
SELECT emp_no, 
       AVG(salary) AS avg_salary
FROM salaries
GROUP BY emp_no
ORDER BY avg_salary DESC;
import pandas as pd
import numpy as np
import plotly.express as px

df_salary_time["avg_salary_k"] = df_salary_time["avg_salary"] / 1000

max_salary = df_salary_time["avg_salary_k"].max()
bin_size = max_salary / 3  
bin_edges = [i * bin_size for i in range(4)]

bin_counts, _ = np.histogram(df_salary_time["avg_salary_k"], bins=bin_edges)
total_count = sum(bin_counts)
percentages = [count / total_count * 100 for count in bin_counts]

bin_labels = [f"{int(bin_edges[i])}k - {int(bin_edges[i+1])}k" for i in range(len(bin_edges)-1)]

y_max = max(bin_counts) * 1.1

bins_df = pd.DataFrame({
    "Intervalo": bin_labels,
    "Frecuencia": bin_counts,
    "Porcentaje": percentages
})

fig = px.bar(
    bins_df, 
    x="Intervalo", 
    y="Frecuencia",
    text=[f"{count} ({percent:.1f}%)" for count, percent in zip(bins_df["Frecuencia"], bins_df["Porcentaje"])],
    title="<b>Salary Distribution</b>",
    labels={"Intervalo": "Rango de Salarios Promedio (k USD)", "Frecuencia": "Frecuencia"},
    template="plotly_dark"
)

fig.update_traces(
    textposition="outside", 
)

fig.update_layout(
    title_x=0.5,
    font=dict(family="Arial, sans-serif", size=14, color="white"),
    xaxis=dict(title="Salary Range (k USD)", showgrid=True),
    yaxis=dict(title="Frecuency", showgrid=True, range=[0, y_max]),
    margin=dict(l=40, r=40, t=80, b=80),
    plot_bgcolor="rgba(0,0,0,0)",
)

fig.show()

Conclusion

The Salary Distribution graph reveals a significant concentration of salaries within the 47k - 94k USD range, accounting for 78.8% of all employees. This suggests that the majority of employees fall into a mid-range salary bracket.

  • The 0k - 47k USD range represents 17.7% of the salaries, indicating a smaller portion of lower earners.
  • The 94k - 141k USD range, comprising only 3.5%, shows that very few employees earn high salaries.

This distribution highlights a notable imbalance, with the vast majority clustered in the mid-salary range, while the higher salary ranges have significantly lower representation.

4. Average Salary by Year

This analysis calculates and visualizes the average salary per year for all employees. Grouping the data by year reveals salary trends over time, helping to identify periods of salary growth, stagnation, or decline.