Employee Database
SELECT *
FROM salaries
LIMIT 5;
1. Average Employee Tenure by Position
This analysis calculates and visualizes the average employee tenure based on the last held position within the organization.
By identifying the average years of service for different roles, this visualization helps:
- Assess employee retention across various positions.
- Highlight which roles experience longer tenure (e.g., Senior Staff and Senior Engineer).
- Identify roles with shorter retention periods to explore opportunities for improvement.
This information is valuable for understanding retention trends and creating strategies to improve employee longevity across all levels.
WITH past_employees AS (
SELECT
emp_no,
Min(from_date) AS start_date,
Max(to_date) AS end_date
FROM salaries
WHERE to_date != (SELECT MAX(to_date) FROM salaries)
GROUP BY emp_no),
employee_tenures AS (
SELECT
emp_no,
start_date,
end_date,
DATEDIFF(end_date, start_date) / 365 AS employee_tenure
FROM past_employees)
SELECT
title,
AVG(employee_tenure) AS avg_employee_tenure
FROM employee_tenures
INNER JOIN titles
ON employee_tenures.emp_no = titles.emp_no
AND employee_tenures.end_date = titles.to_date
GROUP BY title
ORDER BY avg_employee_tenure
import pandas as pd
import plotly.express as px
fig = px.bar(
employee_tenures,
x="avg_employee_tenure",
y="title",
text=employee_tenures["avg_employee_tenure"].apply(lambda x: f"{x:.2f}")
)
x_max = employee_tenures["avg_employee_tenure"].max() + 2
fig.update_layout(
title="Average Employee Tenure<br><sup>By Last Held Position</sup>",
title_x=0.5,
xaxis_title="Average Tenure",
yaxis_title="Position",
legend_title="Position",
template="plotly_dark",
xaxis = dict(range=[0, x_max]),
)
fig.update_traces(
textposition="outside"
)
fig.show()
Conclusion
The Average Employee Tenure graph reveals variations in tenure based on the last held position:
- Senior Staff: 7.45 years (longest average tenure).
- Senior Engineer: 7.34 years, indicating stability at senior levels.
- Technique Leader: 4.84 years, moderate tenure compared to senior roles.
- Engineer: 3.26 years, shorter tenure among mid-level positions.
- Staff: 2.98 years and Assistant Engineer: 2.93 years represent the shortest tenures.
This distribution highlights a trend where higher-level roles have longer tenures, while lower and mid-tier positions experience shorter employee retention.
2. Average Salary by Employee (Top 10)
This analysis calculates and visualizes the average salary for the top 10 employees. Salaries are averaged across all time periods for each employee, and the results are displayed in descending order to identify employees with the highest average salaries.
SELECT emp_no,
AVG(salary) AS avg_salary
FROM salaries
GROUP BY emp_no
ORDER BY avg_salary DESC
LIMIT 10;
import pandas as pd
import plotly.express as px
df_salary["emp_no"] = df_salary["emp_no"].astype(str)
df_salary["avg_salary_k"] = df_salary["avg_salary"]
y_max = df_salary["avg_salary_k"].max() + 20000
fig = px.bar(
df_salary,
x="emp_no",
y="avg_salary",
title="<b>Average Salary by Employee</b>",
text=df_salary["avg_salary_k"].apply(lambda x: f"{x/1000:.2f}k"),
labels={"emp_no": "Employee ID", "avg_salary_k": "Average Salary (k USD)"},
template="plotly_dark"
)
fig.update_traces(
textposition="outside"
)
fig.update_layout(
title_x=0.5,
font=dict(family="Arial, sans-serif", size=14, color="white"),
xaxis=dict(title="Employee ID", tickangle=-45),
yaxis=dict(
title="Average Salary (k USD)",
tickformat="~s",
range=[0, y_max]
),
margin=dict(l=40, r=40, t=80, b=80)
)
fig.show()
Conclusion
- The salary difference between the top employee and the 10th employee is only 6.8k USD, indicating relatively minor variations.
- This highlights a narrow range among the top earners, suggesting consistency in compensation for the highest-paid employees.
The graph provides insight into the top salary distribution, showcasing a uniform compensation strategy among the top earners.
3. Salary Distribution Analysis
This analysis visualizes the distribution of employee salaries. Salaries are grouped into specific ranges. The goal is to understand the spread and concentration of salaries across the organization.
SELECT emp_no,
AVG(salary) AS avg_salary
FROM salaries
GROUP BY emp_no
ORDER BY avg_salary DESC;
import pandas as pd
import numpy as np
import plotly.express as px
df_salary_time["avg_salary_k"] = df_salary_time["avg_salary"] / 1000
max_salary = df_salary_time["avg_salary_k"].max()
bin_size = max_salary / 3
bin_edges = [i * bin_size for i in range(4)]
bin_counts, _ = np.histogram(df_salary_time["avg_salary_k"], bins=bin_edges)
total_count = sum(bin_counts)
percentages = [count / total_count * 100 for count in bin_counts]
bin_labels = [f"{int(bin_edges[i])}k - {int(bin_edges[i+1])}k" for i in range(len(bin_edges)-1)]
y_max = max(bin_counts) * 1.1
bins_df = pd.DataFrame({
"Intervalo": bin_labels,
"Frecuencia": bin_counts,
"Porcentaje": percentages
})
fig = px.bar(
bins_df,
x="Intervalo",
y="Frecuencia",
text=[f"{count} ({percent:.1f}%)" for count, percent in zip(bins_df["Frecuencia"], bins_df["Porcentaje"])],
title="<b>Salary Distribution</b>",
labels={"Intervalo": "Rango de Salarios Promedio (k USD)", "Frecuencia": "Frecuencia"},
template="plotly_dark"
)
fig.update_traces(
textposition="outside",
)
fig.update_layout(
title_x=0.5,
font=dict(family="Arial, sans-serif", size=14, color="white"),
xaxis=dict(title="Salary Range (k USD)", showgrid=True),
yaxis=dict(title="Frecuency", showgrid=True, range=[0, y_max]),
margin=dict(l=40, r=40, t=80, b=80),
plot_bgcolor="rgba(0,0,0,0)",
)
fig.show()
Conclusion
The Salary Distribution graph reveals a significant concentration of salaries within the 47k - 94k USD range, accounting for 78.8% of all employees. This suggests that the majority of employees fall into a mid-range salary bracket.
- The 0k - 47k USD range represents 17.7% of the salaries, indicating a smaller portion of lower earners.
- The 94k - 141k USD range, comprising only 3.5%, shows that very few employees earn high salaries.
This distribution highlights a notable imbalance, with the vast majority clustered in the mid-salary range, while the higher salary ranges have significantly lower representation.
4. Average Salary by Year
This analysis calculates and visualizes the average salary per year for all employees. Grouping the data by year reveals salary trends over time, helping to identify periods of salary growth, stagnation, or decline.