📖 Background
You work for an international HR consultancy helping companies attract and retain top talent in the competitive tech industry. As part of your services, you provide clients with insights into industry salary trends to ensure they remain competitive in hiring and compensation practices.
Your team wants to use a data-driven approach to analyse how various factors—such as job role, experience level, remote work, and company size—impact salaries globally. By understanding these trends, you can advise clients on offering competitive packages to attract the best talent.
In this competition, you’ll explore and visualise salary data from thousands of employees worldwide. If you're tackling the advanced level, you'll go a step further, building predictive models to uncover key salary drivers and providing insights on how to enhance future data collection.
💾 The data
The data comes from a survey hosted by an HR consultancy, available in 'salaries.csv'.
Each row represents a single employee's salary record for a given year:
- work_year: The year the salary was paid.
- experience_level: Employee experience level:
  - EN: Entry-level / Junior
  - MI: Mid-level / Intermediate
  - SE: Senior / Expert
  - EX: Executive / Director
- employment_type: Employment type:
  - PT: Part-time
  - FT: Full-time
  - CT: Contract
  - FL: Freelance
- job_title: The job title during the year.
- salary: Gross salary paid (in local currency).
- salary_currency: Salary currency (ISO 4217 code).
- salary_in_usd: Salary converted to USD using average yearly FX rate.
- employee_residence: Employee's primary country of residence (ISO 3166 code).
- remote_ratio: Percentage of remote work:
  - 0: No remote work (<20%)
  - 50: Hybrid (50%)
  - 100: Fully remote (>80%)
- company_location: Employer's main office location (ISO 3166 code).
- company_size: Company size:
  - S: Small (<50 employees)
  - M: Medium (50–250 employees)
  - L: Large (>250 employees)
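The coded columns in the data dictionary can be turned into readable labels before plotting. The following is a minimal sketch, not part of the original analysis: the `LOOKUPS` table is transcribed from the dictionary above, while `add_labels`, `unknown_codes`, and the `demo` rows are hypothetical names and made-up data used only for illustration.

```python
import pandas as pd

# Code-to-label lookups transcribed from the data dictionary above.
LOOKUPS = {
    "experience_level": {"EN": "Entry-level / Junior", "MI": "Mid-level / Intermediate",
                         "SE": "Senior / Expert", "EX": "Executive / Director"},
    "employment_type": {"PT": "Part-time", "FT": "Full-time",
                        "CT": "Contract", "FL": "Freelance"},
    "remote_ratio": {0: "No remote work", 50: "Hybrid", 100: "Fully remote"},
    "company_size": {"S": "Small", "M": "Medium", "L": "Large"},
}


def add_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with one readable '<column>_label' column per coded column."""
    out = df.copy()
    for column, labels in LOOKUPS.items():
        out[f"{column}_label"] = out[column].map(labels)  # unknown codes become NaN
    return out


def unknown_codes(df: pd.DataFrame) -> dict:
    """Map each coded column to the set of values missing from the dictionary."""
    return {
        column: set(df[column].dropna().unique().tolist()) - set(labels)
        for column, labels in LOOKUPS.items()
    }


# Made-up rows for illustration only; they are not taken from salaries.csv.
demo = pd.DataFrame({
    "experience_level": ["EN", "SE", "EX"],
    "employment_type": ["FT", "CT", "FT"],
    "remote_ratio": [0, 50, 100],
    "company_size": ["S", "M", "XL"],  # "XL" is deliberately invalid
})
labelled = add_labels(demo)
print(labelled[["company_size", "company_size_label"]])
print(unknown_codes(demo))
```

Running `unknown_codes` on the real data first is a cheap way to confirm that every code in the file matches the dictionary before any averages are computed.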
EXECUTIVE SUMMARY
- The top 5 job titles with the highest average salary (in USD) are all managerial positions.
- Employees with no remote work earn more on average than those working fully remotely. Hybrid employees earn the least on average of the three groups.
- Across company sizes, employees at medium-sized companies earn the highest average salary (in USD), followed closely by those at large companies. Employees at small companies earn the least.
# Import libraries and load the data set
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
salaries_df=pd.read_csv('salaries.csv')
salaries_df.head()
# top 5 job titles with the highest average salary (in USD)
job_by_avg_salaries = salaries_df.groupby('job_title')['salary_in_usd'].mean()
top_5_job_titles = job_by_avg_salaries.nlargest(5).reset_index()
top_5_job_titles
# visualization (barplot) top 5 job titles with the highest average salary (in USD)
plt.figure(figsize=(8,5), dpi=300)
sns.barplot(x='salary_in_usd', y='job_title', hue='job_title', data=top_5_job_titles, palette='mako', legend=False, edgecolor='black')
plt.title('Top 5 job titles with the highest average salary (in USD)')
plt.ylabel('Job title')
plt.xlabel('Salary in USD')
# No plt.legend() here: legend=False above leaves no labelled artists to show
plt.tight_layout()
The top 5 job titles with the highest average salary (in USD) are all managerial positions.
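A ranking by mean alone can be dominated by job titles that appear only a handful of times. As a hedged sketch (not the method used above), the ranking can be reported together with the median and the number of records, with an optional minimum sample size. The helper `top_titles`, its `min_count` threshold, and the `demo` rows are assumptions introduced here for illustration.

```python
import pandas as pd


def top_titles(df: pd.DataFrame, n: int = 5, min_count: int = 1) -> pd.DataFrame:
    """Top-n job titles by mean USD salary, keeping titles with >= min_count records."""
    stats = df.groupby("job_title")["salary_in_usd"].agg(
        mean_usd="mean", median_usd="median", n_records="count"
    )
    stats = stats[stats["n_records"] >= min_count]
    return stats.nlargest(n, "mean_usd").reset_index()


# Made-up rows for illustration only; they are not taken from salaries.csv.
demo = pd.DataFrame({
    "job_title": ["Head of Data", "Data Engineer", "Data Engineer", "Analyst", "Analyst"],
    "salary_in_usd": [300_000, 150_000, 170_000, 90_000, 110_000],
})

print(top_titles(demo, n=2))               # includes the single-record title
print(top_titles(demo, n=2, min_count=2))  # drops titles seen only once
```

On the real data this would be called as `top_titles(salaries_df, n=5, min_count=...)`; a suitable threshold depends on how many records each title has, which the source does not state.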
# Compare the average salaries for employees working remotely
avg_remote_salary = salaries_df.groupby('remote_ratio')['salary_in_usd'].mean().reset_index()
avg_remote_salary
#trend visualization of average salaries for employees working remotely
plt.figure(figsize=(5,4), dpi=300)
sns.barplot(x='remote_ratio', y='salary_in_usd', data=avg_remote_salary, hue='remote_ratio', palette='pastel', width=0.5, edgecolor='black', legend=False)
plt.title('Average salaries for employees working remotely')
plt.xlabel('Remote ratio')
plt.ylabel('Salary in USD')
# No plt.legend() here: legend=False above leaves no labelled artists to show
plt.tight_layout()
Employees with no remote work earn more on average than those working fully remotely. Hybrid employees earn the least on average of the three groups.
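The remote-work pattern may partly reflect who works remotely rather than remote work itself, for example if one group contains more senior staff. One way to probe this, sketched below under that assumption, is to break the average down by experience level as well. The function name and the `demo` rows are hypothetical and the numbers are made up.

```python
import pandas as pd


def mean_salary_by_remote_and_experience(df: pd.DataFrame) -> pd.DataFrame:
    """Mean USD salary with experience levels as rows and remote ratios as columns."""
    return pd.pivot_table(
        df,
        values="salary_in_usd",
        index="experience_level",
        columns="remote_ratio",
        aggfunc="mean",
    )


# Made-up rows for illustration only; they are not taken from salaries.csv.
demo = pd.DataFrame({
    "experience_level": ["EN", "EN", "EN", "SE", "SE"],
    "remote_ratio": [0, 0, 100, 0, 100],
    "salary_in_usd": [50_000, 70_000, 60_000, 150_000, 140_000],
})

table = mean_salary_by_remote_and_experience(demo)
print(table)
```

If the ordering of the remote-ratio columns is the same in every row of the real table, the overall pattern is less likely to be an artefact of the experience mix.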
#salary distribution (in USD) across company sizes
sal_dist=salaries_df.groupby('company_size')['salary_in_usd'].mean().reset_index()
sal_dist
#visualization of salary distribution (in USD) across company sizes
plt.style.use('fivethirtyeight')
plt.figure(figsize=(5,4), dpi=300)
slices=sal_dist['salary_in_usd']
labels=sal_dist['company_size']
explode=[0,0.1,0]  # groupby sorts the sizes as L, M, S, so this offsets the M slice
plt.pie(slices, labels=labels, shadow=True, startangle=90, wedgeprops={'edgecolor':'black'},
        explode=explode, autopct='%1.1f%%')
plt.title('Salary distribution (in USD) across company sizes')
Across company sizes, employees at medium-sized companies earn the highest average salary (in USD), followed closely by those at large companies. Employees at small companies earn the least.
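The pie chart above compares group averages, so it does not show how salaries are spread within each company size. A box plot is one common alternative; the sketch below is an assumption about how that could look, using plain matplotlib rather than the seaborn calls used earlier. `SIZE_ORDER`, `salary_quartiles`, `plot_salary_boxes`, and the `demo` rows are hypothetical names and made-up data.

```python
import matplotlib.pyplot as plt
import pandas as pd

SIZE_ORDER = ["S", "M", "L"]  # small, medium, large, as in the data dictionary


def salary_quartiles(df: pd.DataFrame) -> pd.DataFrame:
    """25th, 50th and 75th percentile of USD salary for each company size."""
    quartiles = (
        df.groupby("company_size")["salary_in_usd"]
        .quantile([0.25, 0.5, 0.75])
        .unstack()  # one column per quantile
    )
    return quartiles.reindex(SIZE_ORDER)


def plot_salary_boxes(df: pd.DataFrame, ax=None):
    """Draw one box per company size and return the Axes."""
    if ax is None:
        _, ax = plt.subplots(figsize=(5, 4))
    groups = [
        df.loc[df["company_size"] == size, "salary_in_usd"].to_numpy()
        for size in SIZE_ORDER
    ]
    ax.boxplot(groups)  # boxes are drawn at x positions 1, 2, 3
    ax.set_xticklabels(SIZE_ORDER)
    ax.set_xlabel("Company size")
    ax.set_ylabel("Salary in USD")
    ax.set_title("Salary distribution (in USD) across company sizes")
    return ax


# Made-up rows for illustration only; they are not taken from salaries.csv.
demo = pd.DataFrame({
    "company_size": ["S"] * 3 + ["M"] * 3 + ["L"] * 3,
    "salary_in_usd": [40_000, 60_000, 80_000,
                      100_000, 120_000, 140_000,
                      90_000, 110_000, 130_000],
})

quartiles = salary_quartiles(demo)
print(quartiles)
ax = plot_salary_boxes(demo)
```

On the real data, `plot_salary_boxes(salaries_df)` followed by `plt.tight_layout()` would render the chart in the notebook, and the quartile table gives the numbers behind each box.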
💪 Competition challenge
In this second level, you’ll create visualisations to analyse the data and uncover trends. If you’re up for an even greater challenge, head to level three! Create a report that answers the following:
- Create a bar chart displaying the top 5 job titles with the highest average salary (in USD).
- Compare the average salaries for employees working remotely 100%, 50%, and 0%. What patterns or trends do you observe?
- Visualise the salary distribution (in USD) across company sizes (S, M, L). Which company size offers the highest average salary?
🧑⚖️ Judging criteria
This is a community-based competition. Once the competition concludes and voting begins, you'll be able to view and vote for other participants' submissions. The top 5 most upvoted entries will win, and the winners will receive DataCamp merchandise.