📖 1. Background
You work for an international HR consultancy helping companies attract and retain top talent in the competitive tech industry. As part of your services, you provide clients with insights into industry salary trends to ensure they remain competitive in hiring and compensation practices.
Your team wants to use a data-driven approach to analyse how various factors—such as job role, experience level, remote work, and company size—impact salaries globally. By understanding these trends, you can advise clients on offering competitive packages to attract the best talent.
In this competition, you’ll explore and visualise salary data from thousands of employees worldwide. If you're tackling the advanced level, you'll go a step further—building predictive models to uncover key salary drivers and providing insights on how to enhance future data collection.
💾 2. The data
The data comes from a survey hosted by an HR consultancy, available in 'salaries.csv'.
Each row represents a single employee's salary record for a given year:
work_year- The year the salary was paid.experience_level- Employee experience level:EN: Entry-level / JuniorMI: Mid-level / IntermediateSE: Senior / ExpertEX: Executive / Director
employment_type- Employment type:PT: Part-timeFT: Full-timeCT: ContractFL: Freelance
job_title- The job title during the year.salary- Gross salary paid (in local currency).salary_currency- Salary currency (ISO 4217 code).salary_in_usd- Salary converted to USD using average yearly FX rate.employee_residence- Employee's primary country of residence (ISO 3166 code).remote_ratio- Percentage of remote work:0: No remote work (<20%)50: Hybrid (50%)100: Fully remote (>80%)
company_location- Employer's main office location (ISO 3166 code).company_size- Company size:S: Small (<50 employees)M: Medium (50–250 employees)L: Large (>250 employees)
3. Code
import pandas as pd # Importing of the Python Package Pandas aliased AS pd salaries_df = pd.read_csv('salaries.csv') # Creation of the object "salaries_df" salaries Dataframe # Using the Pandas function of read_csv on the file named 'salaries.csv' salaries_df.head(5) # Object salaries_df and displaying only the head or top five rows of the data Data
import pandas as pd
# Importing of the Python Package Pandas aliased AS pd
salaries_df = pd.read_csv('salaries.csv')
# Creation of the object "salaries_df" salaries Dataframe
# Using the Pandas function of read_csv on the file named 'salaries.csv'
salaries_df.head(5)
# Object salaries_df and displaying only the head or top five rows of the data Data💪 4. Competition challenge
In this first level, you’ll explore and summarise the dataset to understand its structure and key statistics. If you want to push yourself further, check out level two! Create a report that answers the following:
- How many records are in the dataset, and what is the range of years covered?
- What is the average salary (in USD) for Data Scientists and Data Engineers? Which role earns more on average?
- How many full-time employees based in the US work 100% remotely?
🧑⚖️ 5. Judging criteria
This is a community-based competition. Once the competition concludes, you'll have the opportunity to view and vote for the best submissions of others as the voting begins. The top 5 most upvoted entries will win. The winners will receive DataCamp merchandise.
✅ 6. Checklist before publishing into the competition
- Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
- Remove redundant cells like the judging criteria, so the workbook is focused on your story.
- Make sure the workbook reads well and explains how you found your insights.
- Try to include an executive summary of your recommendations at the beginning.
- Check that all the cells run without error
⌛️ Time is ticking. Good luck!
7. Descriptive Analytics
7.A. Columns
Number of Columns: 11
Which are: work_year, experience_level, employment_type, job_title, salary, salary_currency, salary_in_usd, employee_residence, remote_ratio, company_location, company_size
7.B. Searching for Null Values
Found Zero Null values, which allows us to proceed with answering the questions without needing to clean/modify the data.
SELECT
(SELECT COUNT(S.work_year) FROM 'salaries.csv' AS S WHERE S.work_year IS NULL) AS Work_Year,
(SELECT COUNT(S.experience_level) FROM 'salaries.csv' AS S WHERE S.experience_level IS NULL) AS Experience_level,
(SELECT COUNT(S.employment_type) FROM 'salaries.csv' AS S WHERE S.employment_type IS NULL) AS Employment_Type,
(SELECT COUNT(S.job_title) FROM 'salaries.csv' AS S WHERE S.job_title IS NULL) AS Job_Title,
(SELECT COUNT(S.salary) FROM 'salaries.csv' AS S WHERE S.salary IS NULL) AS Salary,
(SELECT COUNT(S.salary_currency) FROM 'salaries.csv' AS S WHERE S.salary_currency IS NULL) AS Salary_Currency,
(SELECT COUNT(S.salary_in_usd) FROM 'salaries.csv' AS S WHERE S.salary_in_usd IS NULL) AS Salary_In_USD,
(SELECT COUNT(S.employee_residence) FROM 'salaries.csv' AS S WHERE S.employee_residence IS NULL) AS Employee_Residence,
(SELECT COUNT(S.remote_ratio) FROM 'salaries.csv' AS S WHERE S.remote_ratio IS NULL) AS Remote_Ratio,
(SELECT COUNT(S.company_location) FROM 'salaries.csv' AS S WHERE S.company_location IS NULL) AS Company_Location,
(SELECT COUNT(S.company_size) FROM 'salaries.csv' AS S WHERE S.company_size IS NULL) AS Company_Size;8. Questions
8.A.
8.A.1 How many records are in the dataset?
57194 Records
SELECT COUNT(*) FROM "salaries.csv";
8.A.2 What is the range of years covered?
4 years
SELECT (MAX(work_year) - MIN(work_year)) FROM "salaries.csv";
Table View:
SELECT
(SELECT COUNT(*) FROM "salaries.csv") AS "Count Records",
(SELECT (MAX(work_year) - MIN(work_year)) FROM "salaries.csv") AS Range,
(SELECT MIN(work_year) FROM "salaries.csv") AS "Minimum Years",
(SELECT MAX(work_year) FROM "salaries.csv") AS "Maximum Years";8.B.
8.B.1 What is the average salary (in USD) for Data Scientists and Data Engineers?
Data Scientists Average Salary: $159397.07
Data Engineers Average Salary: $149315.00
8.B.2 Which role earns more on average?
Analytics Engineering Manager Average Salary: $399880.00