SALARY PREDICTION
1.0 Introduction
In this study, we examine how differences in salary are explained by selected indicators such as work experience, level of education, gender, and age. This will help us explain why salaries differ across individuals with different characteristics.
2.1 Loading data
We already have available data on 173 individuals, stored as a .csv file. The data has details of each individual's salary, gender, age, education level, job title, and years of working experience. We will now proceed to load the data into our notebook.
# Importing necessary libraries.
import pandas as pd
import numpy as np
# Loading the data
data = pd.read_csv("Salary Data.csv")
data
2.2 Data Cleaning
We start our analysis by preparing the data to meet our criteria. We make sure the data is ready by performing the following:
- Checking for Null values and resolving them
- Checking for duplicates and resolving them
- Checking for Outliers
- Checking for improper data format and others.
These procedures are conducted below.
# Checking for missing values
data.isna().sum()
We observe that each column has two null values. This is probably because two observations in the data are missing all of their attributes. Since this number is relatively small (2 obs : 374 obs), we can proceed to drop these two observations from the data.
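Before dropping anything, it can be worth confirming that the nulls really come from fully empty rows rather than from missing cells scattered across different rows. The following is a minimal sketch on a made-up frame; the column names and values are illustrative assumptions, not the actual dataset.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the salary data (hypothetical columns and values).
toy = pd.DataFrame({
    "Age": [32, np.nan, 45, np.nan],
    "Gender": ["Male", None, "Female", None],
    "Salary": [90000, np.nan, 150000, np.nan],
})

# Rows where every column is missing.
fully_empty = toy.isna().all(axis=1)
# Rows where at least one column is missing.
any_missing = toy.isna().any(axis=1)

n_fully_empty = int(fully_empty.sum())
n_any_missing = int(any_missing.sum())

# When the two counts match, dropna(how="all") removes every null value.
cleaned = toy.dropna(how="all").reset_index(drop=True)
remaining_nulls = int(cleaned.isna().sum().sum())

print(n_fully_empty, n_any_missing, remaining_nulls)
```

If the two counts differed, some rows would be only partially missing, and `how="all"` would leave those nulls in place.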
# Dropping NAN values
data.dropna(how="all", inplace=True)
# Resetting index to reflect the new changes
data.reset_index(drop=True, inplace=True)
data
We have now dropped the two observations, and we reset the index to reflect the change.
# Checking for duplicates
data.duplicated()
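The duplicate and outlier checks listed at the start of this section can be sketched as follows. This is only an illustration on made-up data: the column names "Age" and "Salary" and the 1.5 * IQR rule are assumptions, not necessarily the approach used on the real dataset.

```python
import pandas as pd

# Toy frame standing in for the cleaned salary data (hypothetical values).
toy = pd.DataFrame({
    "Age": [28, 35, 35, 41, 50, 30],
    "Salary": [50000, 80000, 80000, 95000, 900000, 60000],
})

# Count exact duplicate rows, then drop them, keeping the first occurrence.
n_duplicates = int(toy.duplicated().sum())
deduped = toy.drop_duplicates().reset_index(drop=True)

# Flag salaries that fall more than 1.5 * IQR outside the quartiles.
q1 = deduped["Salary"].quantile(0.25)
q3 = deduped["Salary"].quantile(0.75)
iqr = q3 - q1
is_outlier = (deduped["Salary"] < q1 - 1.5 * iqr) | (deduped["Salary"] > q3 + 1.5 * iqr)
outliers = deduped[is_outlier]

print(n_duplicates)
print(outliers)
```

Whether duplicate rows should be dropped is a judgment call here, since two different people can legitimately share the same age, title, and salary.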
2.3 Data Exploration
In this section, we examine the following:
- Shape, size, and columns of the data
- Data info
- Summary statistics
- How many people participated, by gender?
- How many of them are bachelor's, master's, or PhD holders?
- How many of them are bachelor's, master's, or PhD holders, by gender?
- Average salary by job
- Income distribution across gender at equal levels of education
- Age distribution of the respondents
- Salary distribution
- Relationship between salary and working experience
- Relationship between age and salary
# shape of data
data.shape
# size of data
data.size
# columns of data
data.columns
# Summary info
data.info()
# Summary Statistics
data.describe()
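Several of the questions listed for this section (counts by gender, education level by gender, average salary by job) come down to simple counting and grouping. The following is a hedged sketch on a made-up frame; the column names "Gender", "Education Level", "Job Title", and "Salary" are assumptions and may not match the headers in the actual file.

```python
import pandas as pd

# Toy frame standing in for the salary data (hypothetical columns and values).
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Female"],
    "Education Level": ["Bachelor's", "Master's", "PhD", "Bachelor's", "Bachelor's"],
    "Job Title": ["Analyst", "Engineer", "Engineer", "Analyst", "Analyst"],
    "Salary": [50000, 90000, 120000, 60000, 55000],
})

# Number of participants per gender.
gender_counts = toy["Gender"].value_counts()

# Education level broken down by gender.
education_by_gender = pd.crosstab(toy["Education Level"], toy["Gender"])

# Average salary per job title, highest first.
mean_salary_by_job = (
    toy.groupby("Job Title")["Salary"].mean().sort_values(ascending=False)
)

print(gender_counts)
print(education_by_gender)
print(mean_salary_by_job)
```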