Gear Up for Success: Key Takeaways from the 2024 Stack Overflow Developer Survey for Tech Students
Discover the latest trends, tools, and insights from the "2024 Stack Overflow Developer Survey" to help you navigate your future in tech. Cover image from Stack Overflow.
Key Findings
- Most Used Languages: JavaScript, HTML/CSS, Python, and SQL are the most used languages by surveyed developers. This highlights the significant amount of work in web development, databases, and Python. Additionally, the prevalence of libraries such as NumPy, Pandas, Scikit-learn, and TensorFlow in the survey results underscores Python's importance in the ongoing boom of data science, machine learning, and AI.
- Highest Earners: The highest earner is a Project Manager with an annual salary of $16,256,603, followed by Blockchain Developers, Cloud Engineers, and Hardware Engineers.
- Popular Online Resources: The most popular online resources for learning how to code are technical documentation, Stack Overflow, written tutorials, and blogs. The distribution among beginners, intermediates, and advanced users is even across all resources, with all the most preferred resources being written.
- Job Satisfaction: There is little to no correlation between years of coding experience, salary, remote work, and job satisfaction. This suggests that job satisfaction is influenced by a variety of factors beyond what is examined in the survey.
- AI Integration: Developers who integrate AI into their workflows are generally favorable toward its use. They primarily use AI for writing/debugging code, documenting code, and searching for answers. Most report increased productivity and faster workflows. However, the main challenges include a lack of trust in AI-generated outputs and the tools' lack of context for the codebase.
1. Overview
My main objective is to draw meaningful insights from the Stack Overflow survey that I can share with my classmates in the upcoming school year. By looking into the survey data, I aim to find important trends and patterns that will help us prepare for our future careers.
Disclaimer: While Stack Overflow has already published its own analysis of this survey data, this analysis is tailored to the aspects most relevant to students and early-career professionals. The focus here is on the questions and insights that can directly support my classmates' understanding of the industry and help them make informed decisions about their learning and career paths.
The key questions this analysis aims to answer include:
- What are the most frequently used tools and technologies in various tech roles (e.g., programming languages, cloud platforms, databases, frameworks)? How do these tools vary by job role and country?
- How have salaries varied for tech job roles, industries, and educational levels?
- Where do developers most commonly seek coding resources (e.g., books, online tutorials, technical documentation), and how do these preferences differ based on their years of coding experience?
- What factors have the most significant impact on job satisfaction among developers?
- For those who use AI in their development process, what is their stance on using AI tools in their work? Which parts of their development workflow are they currently using AI tools for? What are the key benefits and challenges associated with AI tools?
- Are there any other insights found during the analysis?
2. Data Description
The dataset is sourced from the Stack Overflow Annual Developer Survey and contains 65,437 rows and 114 columns. The following table provides descriptions for the columns that are relevant to this analysis:
# | Column Name | Description |
---|---|---|
1 | ResponseId | The unique identifier for each respondent |
2 | ConvertedCompYearly | The respondent's annual salary converted to USD |
3 | RemoteWork | The respondent's current work situation |
4 | Age | The age of the respondent |
5 | Employment | The respondent's current employment status |
6 | YearsCode | The total number of years the respondent has been coding |
7 | DevType | The respondent's current job title |
8 | Country | The country in which the respondent resides |
9 | JobSat | The respondent's satisfaction with their current professional job |
10 | LearnCodeOnline | The online resources the respondent used to learn coding |
11 | LanguageHaveWorkedWith | The programming language(s) the respondent primarily uses in their current job |
12 | DatabaseHaveWorkedWith | The database environment(s) the respondent primarily uses in their current job |
13 | PlatformHaveWorkedWith | The cloud platform(s) the respondent primarily uses in their current job |
14 | WebframeHaveWorkedWith | The web framework(s) the respondent primarily uses in their current job |
15 | ToolsTechHaveWorkedWith | The development and operations tool(s) the respondent primarily uses in their current job |
16 | AISearchDevHaveWorkedWith | The AI-powered search and development tool(s) the respondent has used in their current job |
17 | MiscTechHaveWorkedWith | Other programming frameworks, libraries, and tools the respondent has worked with |
18 | AIToolCurrently Using | The parts of the development workflow where the respondent is currently using AI tools |
3. Initial Data Exploration
First, we import the essential Python packages needed for exploring and analyzing our dataset. Then, we conduct an initial exploration to identify potential issues, such as missing values and other inconsistencies within our data. Following this, we will drop any columns that are not relevant to our objectives. This preliminary step helps us become acquainted with our data, refine its structure, and enhance our understanding of its content.
# Import necessary packages
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import scipy.stats as stats
from filter import cleaners, filter_jobs
from viz import graph
from pathlib import Path
from plotly.subplots import make_subplots
# Load the data into a Pandas DataFrame:
current_directory = Path.cwd()
filename = current_directory / 'data' / 'survey_results_public.csv'
original_df = pd.read_csv(filename)
# Select specific columns from the DataFrame to keep only the relevant data.
# The columns are chosen based on their significance for the analysis.
df = original_df[[
"ResponseId",
"ConvertedCompYearly",
"RemoteWork",
"Age",
"Employment",
"YearsCode",
"DevType",
"Country",
"JobSat",
"Frustration",
"LearnCode",
"LearnCodeOnline",
"LanguageHaveWorkedWith",
"DatabaseHaveWorkedWith",
"PlatformHaveWorkedWith",
"WebframeHaveWorkedWith",
"ToolsTechHaveWorkedWith",
"AISearchDevHaveWorkedWith",
"MiscTechHaveWorkedWith",
"AIToolCurrently Using"
]].copy()  # copy so later cleaning steps do not modify original_df or trigger SettingWithCopyWarning
# Preview the dataframe
df.head()
# Print information about the data, including the data types and number of non-null values
df.info()
# Check for missing values in each column
df.isnull().sum()
# Show the summary statistics of our numerical variables
df.describe()
In the initial data exploration, we identified a total of 65,437 respondents with attributes such as age, years of coding experience, job title, job satisfaction, and tools they have worked with. The dataset includes several columns with missing values, and the data types range from integers and floats to numerous object or string types.
Notably, the columns detailing the tools respondents have worked with contain strings separated by semicolons. This format will require special handling during analysis. The average job satisfaction score for the respondents is approximately 6.9, or 7 when rounded.
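One way to handle the semicolon-separated columns is to split each cell into a list and expand it to one row per tool before counting. The sketch below is a minimal illustration on a small invented frame, not the method used later in this analysis. Only the column name `LanguageHaveWorkedWith` comes from the dataset; the helper name `count_multiselect` and the sample values are assumptions.

```python
import pandas as pd

# Toy stand-in for the survey frame (values are invented for illustration).
sample = pd.DataFrame({
    "ResponseId": [1, 2, 3, 4],
    "LanguageHaveWorkedWith": [
        "Python;SQL",
        "JavaScript;HTML/CSS;SQL",
        None,  # respondent skipped the question
        "Python",
    ],
})


def count_multiselect(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Count how many respondents selected each option in a ';'-separated column."""
    answered = frame[column].dropna()
    # Split "a;b;c" into ["a", "b", "c"], then give each option its own row.
    options = answered.str.split(";").explode().str.strip()
    counts = options.value_counts()
    # The share is relative to respondents who answered, not to all rows.
    share = (counts / len(answered) * 100).round(1)
    return pd.DataFrame({"count": counts, "share_pct": share})


language_counts = count_multiselect(sample, "LanguageHaveWorkedWith")
print(language_counts)
```

Dividing by the number of respondents who answered the question, rather than by the full row count, keeps skipped questions from deflating the percentages. Whether that is the right denominator depends on how the shares are meant to be read.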
The average annual salary is approximately 86,155 USD, but variability is high: reported values range from 1 USD to 16,256,603 USD. Extreme outliers are present and will be addressed in further analysis. Additionally, some non-working respondents may have reported salary values that could influence the overall distribution.
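The text does not say which outlier rule is applied later, so the following is only a sketch of one common option, the 1.5 × IQR rule, on invented salary values. The helper name `iqr_bounds` and the sample numbers are assumptions; `ConvertedCompYearly` is the real column name.

```python
import pandas as pd

# Invented salaries in USD; the 1 and 16_000_000 entries mimic the extremes described above.
salaries = pd.DataFrame({
    "ConvertedCompYearly": [
        1, 42_000, 55_000, 61_000, 70_000, 88_000, 95_000, 120_000, 16_000_000, None,
    ],
})


def iqr_bounds(values: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences using the k * IQR rule."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = iqr_bounds(salaries["ConvertedCompYearly"].dropna())
# Series.between is inclusive on both ends by default; missing salaries
# evaluate to False, so those rows are dropped as well.
trimmed = salaries[salaries["ConvertedCompYearly"].between(lower, upper)]

print(f"fences: {lower:,.0f} to {upper:,.0f} USD")
print(trimmed["ConvertedCompYearly"].describe())
```

In this toy data the 16,000,000 USD entry is removed but the 1 USD entry survives, because the lower fence is negative. Implausibly low values would need a separate rule, such as a minimum threshold or a filter on employment status.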
The dataset needs substantial preprocessing to address missing values, outliers, and inconsistencies. Proper data cleaning is essential to ensure the analysis is accurate and reliable.
4. Analysis and Findings
4.1 What are the most frequently used tools and technologies in various tech roles? How do these tools vary by job role and country?
This is a snapshot from the dashboard highlighting the most frequently used tools and technologies across different tech roles and countries. For an interactive exploration of the data, run dashboard.py and open the link printed in the terminal.
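For readers who want the per-role breakdown without running the dashboard, the idea can be sketched in a few lines of pandas. This is not the code behind dashboard.py. The helper name `usage_by_group`, the role labels, and the sample rows are assumptions; `DevType` and `DatabaseHaveWorkedWith` are real column names. Passing `Country` as the grouping column would give the per-country view.

```python
import pandas as pd

# Invented rows; the role labels are placeholders, not actual survey categories.
sample = pd.DataFrame({
    "DevType": ["Role A", "Role A", "Role B", "Role B", "Role B"],
    "DatabaseHaveWorkedWith": [
        "PostgreSQL;SQLite",
        "PostgreSQL",
        "MySQL;PostgreSQL",
        "MySQL",
        None,  # skipped question, excluded from the denominator
    ],
})


def usage_by_group(frame: pd.DataFrame, group_col: str, tool_col: str) -> pd.DataFrame:
    """Percentage of answering respondents in each group who selected each tool."""
    answered = frame.dropna(subset=[group_col, tool_col])
    # One row per (respondent, tool) pair.
    exploded = answered.assign(
        **{tool_col: answered[tool_col].str.split(";")}
    ).explode(tool_col)
    counts = (
        exploded.groupby([group_col, tool_col]).size().rename("users").reset_index()
    )
    # Number of answering respondents per group, used as the denominator.
    group_sizes = answered.groupby(group_col).size()
    counts["share_pct"] = (
        counts["users"] / counts[group_col].map(group_sizes) * 100
    ).round(1)
    return counts.sort_values(
        [group_col, "share_pct"], ascending=[True, False]
    ).reset_index(drop=True)


result = usage_by_group(sample, "DevType", "DatabaseHaveWorkedWith")
print(result)
```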
If you're eyeing a career in tech, now is the ideal moment to jump in! With ChatGPT (85%) leading the AI race and JavaScript (62%) still being the most popular programming language, the landscape is more vibrant than ever. Dive into the world of emerging technologies and get ahead of the curve with PostgreSQL, a top database choice at 51.81%, and React and Node.js, which dominate web frameworks.
Let's delve into these survey results and discover how you can code your way to a successful and dynamic career in technology, where innovation and growth are at your fingertips.