
What Does a Data Engineer Do?

Curious about what a data engineer does? We break down the different data engineer roles & career paths and look at a typical data engineering project.
Oct 2022

You may have heard that data engineering is the new data science, and the rapid growth of the field lends weight to that claim. Companies now recognize the value in hiring data engineers to design, build, and maintain the architecture they need to make data science and analytics successful. You can read more about the differences between data engineers and data scientists in a separate article.

However, you may also be wondering, what does a data engineer actually do? In this article, we break down the different data engineer roles and responsibilities and the career path that a data engineer may follow. Lastly, we give a peek behind the curtain of a typical data engineering project you may encounter in an organization. If you're looking for information on how to become a data engineer, you will find our separate article useful. 

Data Engineering Roles

Data engineering involves a large variety of skills, tools, and systems. There are four core groups of data engineer roles, and each of these groups must master a set of skills and tools to do their job effectively.

  • Generalists. Involved in all aspects of data collection, storage, analysis, and movement. They must know and be able to use a wide range of tools and skills.
  • Specialists in data storage. Responsible for setting up and managing relational databases (like PostgreSQL and SQL Server), non-relational (NoSQL) databases (like MongoDB), data warehouses (like Redshift and Panoply), and big data systems (like Hadoop and Spark).
  • Specialists in programming and pipelines. Create and manage the flow of data through scripts and data pipelines. They must be familiar with a few programming languages like Python, Java, and C++.
  • Specialists in analytics. Work closely with data scientists and other analytics professionals in the organization. They must be familiar with analytical tools (like Power BI and Tableau), machine learning libraries (like TensorFlow and PyTorch), and other tools that support analytical projects (like ETL tools and big data systems).

What Does a Data Engineer Do? The Data Engineer Career Path

The career path of a data engineer can vary based on the size of the company and the maturity of its data teams. However, most data engineers follow this path:

  • Junior data engineer
  • Mid-level data engineer
  • Senior data engineer
  • Senior managerial roles

Junior Data Engineer

When just starting their careers, junior data engineers typically take on small tasks that maintain and support existing systems. This could be anything from testing systems and looking for and fixing bugs, to adding features to an existing system. During these early stages, a junior typically would not take on their own project but would instead take on a supporting role for their senior colleagues.

The most important part of the first few years as a junior data engineer is learning and gaining hands-on experience with the tools they will need to use later on in their careers. They are also learning how the different teams and departments work together to find solutions to the problems and questions that come up.

Mid-Level Data Engineer

A data engineer may be promoted to the mid-level after around 1 to 3 years. At this time they may be exposed to more project management aspects of the job and may be required to collaborate more with other teams and departments.

They are usually given the responsibility of designing and building systems that support data scientists and other analytical team members. They may still be under some supervision from a senior data engineer at this stage. In order for them to do this job effectively, they must develop good communication skills and be able to work well with other teams.

Data engineers could remain at this level for around 3 to 5 years. During this time, they would have developed their programming skills and should be familiar with all the tools and systems that are used at the organization. They can identify and fix any bugs or problems that come up, and they collaborate well within and across teams.

Senior Data Engineer

Once data engineers reach a senior level, they take on more managerial responsibilities. They may need to oversee one or more data engineers, teaching them and assigning projects to them as they come up.

At this stage, the data engineer is proficient in the technical aspects of their role and can build systems and solve problems with relative ease. However, they are now more closely involved in the business side of things and need to think strategically about the direction of the data projects and the long-term effectiveness and optimization of their systems.

This requires a shift in how the data engineer thinks, which can be challenging. Many data engineers may not have a passion for strategic and business responsibilities, so they may choose not to advance further in the company.

Senior Managerial Roles

Once data engineers have obtained around six years or more of experience, they can move into more managerial roles if they choose, such as:

  • Data engineering manager
  • Director of data engineering
  • Chief data officer

In addition to being highly proficient in the technical skills obtained at lower levels, people in these roles need strong data infrastructure and data architecture skills and must be able to manage and scale analytical teams. They also need to be able to define the processes for developing high-performance systems, scope out new projects, and define and manage SLAs for new and existing systems.

What Does a Data Engineer Do? A Typical Data Engineering Project

Let's take a look at some of the data engineer roles and responsibilities. Suppose you work for a large company that provides a food delivery service to customers via a mobile app. The app acts as the middleman between the restaurant and the driver. Customers place their orders on the app, and the restaurant is notified. Once the food is ready, a driver is assigned, and the food is delivered to the customer.

As you can imagine, an app like this could generate a lot of data daily, from data on restaurants, drivers, and customers to logs for every interaction on the app. There is also the data collected from customer service calls for complaints, compliments, or disputes, and even logs from errors that occur on the app.

Suppose a data scientist or data analyst at your company is tasked with identifying trends in orders, which they can then use to build a machine learning model. To do this, they come to you to extract and prepare data on the orders aggregated by day. They also need to be able to split the data between first-time and repeat customers.

Gain Clarity

To solve this problem, the data engineer must first get clarity on the problem through these steps:

  1. Identify the granularity: per order, day, week, month, or year. Based on the above request, the order data must be aggregated by day with a split by customer type (first-time or repeat).
  2. Identify whether any filters should be applied to the data, such as by country or phone model.
  3. Identify the timeframe of the data. For example, is it for all time or just the last year?
  4. Identify the data sources and/or tables for this data. This data is stored in a central data warehouse, and the data engineer would need to access the orders table and the customer table. If additional filters are needed, then more tables may need to be accessed.
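
One way to pin down the answers to these four questions is to write them into a small, explicit request specification before any query is written. The sketch below is illustrative only: the `ExtractionRequest` class, its field names, the table names `orders` and `customers`, and the example dates are all assumptions made for this example, not part of any particular tool.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

ALLOWED_GRANULARITIES = {"order", "day", "week", "month", "year"}


@dataclass(frozen=True)
class ExtractionRequest:
    """Hypothetical record of what the analyst asked for."""

    granularity: str                             # step 1: level of aggregation
    split_by: tuple = ()                         # step 1: extra dimensions
    filters: dict = field(default_factory=dict)  # step 2: e.g. {"country": "US"}
    start_date: Optional[date] = None            # step 3: None means "all time"
    end_date: Optional[date] = None
    source_tables: tuple = ()                    # step 4: where the data lives

    def __post_init__(self):
        # Reject requests that are ambiguous or contradictory up front.
        if self.granularity not in ALLOWED_GRANULARITIES:
            raise ValueError(f"unknown granularity: {self.granularity!r}")
        if self.start_date and self.end_date and self.start_date > self.end_date:
            raise ValueError("start_date must not be after end_date")


# The request from the scenario: daily orders split by customer type.
# The date range is a placeholder value chosen for illustration.
request = ExtractionRequest(
    granularity="day",
    split_by=("customer_type",),
    start_date=date(2022, 1, 1),
    end_date=date(2022, 12, 31),
    source_tables=("orders", "customers"),
)
print(request)
```

Writing the request down this way makes it easy to confirm the details with the data scientist or analyst before any extraction work starts.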

Data Extraction

Now that the data engineer has gained more clarity on the problem, they can move on to data extraction and exploration by going through these steps:

  1. Identify what joins should be used between the orders and the customers table and what the relationships are between these tables (such as what keys must be used to join the tables). This requires a solid understanding of SQL and data modeling.
  2. Create a categorical feature for customer type based on the number of orders each customer has made. This feature must contain categories for 'first-time customer' and 'repeat customer.'
  3. Assess the quality of the data. Identify if missing or anomalous data may need to be corrected.
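
The three steps above can be sketched with a small in-memory SQLite database standing in for the data warehouse. Everything about the schema here is an assumption for illustration: the column names (`customer_id`, `order_date`, `amount`), the sample rows, and the rule that a customer with exactly one order counts as a first-time customer while anyone with more counts as a repeat customer.

```python
import sqlite3

# Hypothetical miniature versions of the warehouse tables (assumed schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        order_date  TEXT,
        amount      REAL
    );
    INSERT INTO customers VALUES (1, 'US'), (2, 'US'), (3, 'DE');
    INSERT INTO orders VALUES
        (10, 1, '2022-10-01', 20.0),
        (11, 2, '2022-10-01', 15.5),
        (12, 1, '2022-10-02', 30.0),
        (13, 3, '2022-10-02', NULL);
""")

# Steps 1 and 2: join orders to customers on customer_id, derive the
# customer type from each customer's order count, then aggregate by day.
DAILY_ORDERS_SQL = """
    WITH customer_types AS (
        SELECT customer_id,
               CASE WHEN COUNT(*) = 1 THEN 'first-time customer'
                    ELSE 'repeat customer' END AS customer_type
        FROM orders
        GROUP BY customer_id
    )
    SELECT o.order_date,
           ct.customer_type,
           COUNT(*)      AS n_orders,
           SUM(o.amount) AS total_amount
    FROM orders AS o
    JOIN customers AS c       ON c.customer_id = o.customer_id
    JOIN customer_types AS ct ON ct.customer_id = o.customer_id
    GROUP BY o.order_date, ct.customer_type
    ORDER BY o.order_date, ct.customer_type
"""
daily_rows = conn.execute(DAILY_ORDERS_SQL).fetchall()

# Step 3: basic quality checks for missing values and unmatched keys.
missing_amounts = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE amount IS NULL"
).fetchone()[0]
orphan_orders = conn.execute("""
    SELECT COUNT(*)
    FROM orders AS o
    LEFT JOIN customers AS c ON c.customer_id = o.customer_id
    WHERE c.customer_id IS NULL
""").fetchone()[0]

for row in daily_rows:
    print(row)
print("orders with a missing amount:", missing_amounts)
print("orders without a matching customer:", orphan_orders)
```

The inner join to `customers` drops any order whose customer is unknown, which is why the separate left-join check is useful: it shows how many orders would silently disappear from the aggregate.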

Once the data engineer has prepared the data that the data scientist or data analyst requires, they need to create an API endpoint that can be queried to extract the data. This entire project could take anywhere from a few days to a few months, depending on the volume and complexity of the data.
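
As a rough idea of what such an endpoint could look like, here is a minimal sketch written as a plain WSGI application using only the Python standard library. The route `/daily-orders`, the `customer_type` query parameter, and the hard-coded sample rows are all hypothetical; a real service would read from the warehouse and would likely use a web framework with authentication and pagination.

```python
import json
from urllib.parse import parse_qs

# Stand-in for the prepared daily aggregates (hypothetical sample rows).
# In practice these would come from a warehouse query, not a constant.
DAILY_ORDERS = [
    {"order_date": "2022-10-01", "customer_type": "first-time customer", "n_orders": 1},
    {"order_date": "2022-10-01", "customer_type": "repeat customer", "n_orders": 1},
    {"order_date": "2022-10-02", "customer_type": "repeat customer", "n_orders": 1},
]


def app(environ, start_response):
    """WSGI app serving GET /daily-orders with an optional customer_type filter."""
    is_get = environ.get("REQUEST_METHOD") == "GET"
    if not is_get or environ.get("PATH_INFO") != "/daily-orders":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "not found"}']

    # parse_qs returns a list per key; take the first value if present.
    params = parse_qs(environ.get("QUERY_STRING", ""))
    wanted = params.get("customer_type", [None])[0]
    rows = [
        row for row in DAILY_ORDERS
        if wanted is None or row["customer_type"] == wanted
    ]

    body = json.dumps(rows).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, the app can be passed to `wsgiref.simple_server.make_server` and queried with a URL such as `/daily-orders?customer_type=repeat+customer`.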

Throughout this process, the data engineer may need to work with many different systems, depending on where the data is stored and whether any additional processing is required.

Examples of systems that can be encountered in this problem are SQL Server, Hadoop, and Redshift for data storage, SQL for querying the data, and Python for writing the scripts that process the data.

Final Thoughts

As you can see, a typical data engineering project draws on a few core skills that are crucial to data engineering, such as building data pipelines. To accelerate your learning and prepare yourself for a role in data engineering, try the data engineering skill track on DataCamp.

Hopefully, this article gave you some insight into the role of data engineers and what they actually do. If you’re considering starting a career in data engineering, you should now also be more familiar with the career path you can expect.
