Skip to main content
HomeBlogData Engineering

Top 3 Trends in Data Infrastructure for 2021

Get your data engineering function ahead of the curve with orchestration platforms, data discovery engines, and data lakehouses.
Jan 2021  · 3 min read

Data engineering continues to be a top priority for enterprises, and in 2021, there will be exciting developments in the data infrastructure space. In a recent webinar, Maksim Percherskiy, Data Engineer at The World Bank and former Chief Data Officer of the City of San Diego, highlighted three trends to watch out for in particular: data orchestration platforms, data discovery engines, and data lakehouses.

Data orchestration platforms

Although orchestration platforms have been around for many years to manage computer systems and software, data orchestration is a relatively new concept that abstracts data access across storage systems, virtualizes data, and presents data to data-driven applications. Data orchestration platforms help companies become more data-driven by combining siloed data from multiple data storage locations and making them usable. Examples include Apache Airflow, Prefect, Luigi, and Stitch, which are compatible with modern software approaches like version control, DevOps, and continuous integration.

DataCamp’s course Introduction to Airflow in Python is a great place to start to learn how to implement and schedule ETL and data engineering workflows in an easy and repeatable way.

Data discovery engines

It makes sense that as data increases, companies will invest more time in enabling their teams to find the data they need, document it, and reduce rework. Data discovery engines like Lyft’s Amundsen and Uber’s Databook aim to improve the productivity of data users by providing a search interface for data. These tools rely on metadata, which support productivity and compliance by allowing data infrastructure to scale as companies grow. The goal of data discovery is to make data more FAIR: findable, accessible, interoperable, and reusable.

Data lakehouse

There are different use cases for data lakes versus data warehouses. Each is designed for different purposes and for different end users. For example, data warehouses are typically used by analysts for the purpose of collecting business intelligence—they contain flat files that have been explicitly processed for their work. On the other hand, data lakes are set up and maintained by data engineers who integrate them into data pipelines. Data scientists work more closely with data lakes as they contain data of a wider and more current scope.

Data lakehouses are exactly what they sound like: a combination of data lakes and data warehouses. Platforms like Snowflake and Databrick's Delta Lake have created solutions to provide a single source of truth for data that combines the benefits of data lakes and data warehouses. Data lakehouses implement data warehouses’ data structures and management features for data lakes, which are typically more cost-effective for data storage. At the same time, they retain the schema and versioning of the data. With data lakehouses, data scientists will be able to conduct both machine learning and business intelligence.

Source: Databricks

All three of these trends will be crucial for enterprises to scale data infrastructure in 2021. They’re useful for companies that were built around data as well as for those that are struggling to sustainably integrate data in their workflows. Find out more about the challenges large companies face to operationalize their data and become data-driven.

Related
Data Engineering Vector Image

How to Become a Data Engineer in 2023: 5 Steps for Career Success

Discover how to become a data engineer and learn the essential skills. Develop your knowledge and portfolio to prepare for the data engineer interview.
Javier Canales Luna 's photo

Javier Canales Luna

17 min

5 Essential Data Engineering Skills

Discover the data engineering skills you need to thrive in the industry. Find out about the roles and responsibilities of a data engineer, and how you can develop your own skills.
Joleen Bothma's photo

Joleen Bothma

9 min

What Does a Data Engineer Do?

Curious about what a data engineer does? We break down the different data engineer roles & career paths and look at a typical data engineering project.
Joleen Bothma's photo

Joleen Bothma

9 min

A List of The 17 Best ETL Tools And Why To Choose Them

This blog post covers the top 16 ETL (Extract, Transform, Load) tools for organizations, like Talend Open Studio, Oracle Data Integrate, and Hadoop.
DataCamp Team's photo

DataCamp Team

12 min

Data Engineer Salaries Around the World: How Much Do Data Engineers Make?

Find out how much data engineers get paid in various countries and discover the factors that influence data engineer salaries.
Matt Crabtree's photo

Matt Crabtree

10 min

Introduction to LangChain for Data Engineering & Data Applications

LangChain is a framework for including AI from large language models inside data pipelines and applications. This tutorial provides an overview of what you can do with LangChain, including the problems that LangChain solves and examples of data use cases.
Richie Cotton's photo

Richie Cotton

11 min

See MoreSee More