Skip to main content
HomeBlogData Engineering

Top 3 Trends in Data Infrastructure for 2021

Get your data engineering function ahead of the curve with orchestration platforms, data discovery engines, and data lakehouses.
Jan 2021  · 3 min read

Data engineering continues to be a top priority for enterprises, and in 2021, there will be exciting developments in the data infrastructure space. In a recent webinar, Maksim Percherskiy, Data Engineer at The World Bank and former Chief Data Officer of the City of San Diego, highlighted three trends to watch out for in particular: data orchestration platforms, data discovery engines, and data lakehouses.

Data orchestration platforms

Although orchestration platforms have been around for many years to manage computer systems and software, data orchestration is a relatively new concept that abstracts data access across storage systems, virtualizes data, and presents data to data-driven applications. Data orchestration platforms help companies become more data-driven by combining siloed data from multiple data storage locations and making them usable. Examples include Apache Airflow, Prefect, Luigi, and Stitch, which are compatible with modern software approaches like version control, DevOps, and continuous integration.

DataCamp’s course Introduction to Airflow in Python is a great place to start to learn how to implement and schedule ETL and data engineering workflows in an easy and repeatable way.

Data discovery engines

It makes sense that as data increases, companies will invest more time in enabling their teams to find the data they need, document it, and reduce rework. Data discovery engines like Lyft’s Amundsen and Uber’s Databook aim to improve the productivity of data users by providing a search interface for data. These tools rely on metadata, which support productivity and compliance by allowing data infrastructure to scale as companies grow. The goal of data discovery is to make data more FAIR: findable, accessible, interoperable, and reusable.

Data lakehouse

There are different use cases for data lakes versus data warehouses. Each is designed for different purposes and for different end users. For example, data warehouses are typically used by analysts for the purpose of collecting business intelligence—they contain flat files that have been explicitly processed for their work. On the other hand, data lakes are set up and maintained by data engineers who integrate them into data pipelines. Data scientists work more closely with data lakes as they contain data of a wider and more current scope.

Data lakehouses are exactly what they sound like: a combination of data lakes and data warehouses. Platforms like Snowflake and Databrick's Delta Lake have created solutions to provide a single source of truth for data that combines the benefits of data lakes and data warehouses. Data lakehouses implement data warehouses’ data structures and management features for data lakes, which are typically more cost-effective for data storage. At the same time, they retain the schema and versioning of the data. With data lakehouses, data scientists will be able to conduct both machine learning and business intelligence.

Source: Databricks

All three of these trends will be crucial for enterprises to scale data infrastructure in 2021. They’re useful for companies that were built around data as well as for those that are struggling to sustainably integrate data in their workflows. Find out more about the challenges large companies face to operationalize their data and become data-driven.

Related

How to Build Adaptive Data Pipelines for Future-Proof Analytics

Leverage data warehousing techniques combined with business logic to build a scalable and sustainable approach to data analytics.

Sanjana Putchala

10 min

What is A Graph Database? A Beginner's Guide

Explore the intricate world of graph databases with our beginner's guide. Understand data relationships, dive deep into the comparison between graph and relational databases, and explore practical use cases.
Kurtis Pykes 's photo

Kurtis Pykes

11 min

How to Craft an Impactful Data Engineer Cover Letter (With Examples)

Learn how to write an effective data engineering cover letter for any experience level using our step-by-step guide and examples.

Eva Chan

14 min

Which is the Best Snowflake Certification For 2024?

Discover the top Snowflake certifications for 2024 with our comprehensive guide. Find out which Snowflake certification aligns with your career goals.
Matt Crabtree's photo

Matt Crabtree

11 min

ETL vs ELT: Understanding the Differences and Making the Right Choice

Dive deep into the ETL vs ELT debate, uncovering the key differences, strengths, and optimal applications of each. Learn how these data integration methodologies shape the future of business intelligence and decision-making.
Julia Winn's photo

Julia Winn

6 min

Scaling Data Engineering in Retail with Mo Sabah, SVP of Engineering & Data at Thrive Market

Richie and Mo explore data engineering tools, data governance and data quality, collaboration between data analysts and data engineers, ownership mentality in data engineering and much more.
Richie Cotton's photo

Richie Cotton

51 min

See MoreSee More