
New Course: Introduction to Airflow in Python

If you’re responsible for delivering data on a schedule, learn how to manage your data engineering workflows with DataCamp’s first course devoted to Airflow.
Jun 2020  · 3 min read

Apache Airflow is a crucial part of the data engineering ecosystem. That’s why our introductory data engineering courses, Introduction to Data Engineering, Building Data Engineering Pipelines in Python, and Data Engineering for Everyone, include lessons on Airflow. Now, we’re excited to announce the launch of our first dedicated course on Airflow: Introduction to Airflow in Python.

What is Airflow?

Apache Airflow is an open-source tool to create, schedule, and monitor workflows. In this context, a workflow is a set of steps to accomplish a data engineering task, like streaming data or writing data to a database. Airflow represents workflows as Directed Acyclic Graphs, or DAGs. Essentially, this means a workflow is represented as a set of tasks and the dependencies between them.
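To illustrate the core idea (this sketch is ours, not from the course, and the task names are hypothetical), a DAG boils down to tasks that only run once everything they depend on has finished. In plain Python, with no Airflow required:

```python
# A toy DAG: each task maps to the list of tasks it depends on.
# Task names ("extract", "transform", "load", "report") are made-up examples.
dag = {
    "extract": [],
    "transform": ["extract"],
    "load": ["transform"],
    "report": ["transform"],
}

def run_order(dag):
    """Return an execution order where every task comes after its dependencies."""
    order, done = [], set()
    while len(order) < len(dag):
        progressed = False
        for task, deps in dag.items():
            if task not in done and all(d in done for d in deps):
                order.append(task)  # all dependencies satisfied: schedule it
                done.add(task)
                progressed = True
        if not progressed:
            # If no task could be scheduled, the graph has a cycle,
            # so it is not a valid DAG.
            raise ValueError("cycle detected: not a valid DAG")
    return order

print(run_order(dag))
```

Airflow's scheduler does far more than this (retries, backfills, parallelism), but the acyclic-dependency ordering above is the concept a DAG encodes.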

Who’s this course for?

Are you responsible for delivering data on a schedule? Then this course is for you! In terms of roles, this course is intended for data engineers, data scientists, and machine learning scientists who want to upskill in data engineering. The only prerequisites for this course are Intermediate Python and Introduction to Shell.

What you’ll learn

This is an introductory course where you’ll learn everything you need to get started with Airflow.

First, you’ll get comfortable with components of Apache Airflow such as DAGs and why they’re useful. Then, you’ll learn how to implement Airflow DAGs using operators, tasks, and scheduling. You’ll also learn about sensors and executors to monitor and debug Airflow workflows. In the final chapter, you’ll use what you’ve learned to build a production-quality workflow in Airflow.

How you’ll learn

This is one of the most interactive Airflow courses you’ll find online. You’ll use both the Airflow web UI (user interface) and the Airflow command-line interface (CLI) to schedule and monitor workflows. Along with classic coding exercises, this course has IDE (Integrated Development Environment) exercises, where you can look at multiple files and run different Airflow CLI commands in a real-world environment.

To gain experience in different ways of interacting with Airflow, we incorporate exercises that embed the Airflow web UI.

The instructor of this course, Mike Metzger, is a data engineer with 20+ years of experience in the data, networking, and security space. You may know him from his other fantastic DataCamp course, Cleaning Data with PySpark.

Enjoy the course and let us know what you think at [email protected]!
