Introduction to Data Pipelines

Rating: 4.6 (11 reviews)

Intermediate

This introductory course will help you hone the skills to build effective, performant, and reliable data pipelines.

4 hours, 15 videos, 57 exercises

6,743 learners, Statement of Accomplishment

Course Description

Empowering Analytics with Data Pipelines

Data pipelines sit at the foundation of every strong data platform. Building them is an essential skill for data engineers, who provide real value to businesses ready to step into a data-driven future.

Building and Maintaining ETL Solutions

Throughout this course, you’ll work through the complete process of building a data pipeline. You’ll build skills using Python libraries such as `pandas` and `json` to extract data from structured and unstructured sources before it’s transformed and persisted for downstream use. Along the way, you’ll gain confidence with tools and techniques such as architecture diagrams, unit tests, and monitoring that help set your data pipelines apart from the rest. As you progress, you’ll put your newfound skills to the test with hands-on exercises.
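
To give a rough idea of the extract, transform, and persist flow described above, here is a minimal sketch in pandas. It is not taken from the course. The sales data, the column names (`order_id`, `quantity`, `price`, `total`) and the function names are hypothetical, and an in-memory CSV stands in for a real source file.

```python
import io

import pandas as pd

# Hypothetical raw data standing in for a structured source file.
RAW_CSV = """order_id,quantity,price
1,2,9.99
2,0,4.50
3,5,1.25
"""


def extract(source) -> pd.DataFrame:
    """Read raw records from a CSV source (a path or a file-like object)."""
    return pd.read_csv(source)


def transform(raw: pd.DataFrame) -> pd.DataFrame:
    """Drop orders with no items and add a derived total column."""
    clean = raw.loc[raw["quantity"] > 0].copy()
    clean["total"] = clean["quantity"] * clean["price"]
    return clean


def load(clean: pd.DataFrame, target) -> None:
    """Persist the transformed data for downstream consumers."""
    clean.to_csv(target, index=False)


def run_pipeline(source, target) -> pd.DataFrame:
    """Chain the three stages so each step's output feeds the next."""
    clean = transform(extract(source))
    load(clean, target)
    return clean


if __name__ == "__main__":
    out = io.StringIO()
    print(run_pipeline(io.StringIO(RAW_CSV), out))
```

Each stage is a separate function, so each can be reused, swapped out (for example, `read_parquet` instead of `read_csv`), or unit-tested on its own.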

Supercharge Data Workflows

After completing this course, you’ll be ready to design, develop and use data pipelines to supercharge your data workflow in your job, new career, or personal project.

Included in the following tracks

Data Engineer in Python (certification available)

Machine Learning Engineer

Chapter 1: Introduction to Data Pipelines (free)
Get ready to discover how data is collected, processed, and moved using data pipelines. You will explore the qualities of the best data pipelines and prepare to design and build your own.

Introducing data pipelines
What is a data pipeline?
Components of a data pipeline
Producers and consumers of data pipelines
Designing data pipelines
Architecture diagrams for data pipelines
Reading architecture diagrams
Data pipeline design process
Qualities of great data pipelines
Building quality data pipelines
Persisting data throughout a pipeline
Qualities of sound data pipelines

Chapter 2: Building ETL Pipelines
Use pandas to extract, transform, and load data as you build your first data pipelines. Learn how to make your ETL logic reusable, and how to apply logging and exception handling to your pipelines.

Extracting data from structured sources
Extracting data from Parquet files
Pulling data from SQL databases
Building functions to extract data
Transforming data with pandas
Filtering pandas DataFrames
Transforming sales data with pandas
Validating data transformations
Persisting data with pandas
Loading sales data to a CSV file
Customizing a CSV file
Persisting data to files
Monitoring a data pipeline
Logging within a data pipeline
Handling exceptions when loading data
Monitoring and alerting within a data pipeline
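
The logging and exception-handling ideas in this chapter can be sketched with Python's standard `logging` module. This is a minimal illustration, not the course's code. The logger name, the `validate_checkpoint` and `load` helpers, and the choice to return `False` rather than re-raise are all hypothetical.

```python
import logging
import os
import tempfile

import pandas as pd

# Hypothetical logger name; basicConfig sends INFO and above to stderr.
logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s: %(message)s")
logger = logging.getLogger("sales_pipeline")


def validate_checkpoint(df: pd.DataFrame, stage: str) -> pd.DataFrame:
    """Log the row count after a stage and fail fast if it produced nothing."""
    logger.info("%s produced %d rows", stage, len(df))
    if df.empty:
        raise ValueError(f"{stage} produced no rows")
    return df


def load(clean: pd.DataFrame, path: str) -> bool:
    """Persist `clean` to CSV, logging the outcome instead of crashing the run."""
    try:
        clean.to_csv(path, index=False)
    except OSError as exc:
        # For example, the target directory does not exist or is not writable.
        logger.error("Failed to load %d rows to %s: %s", len(clean), path, exc)
        return False
    logger.info("Loaded %d rows to %s", len(clean), path)
    return True


if __name__ == "__main__":
    sales = validate_checkpoint(pd.DataFrame({"order_id": [1, 2]}), "transform")
    with tempfile.TemporaryDirectory() as tmp:
        load(sales, os.path.join(tmp, "sales.csv"))                 # logs INFO
        load(sales, os.path.join(tmp, "missing_dir", "sales.csv"))  # logs ERROR
```

Returning a boolean leaves the decision to retry, alert, or stop with the caller. Re-raising after logging is an equally valid choice when a failed load should halt the pipeline.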

Chapter 3: Advanced ETL Techniques
Supercharge your workflow with advanced data pipeline techniques, such as working with non-tabular data and persisting DataFrames to SQL databases. Discover tools for tackling advanced transformations with pandas, and learn best practices for working with complex data.

Extracting non-tabular data
Ingesting JSON data with pandas
Reading JSON data into memory
Transforming non-tabular data
Iterating over dictionaries
Parsing data from dictionaries
Transforming JSON data
Transforming and cleaning DataFrames
Advanced data transformation with pandas
Filling missing values with pandas
Grouping data with pandas
Applying advanced transformations to DataFrames
Loading data to a SQL database with pandas
Loading data to a Postgres database
Validating data loaded to a Postgres database
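
Here is a minimal sketch of the chapter's flow, assuming a small hypothetical nested JSON document about schools. It parses the JSON with `json`, iterates over the dictionaries to flatten them, fills missing values, groups with pandas, and then loads and validates the result in a SQL table. The course targets Postgres. This sketch swaps in an in-memory SQLite database from the standard library so it runs without a server. With Postgres, `con` would typically be a SQLAlchemy engine instead. All names and data are illustrative.

```python
import json
import sqlite3

import pandas as pd

# Hypothetical nested JSON standing in for a .json file or API response.
RAW_JSON = """
{
  "school_1": {"name": "North High", "borough": "Bronx",
               "scores": {"math": 520, "reading": 480}},
  "school_2": {"name": "South High", "borough": "Bronx",
               "scores": {"math": 610}},
  "school_3": {"name": "East High", "borough": "Queens",
               "scores": {"math": 570, "reading": 550}}
}
"""


def extract_json(text: str) -> dict:
    """Parse raw JSON text into nested Python dictionaries."""
    return json.loads(text)


def flatten(raw: dict) -> pd.DataFrame:
    """Iterate over the nested dictionaries and emit one flat row per school."""
    rows = []
    for school_id, info in raw.items():
        scores = info.get("scores", {})
        rows.append({
            "school_id": school_id,
            "name": info["name"],
            "borough": info["borough"],
            "math": scores.get("math"),
            "reading": scores.get("reading"),  # None when the key is missing
        })
    return pd.DataFrame(rows)


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing reading scores with the column mean, then average by borough."""
    df = df.copy()
    df["reading"] = df["reading"].fillna(df["reading"].mean())
    return df.groupby("borough", as_index=False)[["math", "reading"]].mean()


def load_to_sql(df: pd.DataFrame, con, table: str) -> int:
    """Write df to a SQL table, then read back the row count to validate the load."""
    df.to_sql(table, con, if_exists="replace", index=False)
    # `table` is a trusted, hard-coded name here; never interpolate user input into SQL.
    count = int(pd.read_sql(f"SELECT COUNT(*) AS n FROM {table}", con)["n"].iloc[0])
    if count != len(df):
        raise ValueError(f"Expected {len(df)} rows in {table}, found {count}")
    return count


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")
    summary = transform(flatten(extract_json(RAW_JSON)))
    print(load_to_sql(summary, con, "borough_scores"))
    con.close()
```

Reading the row count back after the load is a lightweight check. Comparing a few aggregates, such as per-borough means, between the DataFrame and the table would be a stricter one.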

Chapter 4: Deploying and Maintaining a Data Pipeline
In this final chapter, you’ll create frameworks to validate and test data pipelines before shipping them to production. Once your pipeline is tested, you’ll explore techniques for running it end-to-end while maintaining visibility into pipeline performance.

Manually testing a data pipeline
Testing data pipelines
Validating a data pipeline at "checkpoints"
Testing a data pipeline end-to-end
Unit testing a data pipeline
Validating a data pipeline with assert and isinstance
Writing unit tests with pytest
Creating fixtures with pytest
Unit testing a data pipeline with fixtures
Running a data pipeline in production
Orchestration and ETL tools
Data pipeline architecture patterns
Running a data pipeline end-to-end
Congratulations!
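
This chapter pairs unit testing with `assert`, `isinstance`, and pytest. Here is a minimal sketch of that style of test for a hypothetical `transform` function. pytest collects functions whose names start with `test_`. In a real pytest suite, the `raw_sales` helper would usually be decorated with `@pytest.fixture` and requested as a test parameter. It is left as a plain function here so the file also runs without pytest installed.

```python
import pandas as pd


def transform(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical transform under test: keep positive quantities, add a total."""
    clean = raw.loc[raw["quantity"] > 0].copy()
    clean["total"] = clean["quantity"] * clean["price"]
    return clean


def raw_sales() -> pd.DataFrame:
    """Fixture-style helper that builds the same small, known input for each test.

    Under pytest this would typically be decorated with @pytest.fixture and
    requested by naming it as a parameter of each test function.
    """
    return pd.DataFrame({"quantity": [2, 0, 5], "price": [9.99, 4.50, 1.25]})


def test_transform_returns_dataframe():
    result = transform(raw_sales())
    assert isinstance(result, pd.DataFrame)


def test_transform_drops_empty_orders():
    result = transform(raw_sales())
    assert len(result) == 2
    assert (result["quantity"] > 0).all()


def test_transform_adds_total_column():
    result = transform(raw_sales())
    assert "total" in result.columns
    expected = 2 * 9.99 + 5 * 1.25
    assert abs(result["total"].sum() - expected) < 1e-9


if __name__ == "__main__":
    # pytest would discover the test_* functions itself; this runs them directly.
    for test in (test_transform_returns_dataframe,
                 test_transform_drops_empty_orders,
                 test_transform_adds_total_column):
        test()
    print("all tests passed")
```

Building the input fresh for every test keeps the tests independent of one another, which is the main benefit fixtures provide.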

Datasets

scores.csv, schools_modified.csv, amazon_sales_cleaned_sql.csv, tax_rate_cleaned.csv

Collaborators

George Boorman

Arne Warnke

Anastasia Dvoryanchikova

Katerina Zahradova

Prerequisites

Introduction to Data Warehousing

Streamlined Data Ingestion with pandas

Instructor

Jake Roach

Data Engineer

Jake is a Data Engineer at Delaware North and a DataCamp instructor. He and his team are building a state-of-the-art data platform for a multi-billion-dollar organization, powered by Astronomer, Airflow, AWS, and Databricks. Jake was born and raised in Buffalo, NY. When he's not working with data, you can find him on the golf course playing a quick nine holes before dark!

Don’t just take our word for it

4.6 average rating from 11 reviews (5 stars: 73%, 4 stars: 18%)

Stefan C.

5 months ago

Good course

Alex N.

10 months ago

This course offers valuable insights into classes and inheritance in Python. There should be more software engineering and best coding practices design on the platform.

Sorin I.

4 months ago

Excellent courses and practice mode, very nice UI/UX and trainer

Robert T.

5 months ago

This was a great intro to the concepts. Overall it walked through a lot of key items all through the ETL process. The section on testing felt a bit rushed and didn't have a ton of explanation to it, but it did talk about the importance of testing the work before sending it to prod.

Mauricio P.

9 months ago

Very funny and useful