Top 11 Data Engineering Projects for Hands-On Learning

Showcase your data engineering skills through these portfolio projects. Practice and deepen your understanding of various technologies to show potential employers your strengths!
Nov 6, 2024  · 25 min read

Data engineering supports the movement and transformation of data. As companies rely on huge amounts of data to gain insights and drive innovation, the demand for data engineers continues to grow.

For data professionals, diving into data engineering projects offers a wealth of opportunities. Hands-on challenges sharpen your technical skills and provide a tangible portfolio to showcase your knowledge and experience.

In this article, I have curated a selection of data engineering projects designed to help you advance your skills and confidently tackle real-world data challenges!

Why Work on Data Engineering Projects?

Building a solid understanding of data engineering through theory and practice is important. If you’re reading this article, you may already know this, but here are three specific reasons to dive into these projects:

Building technical skills

Data engineering projects provide hands-on experience with technologies and methodologies. You'll develop proficiency in programming languages, database management, big data processing, and cloud computing. These technical skills are fundamental to data engineering roles and highly transferable across the tech industry.

Portfolio development 

Creating a portfolio of data engineering projects demonstrates your practical abilities to potential employers. You provide tangible evidence of your capabilities by showcasing implementations of data pipelines, warehouse designs, and optimization solutions. 

A strong portfolio sets you apart in the job market and complements your resume with real-world accomplishments.

Learning tools and technologies 

The data engineering field employs a diverse array of tools and technologies. Working on projects exposes you to data processing frameworks, workflow management tools, and visualization platforms. 

This practical experience keeps you current with industry trends and enhances adaptability in an evolving technological landscape.

Data Engineering Projects for Beginners

These projects aim to introduce the main tools used by data engineers. Start here if you are new to data engineering or need a refresher.

Project 1: ETL pipeline with open data (CSV to SQL)

This project entails building an ETL pipeline using a publicly available dataset, such as weather or transportation data. You will extract the data from a CSV file, clean and transform it using Python (with a library like Pandas), and load the transformed data into Google BigQuery, a cloud-based data warehouse.

This project is excellent for beginners as it introduces core ETL concepts—data extraction, transformation, and loading—while giving exposure to cloud tools like BigQuery. 

You'll also learn how to interact with cloud data warehouses, a core skill in modern data engineering, using simple tools like Python and the BigQuery API. For an introduction, review the beginner’s guide to BigQuery.

As for the data, you can select an available dataset from either Kaggle or data.gov.
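To make the flow concrete, here is a minimal sketch of the three steps, assuming a hypothetical weather.csv file and placeholder Google Cloud project, dataset, and table names. It uses pandas and the google-cloud-bigquery client, which requires authentication to be configured beforehand.

```python
# A minimal sketch of the CSV -> transform -> BigQuery flow.
# Assumes a Google Cloud project with BigQuery enabled and application-default
# credentials configured; the file, dataset, and table names are placeholders.
import pandas as pd
from google.cloud import bigquery

# Extract: read the raw CSV (e.g., a weather dataset from Kaggle or data.gov)
df = pd.read_csv("weather.csv")

# Transform: basic cleaning with pandas
df = df.dropna(subset=["date", "temperature"])          # drop incomplete rows
df["date"] = pd.to_datetime(df["date"])                  # normalize types
df["temperature_c"] = (df["temperature"] - 32) * 5 / 9   # example unit conversion

# Load: write the DataFrame to a BigQuery table
client = bigquery.Client()
table_id = "my-project.weather_dataset.daily_weather"    # placeholder table ID
job = client.load_table_from_dataframe(df, table_id)
job.result()  # wait for the load job to finish
print(f"Loaded {job.output_rows} rows into {table_id}")
```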

Resources

Here are some resources, including GitHub repositories and tutorials, that provide step-by-step guidance:

GitHub repositories:

  • End-to-End Data Pipeline: This repository demonstrates a fully automated pipeline that extracts data from CSV files, transforms it using Python and dbt, and loads it into Google BigQuery.
  • ETL Pipeline with Airflow and BigQuery: This project showcases an ETL pipeline orchestrated with Apache Airflow that automates the extraction of data from CSV files, transformation using Python, and loading into BigQuery.

Courses:

  • ETL and ELT in Python: Learn more about ETL processes in Python, covering foundational concepts and practical implementations to build data pipelines.
  • Understanding Modern Data Architecture: This course offers a comprehensive overview of modern data architecture, focusing on best practices for moving and structuring data in cloud-based systems like BigQuery.

Skills developed

  • Extracting data from CSV with Python
  • Transforming and cleaning data with Python
  • Loading data into BigQuery with Python and SQL

Project 2: Weather data pipeline with Python and PostgreSQL

This project introduces aspiring data engineers to the fundamental process of building a data pipeline, focusing on three core aspects of data engineering: data collection, cleansing, and storage. 

Using Python, you’ll fetch weather conditions and forecasts for various locations from readily available public weather APIs. Once the weather data is collected, you’ll process the raw data, which may involve converting temperature units, handling missing values, or standardizing location names. Finally, you’ll store the cleansed data in a PostgreSQL database.

This project is a strong starting point for new data engineers. It covers the fundamentals of building a data pipeline using widely used industry tools.
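Below is a rough sketch of the collect, clean, and store loop, assuming the free Open-Meteo forecast endpoint and a local PostgreSQL instance; the table and column names are illustrative only.

```python
# A rough sketch of the collect -> clean -> store loop, assuming the free
# Open-Meteo API and a local PostgreSQL instance; table and column names
# are placeholders.
import requests
import psycopg2

LOCATIONS = {"London": (51.51, -0.13), "New York": (40.71, -74.01)}

def fetch_current_weather(lat, lon):
    resp = requests.get(
        "https://api.open-meteo.com/v1/forecast",
        params={"latitude": lat, "longitude": lon, "current_weather": "true"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["current_weather"]

conn = psycopg2.connect(dbname="weather", user="postgres", password="postgres", host="localhost")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS observations (
            city TEXT, observed_at TIMESTAMP, temperature_c REAL, windspeed_kmh REAL
        )
    """)
    for city, (lat, lon) in LOCATIONS.items():
        w = fetch_current_weather(lat, lon)
        cur.execute(
            "INSERT INTO observations VALUES (%s, %s, %s, %s)",
            (city, w["time"], w["temperature"], w["windspeed"]),
        )
conn.close()
```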

Resources

Here are some valuable resources, including GitHub repositories and tutorials, that provide step-by-step guidance to complete this project:

GitHub repositories:

  • Weather and Air Quality ETL Pipeline: This repository demonstrates an ETL pipeline that extracts weather and air quality data from public APIs, transforms it into a clean, analyzable format, and loads it into a PostgreSQL database.
  • Weather Data Integration Project: An end-to-end ETL pipeline that extracts weather data, transforms it, and loads it into a PostgreSQL database.

Courses:

  • Creating PostgreSQL Databases: This course offers a comprehensive guide to PostgreSQL, covering essential skills for creating, managing, and optimizing databases—a critical step in the weather data pipeline.
  • Data Engineer in Python: This skill track covers foundational data engineering skills, including data collection, transformation, and storage, providing a strong start for building pipelines in Python.

Skills developed

  • Using Python to write data pipeline applications
  • Collecting data from external sources (APIs)
  • Cleaning data to make it consistent and understandable
  • Setting up databases and storing and organizing data in them

Project 3: London transport analysis

This project offers an excellent starting point for aspiring data engineers. It introduces you to working with real-world data from a major public transport network that handles over 1.5 million daily journeys. 

The project's strength lies in its use of industry-standard data warehouse solutions like Snowflake, Amazon Redshift, Google BigQuery, or Databricks. These platforms are crucial in modern data engineering, allowing you to efficiently process and analyze large datasets. 

By analyzing transport trends, popular methods, and usage patterns, you'll learn how to extract meaningful insights from large datasets, a core competency in data engineering.
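The query below is a small example of the kind of aggregation this project involves, run from Python against BigQuery; the table and column names are placeholders to adapt to whichever warehouse and schema you load the transport data into.

```python
# A small sketch of an analytical query against a data warehouse, run from
# Python via BigQuery. The table and column names (journey_type,
# journeys_millions) are placeholders.
from google.cloud import bigquery

client = bigquery.Client()
sql = """
    SELECT journey_type,
           SUM(journeys_millions) AS total_journeys_millions
    FROM `my-project.london_transport.journeys`
    GROUP BY journey_type
    ORDER BY total_journeys_millions DESC
"""
results = client.query(sql).to_dataframe()
print(results)
```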

Resources

Here are some resources, including guided projects and courses, that provide step-by-step guidance:

Guided projects:

  • Exploring London’s Travel Network: This guided project teaches you how to analyze London's public transport data, helping you explore trends, popular routes, and usage patterns. You'll gain experience with large-scale data analysis using real-world data from a major public transport network.

Courses:

  • Data Warehousing Concepts: This course covers essential data warehousing principles, including architectures and use cases for platforms like Snowflake, Redshift, and BigQuery. It's an excellent foundation for implementing large-scale data storage and processing solutions.

Skills developed

  • Writing queries with a better understanding of the underlying data.
  • Working with large datasets.
  • Understanding big data concepts.
  • Working with data warehouses and big data tools, like Snowflake, Redshift, BigQuery, or Databricks.

Intermediate Data Engineering Projects

These projects focus on skills like writing better code and integrating different data platforms. These technical skills are essential for contributing to an existing tech stack and working as part of a larger team.

Project 4: Performing a code review

This project is all about reviewing the code of another data engineer. While it may not be as hands-on with the technology as some other projects, being able to review others’ code is an important part of growing as a data engineer. 

Reading and reviewing code is just as important a skill as writing it. Once you understand foundational data engineering concepts and practices, you can apply them when reviewing others’ code to ensure it follows best practices and to catch potential bugs.

Resources

Here are some valuable resources, including projects and articles, that provide step-by-step guidance:

Guided projects:

  • Performing a Code Review: This guided project offers hands-on experience in code review, simulating the code review process as if you were a senior data professional. It’s an excellent way to practice identifying potential bugs and ensuring best practices are followed.

Articles:

  • How to Do a Code Review: This resource provides recommendations on conducting code reviews effectively, based on extensive experience, and covers various aspects of the review process.

Skills developed

  • Reading and evaluating code written by other data engineers
  • Finding bugs and logic errors when reviewing code
  • Providing feedback on code in a clear and helpful manner

Project 5: Building a retail data pipeline

In this project, you'll build a complete ETL pipeline with Walmart's retail data. You'll retrieve data from various sources, including SQL databases and Parquet files, apply transformation techniques to prepare and clean the data, and finally load it into an easily accessible format.

This project is excellent for building practical, end-to-end data engineering knowledge because it covers essential skills like extracting data from multiple formats, transforming it for meaningful analysis, and loading it for efficient storage and access. It reinforces concepts like handling diverse data sources, optimizing data flows, and maintaining scalable pipelines.
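Here is a condensed sketch of the extract, transform, and load steps, assuming a hypothetical SQLite database of sales and a Parquet file of supplementary store data; all file, table, and column names are placeholders.

```python
# A condensed sketch of the retail pipeline's ETL steps, assuming a
# hypothetical SQLite database of grocery sales and a Parquet file of extra
# store data; names and columns are placeholders.
import sqlite3
import pandas as pd

# Extract from two different source formats
conn = sqlite3.connect("walmart.db")
sales = pd.read_sql("SELECT * FROM grocery_sales", conn)
extra = pd.read_parquet("extra_data.parquet")

# Transform: join the sources and clean the result
df = sales.merge(extra, on="index", how="left")   # "index" is a placeholder join key
df = df.dropna(subset=["Weekly_Sales"])
df["Date"] = pd.to_datetime(df["Date"])
df["Month"] = df["Date"].dt.month

# Load: write an analysis-ready file that downstream users can query easily
monthly = df.groupby("Month", as_index=False)["Weekly_Sales"].mean()
monthly.to_csv("agg_monthly_sales.csv", index=False)
```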

Resources

Here are some valuable resources, including guided projects and courses, that provide step-by-step guidance:

Guided projects:

  • Building a Retail Data Pipeline: This guided project takes you through constructing a retail data pipeline using Walmart’s retail data. You’ll learn to retrieve data from SQL databases and Parquet files, transform it for analysis, and load it into an accessible format.

Courses:

  • Database Design: A solid understanding of database design is essential when working on data pipelines. This course covers the basics of designing and structuring databases, which is valuable for handling diverse data sources and optimizing storage.

Skills developed

  • Designing data pipelines for real-world use cases.
  • Extracting data from multiple sources and different formats.
  • Cleaning and transforming data from different formats to improve its consistency and quality.
  • Loading this data into an easily accessible format.

Project 6: Factors influencing student performance with SQL

In this project, you'll analyze a comprehensive database focused on various factors that impact student success, such as study habits, sleep patterns, and parental involvement. By crafting SQL queries, you'll investigate the relationships between these factors and exam scores, exploring questions like the effect of extracurricular activities and sleep on academic performance.

This project builds data engineering skills by enhancing your ability to manipulate and query databases effectively. 

You'll develop skills in data analysis, interpretation, and deriving insights from complex datasets, essential for making data-driven decisions in educational contexts and beyond.
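The snippet below shows the kind of SQL you might write for this analysis, executed against SQLite for simplicity; the student_performance table and its columns are assumptions to adjust to the schema of the dataset you use.

```python
# An example of the kind of SQL query this project involves, run against
# SQLite for simplicity. The student_performance table and its columns
# (hours_studied, sleep_hours, exam_score) are assumptions.
import sqlite3

conn = sqlite3.connect("student_performance.db")
query = """
    SELECT hours_studied,
           AVG(exam_score) AS avg_exam_score,
           COUNT(*)        AS num_students
    FROM student_performance
    WHERE sleep_hours >= 7
    GROUP BY hours_studied
    HAVING COUNT(*) >= 10          -- ignore sparsely populated groups
    ORDER BY hours_studied
"""
for row in conn.execute(query):
    print(row)
conn.close()
```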

Resources

Here are some resources, including guided projects and courses, that provide step-by-step guidance:

Guided projects:

  • Factors that Fuel Student Performance: This guided project enables you to explore the influence of various factors on student success by analyzing a comprehensive database. You’ll use SQL to investigate relationships between study habits, sleep patterns, and academic performance, gaining experience in data-driven educational analysis.

Courses:

  • Data Manipulation in SQL: A strong foundation in SQL data manipulation is key for this project. This course covers SQL techniques for extracting, transforming, and analyzing data in relational databases, equipping you with the skills to handle complex datasets.

Skills developed

  • Writing and optimizing SQL queries to retrieve and manipulate data effectively.
  • Analyzing complex datasets to identify trends and relationships.
  • Formulating hypotheses and interpreting results based on data.

Advanced Data Engineering Projects

One hallmark of an advanced data engineer is the ability to create pipelines that handle many data types across different technologies. These projects focus on expanding your skill set by combining multiple advanced data engineering tools to create scalable data processing systems.

Project 7: Cleaning a dataset with PySpark

Using an advanced tool like PySpark, you can build pipelines that take advantage of Apache Spark's distributed processing capabilities. 

Before you attempt to build a project like this, it's important to complete an introductory course to understand the fundamentals of PySpark. This foundational knowledge will enable you to fully utilize this tool for effective data extraction, transformation, and loading.
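As a starting point, here is a minimal PySpark cleaning sketch, assuming a hypothetical orders.csv with order_date, product, and price columns; adapt the paths and cleaning rules to the dataset you actually use.

```python
# A minimal PySpark cleaning sketch; the file path, column names, and
# cleaning rules are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-cleaning").getOrCreate()

orders = spark.read.csv("orders.csv", header=True, inferSchema=True)

cleaned = (
    orders
    .dropDuplicates()
    .dropna(subset=["order_date", "product"])
    .withColumn("order_date", F.to_date("order_date"))
    .withColumn("product", F.lower(F.trim(F.col("product"))))
    .filter(F.col("price") > 0)
)

# Write the cleaned data back out in a columnar format for downstream use
cleaned.write.mode("overwrite").parquet("orders_cleaned/")
spark.stop()
```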

Resources

Here are some valuable resources, including guided projects, courses, and tutorials, that provide step-by-step guidance:

Guided projects:

  • Cleaning an Orders Dataset with PySpark: This guided project walks you through cleaning an e-commerce orders dataset using PySpark, helping you understand how to extract, transform, and load data in a scalable way with Apache Spark.

Courses:

  • Introduction to PySpark: This course provides an in-depth introduction to PySpark, covering essential concepts and techniques for effectively working with large datasets in Spark. It's an ideal starting point for building a strong foundation in PySpark.

Tutorials:

  • PySpark Tutorial: Getting Started with PySpark: This tutorial introduces the core components of PySpark, guiding you through the setup and fundamental operations so you can confidently start building data pipelines with PySpark.

Skills developed

  • Expanding experience with PySpark
  • Cleaning and transforming data for stakeholders
  • Ingesting large batches of data
  • Deepening knowledge of Python in ETL processes

Project 8: Data modeling with dbt and BigQuery

A popular and powerful modern tool for data engineers is dbt (Data Build Tool), which allows data engineers to follow a software development approach. It offers intuitive version control, testing, boilerplate code generation, lineage, and environments. dbt can be combined with BigQuery or other cloud data warehouses to store and manage your datasets. 

This project will allow you to create pipelines in dbt, generate views, and link the final data to BigQuery.
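For orientation, a dbt model is essentially a SQL SELECT statement that dbt materializes as a view or table in the warehouse. The sketch below is not dbt itself; it creates an equivalent view directly with the BigQuery Python client to show what a simple model compiles down to, using placeholder project, dataset, and column names.

```python
# Not dbt itself -- a sketch of what a simple dbt model boils down to: a SQL
# SELECT materialized as a view in the warehouse. In the actual project, the
# SELECT would live in a models/*.sql file and `dbt run` would create the
# view. Project, dataset, and column names below are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

view = bigquery.Table("my-project.analytics.daily_orders")  # placeholder view ID
view.view_query = """
    SELECT order_date,
           COUNT(*)         AS num_orders,
           SUM(order_total) AS revenue
    FROM `my-project.raw.orders`
    GROUP BY order_date
"""
client.create_table(view, exists_ok=True)
```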

Resources

Here are some valuable resources, including courses and video tutorials, that provide step-by-step guidance:

YouTube videos:

  • End to End Modern Data Engineering with dbt: In this video, CodeWithYu provides a comprehensive walkthrough of setting up and using dbt with BigQuery, covering the steps for building data pipelines and generating views. It’s a helpful guide for beginners learning to combine dbt and BigQuery in a data engineering workflow.

Courses:

  • Introduction to dbt: This course introduces the fundamentals of dbt, covering basic concepts like Git workflows, testing, and environment management. It’s an excellent starting point for using dbt effectively in data engineering projects.

Skills developed

  • Learn about dbt
  • Learn about BigQuery
  • Understand how to create SQL-based transformations
  • Use software engineering best practices in data engineering (version control, testing, and documentation)

Project 9: Airflow and Snowflake ETL using S3 storage and BI in Tableau

With this project, we’ll use Airflow to pull in data from an API and transfer it into Snowflake via an Amazon S3 bucket. The goal is to handle the ETL orchestration in Airflow and the analytical storage in Snowflake. 

This is an excellent project because it connects multiple data sources through several cloud storage systems, all orchestrated with Airflow. With its many moving parts, it closely resembles a real-world data architecture. It also touches on business intelligence (BI) by adding visualizations in Tableau.
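The pared-down Airflow DAG below sketches the API-to-S3-to-Snowflake flow; the API URL, bucket, Snowflake stage, table, and credentials are placeholders, it assumes a recent Airflow 2.x install, and a production pipeline would split transformations into further tasks.

```python
# A pared-down Airflow DAG sketch for the API -> S3 -> Snowflake flow.
# API URL, bucket, stage, table, and credentials are placeholders.
from datetime import datetime
import json

import boto3
import requests
from airflow import DAG
from airflow.operators.python import PythonOperator

def fetch_to_s3():
    # Pull raw data from a placeholder API and land it in S3
    data = requests.get("https://api.example.com/markets", timeout=30).json()
    boto3.client("s3").put_object(
        Bucket="my-etl-bucket",
        Key="raw/markets.json",
        Body=json.dumps(data),
    )

def load_into_snowflake():
    import snowflake.connector
    conn = snowflake.connector.connect(user="...", password="...", account="...")
    # COPY INTO assumes an external stage pointing at the S3 bucket above
    conn.cursor().execute(
        "COPY INTO analytics.raw_markets FROM @my_s3_stage/raw/ FILE_FORMAT = (TYPE = 'JSON')"
    )
    conn.close()

with DAG(
    dag_id="api_to_snowflake",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # assumes Airflow 2.4+
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="fetch_to_s3", python_callable=fetch_to_s3)
    load = PythonOperator(task_id="load_into_snowflake", python_callable=load_into_snowflake)
    extract >> load
```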

Resources

Here are some valuable resources, including courses and video tutorials, that provide step-by-step guidance:

YouTube videos:

  • Data Pipeline with Airflow, S3, and Snowflake: In this video, Seattle Data Guy demonstrates how to use Airflow to pull data from the PredictIt API, load it into Amazon S3, perform Snowflake transformations, and create Tableau visualizations. This end-to-end guide is ideal for understanding the integration of multiple tools in a data pipeline.

Courses:

  • Introduction to Apache Airflow in Python: This course provides an overview of Apache Airflow, covering essential concepts such as DAGs, operators, and task dependencies. It's a great foundation for understanding how to structure and manage workflows in Airflow.
  • Introduction to Snowflake: This course introduces Snowflake, a powerful data warehousing solution. It covers managing data storage, querying, and optimization. It’s perfect for gaining foundational knowledge before working with Snowflake in data pipelines.
  • Data Visualization in Tableau: This course covers essential Tableau skills for data visualization, allowing you to turn data into insightful visuals—a core step for interpreting data pipeline outputs.

Skills developed

  • Practice creating DAGs in Airflow
  • Practice connecting to an API in Python
  • Practice storing data in Amazon S3 buckets
  • Moving data from Amazon to Snowflake for analysis
  • Simple visualization of data in Tableau
  • Creating a comprehensive, end-to-end data platform

Project 10: Reddit ETL in AWS using Airflow

This project tackles a complex data pipeline with multiple steps using advanced data processing tools in the AWS ecosystem. 

Start by setting up Apache Airflow to pull in data from Reddit and transform it using SQL. Next, you'll connect to AWS by landing the data in an S3 bucket and using AWS Glue for additional formatting. Then, you can use Athena to test queries before storing the data in Redshift for long-term data warehousing and analytical querying.
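As an illustration of the extraction step only, the sketch below pulls top posts from a subreddit via Reddit's public JSON endpoint and lands the raw file in S3; the subreddit, bucket, and key are placeholders, and the Glue, Athena, and Redshift stages would follow as separate tasks in the DAG.

```python
# Extraction step only: pull top posts from a subreddit and land the raw
# JSON in S3. Subreddit, bucket, and key are placeholders.
import json
import boto3
import requests

resp = requests.get(
    "https://www.reddit.com/r/dataengineering/top.json",
    params={"limit": 100, "t": "day"},
    headers={"User-Agent": "reddit-etl-demo/0.1"},  # Reddit requires a User-Agent
    timeout=30,
)
resp.raise_for_status()
posts = [
    {
        "id": child["data"]["id"],
        "title": child["data"]["title"],
        "score": child["data"]["score"],
        "num_comments": child["data"]["num_comments"],
        "created_utc": child["data"]["created_utc"],
    }
    for child in resp.json()["data"]["children"]
]

boto3.client("s3").put_object(
    Bucket="my-reddit-raw",                      # placeholder bucket
    Key="reddit/top/dataengineering.json",
    Body=json.dumps(posts),
)
```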

Resources

Here are some resources, including courses and video tutorials, that provide step-by-step guidance:

YouTube videos:

  • Reddit Data Pipeline Engineering Project: CodeWithYu demonstrates a complete Reddit data pipeline in this video, including data extraction with Airflow, transformations with PostgreSQL, and integration with AWS services like S3, Glue, Athena, and Redshift. This walkthrough is a helpful guide to tackling the multiple layers of a complex data pipeline.

Courses:

  • Introduction to AWS: This course provides a solid foundation in AWS, covering essential concepts and tools. Understanding the basics of AWS services like S3, Glue, Athena, and Redshift will be crucial for successfully implementing this project.
  • Introduction to Redshift: This course offers a comprehensive introduction to Amazon Redshift, focusing on data warehousing concepts, Redshift architecture, and essential skills for managing and querying large datasets. It's an excellent resource for deepening your understanding of Redshift within AWS pipelines.

Skills developed

  • Pull website data into Airflow
  • Use PostgreSQL to transform data
  • Connect Airflow to AWS to transfer data into S3 buckets
  • Use AWS Glue for ETL
  • Use AWS Athena for simple querying
  • Transfer data from S3 to Amazon Redshift for data warehousing

Project 11: Building a real-time data pipeline with PySpark, Kafka, and Redshift

In this project, you’ll create a robust, real-time data pipeline using PySpark, Apache Kafka, and Amazon Redshift to handle high volumes of data ingestion, processing, and storage. 

The pipeline will capture data from various sources in real time, process and transform it using PySpark, and load the transformed data into Redshift for further analysis. Additionally, you’ll implement monitoring and alerting to ensure data accuracy and pipeline reliability.

This project is an excellent opportunity to build foundational skills in real-time data processing and handling big data technologies, such as Kafka for streaming and Redshift for cloud-based data warehousing.
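Here is a minimal Structured Streaming sketch of the ingestion side: it reads JSON events from a Kafka topic, parses and filters them, and appends each micro-batch to Redshift over JDBC. The topic, schema, and connection settings are placeholders, and running it requires the spark-sql-kafka package and a Redshift-compatible JDBC driver on the classpath.

```python
# Minimal Structured Streaming sketch: Kafka -> parse -> filter -> Redshift.
# Topic, schema, and JDBC settings are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("realtime-pipeline").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "transactions")                 # placeholder topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
    .filter(F.col("amount") > 0)
)

def write_batch(batch_df, batch_id):
    # Append each micro-batch to Redshift over JDBC (settings are placeholders)
    (batch_df.write.format("jdbc")
        .option("url", "jdbc:redshift://my-cluster:5439/dev")
        .option("dbtable", "public.transactions")
        .option("user", "awsuser")
        .option("password", "...")
        .mode("append")
        .save())

query = events.writeStream.foreachBatch(write_batch).start()
query.awaitTermination()
```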

Resources

Here are some resources, including courses and video tutorials, that provide step-by-step guidance:

YouTube videos:

  • Building a Real-Time Data Pipeline with PySpark, Kafka, and Redshift: This video by Darshil Parmar guides you through building a complete real-time data pipeline with PySpark, Kafka, and Redshift. It includes steps for data ingestion, transformation, and loading. The video also covers monitoring and alerting techniques to ensure pipeline performance.

Courses:

  • Introduction to Apache Kafka: This course covers the basics of Apache Kafka, a crucial component for real-time data streaming in this project. It provides an overview of Kafka’s architecture and how to implement it in data pipelines.
  • Streaming Concepts: This course introduces the fundamental concepts of data streaming, including real-time processing and event-driven architectures. It’s an ideal resource for gaining foundational knowledge before building real-time pipelines.

Summary Table of Data Engineering Projects

Here is a summary of the data engineering projects above for quick reference:

| Project | Level | Skills | Tools |
| --- | --- | --- | --- |
| ETL pipeline with open data | Beginner | Reading CSV data with Python and Pandas, cleaning data, loading data into BigQuery | Python, BigQuery |
| Weather data pipeline | Beginner | Python to write pipeline applications, API connections, data cleaning | Python, PostgreSQL |
| London transport analysis | Beginner | Working with large datasets, working with data warehouses | BigQuery |
| Performing a code review | Intermediate | Code review, evaluating code, finding bugs | Coding skills |
| Building a retail data pipeline | Intermediate | Data pipelines, ETL | Python, SQL |
| Factors influencing student performance | Intermediate | SQL queries for data analysis | SQL |
| Cleaning a dataset with PySpark | Advanced | Data cleaning, transformation, and formatting with PySpark | PySpark, Python |
| Data modeling with dbt and BigQuery | Advanced | Using dbt for SQL-based transformations, transferring data across platforms | dbt, BigQuery |
| Airflow and Snowflake ETL using S3 storage | Advanced | Creating complex ETL pipelines with Airflow DAGs, moving data from Airflow to Snowflake | Airflow, Snowflake, Tableau |
| Reddit ETL in AWS using Airflow | Advanced | Connecting to APIs, transforming data with PostgreSQL, moving data through S3, AWS Glue, Athena, and Redshift | Airflow, PostgreSQL, AWS S3, AWS Glue, AWS Athena, Amazon Redshift |
| Building a real-time data pipeline with PySpark, Kafka, and Redshift | Advanced | Real-time data ingestion, processing, monitoring, and loading data into a data warehouse | PySpark, Kafka, Amazon Redshift |

Conclusion

This article presented excellent projects to help you practice your data engineering skills. 

Focus on understanding the fundamental concepts behind how each tool works; this will enable you to use these projects in your job search and explain them successfully. Be sure to review any concepts you find challenging.

Along with building a project portfolio, obtaining a data engineering certification can be a valuable addition to your resume, as it demonstrates your commitment to completing relevant coursework!

FAQs

What skills do I need to start working on data engineering projects?

For beginner-level projects, basic programming knowledge in Python or SQL and an understanding of data basics (like cleaning and transforming) are helpful. Intermediate and advanced projects often require knowledge of specific tools, like Apache Airflow, Kafka, or cloud-based data warehouses like BigQuery or Redshift.

How can data engineering projects help in building my portfolio?

Completing data engineering projects allows you to showcase your ability to work with data at scale, build robust pipelines, and manage databases. Projects that cover end-to-end workflows (data ingestion to analysis) demonstrate practical skills to potential employers and are highly valuable for a portfolio.

Are cloud tools like AWS and Google BigQuery necessary for data engineering projects?

While not strictly necessary, cloud tools are highly relevant to modern data engineering. Many companies rely on cloud-based platforms for scalability and accessibility, so learning tools like AWS, Google BigQuery, and Snowflake can give you an edge and align your skills with industry needs.

How do I choose the right data engineering project for my skill level?

Start by assessing your knowledge and comfort with core tools. For beginners, projects like data cleaning or building a basic ETL pipeline in Python are great. Intermediate projects might involve databases and more complex queries, while advanced projects often integrate multiple tools (e.g., PySpark, Kafka, Redshift) for real-time or large-scale data processing.


Author: Tim Lu

I am a data scientist with experience in spatial analysis, machine learning, and data pipelines. I have worked with GCP, Hadoop, Hive, Snowflake, Airflow, and other data science/engineering processes.
