Data Engineering Tool Kits
Practice and apply your skills
Prerequisite
Introduction to Data Engineering
Essential Data Engineering
Advanced / Further Concepts
Master the basics of data analysis with Python in just four hours. This online course will introduce the Python interface and explore popular packages.
Learn the art of writing your own functions in Python, as well as key concepts like scoping and error handling.
Continue to build your modern Data Science skills by learning about iterators and list comprehensions.
Learn how to create and query relational databases using SQL in just two hours.
Learn about the world of data engineering in this short course, covering tools and topics like ETL and cloud computing.
Learn how to create one of the most efficient ways of storing data - relational databases!
Accompanied at every step with hands-on practice queries, this course teaches you everything you need to know to analyze data using your own SQL code today!
Level up your SQL knowledge and learn to join tables together, apply relational set theory, and work with subqueries.
Learn to design databases in SQL to process, store, and organize data in a more efficient way.
In this conceptual course (no coding required), you will learn about the four major types of NoSQL databases and their popular engines.
The Unix command line helps users combine existing programs in new ways, automate repetitive tasks, and run programs on clusters and clouds.
Learn to implement distributed data management and machine learning in Spark using the PySpark package.
Learn how to manipulate data and create machine learning feature sets in Spark using SQL in Python.
This assessment might cover the following topics: Assess data quality and perform validation tasks using SQL; Perform standard cleaning tasks to prepare data for analysis using SQL; Perform standard data joining tasks using SQL; Perform standard data extraction and aggregation tasks using SQL.
This assessment might cover the following topics: Performing common tidying operations such as moving between wide and long format data; Working with a range of file types to import into a suitable format; Working with multiple data sets to combine and create a relevant single data structure; Performing cleaning and manipulation operations on strings, dates and times; Accessing data through APIs and scraping techniques; Identifying and resolving common data issues such as incorrect categories and missing data issues.
Level up your data science skills by creating visualizations using Matplotlib and manipulating DataFrames with pandas.
Learn to write efficient code that executes quickly and allocates resources skillfully to avoid unnecessary overhead.
Learn how to import and clean data, calculate statistics, and create visualizations with pandas.
Reshape DataFrames from a wide to long format, stack and unstack rows and columns, and wrangle multi-index DataFrames.
Learn efficient techniques in pandas to optimize your Python code.
In this course, students will learn to write queries that are both efficient and easy to read and understand.
Learn the most important PostgreSQL functions for manipulating, processing, and transforming data.
Learn how to create a PostgreSQL database and explore the structure, data types, and how to normalize databases.
Learn to manipulate and analyze flexibly structured data with MongoDB.
Explore the basics of data quality management. Learn the key concepts, dimensions, and techniques for monitoring and improving data quality.
Learn powerful command-line skills to download, process, and transform data, including machine learning pipelines.
Learn the fundamentals of working with big data with PySpark.
Learn how to clean data with Apache Spark in Python.
Learn how to implement and schedule data engineering workflows.
Learn how to build and test data engineering pipelines in Python using PySpark and Apache Airflow.
A non-coding introduction to cloud computing, covering key concepts, terminology, and tools.
Learn about the difference between batching and streaming, scaling streaming systems, and real-world applications.
Learn how to work with streaming data using serverless technologies on AWS.
In this Introduction to DevOps, you’ll master the DevOps basics and learn the key concepts, tools, and techniques to improve productivity.
Familiarize yourself with Git for version control. Explore how to track, compare, modify, and revert files, as well as collaborate with colleagues using Git.
Gain an introduction to Docker and discover its importance in the data professional’s toolkit. Learn about Docker containers, images, and more.
Discover how MLOps can take machine learning models from local notebooks to functioning models in production that generate real business value.
Learn about MLOps, including the tools and practices needed for automating and scaling machine learning applications.
Learn to build pipelines that stand the test of time.
Learn how to make predictions from data with Apache Spark, using decision trees, logistic regression, linear regression, ensembles, and pipelines.
An introduction to data visualization with no coding involved.
Learn how to create, customize, and share data visualizations using Matplotlib.
Learn how to create informative and attractive visualizations in Python using the Seaborn library.
Begin your journey with Scala, a popular language for scalable applications and data engineering infrastructure.
Our full library contains 50+ learning tracks, 400+ interactive courses, 100+ training projects, and other material.
What is DataCamp?
Learn the data skills you need online at your own pace—from non-coding essentials and Google Spreadsheets to Python, R, and SQL.
Hands-on learning experience
Grow your data skills with short video tutorials and hands-on coding exercises.
Choose your own learning path
Choose from over 350 courses or enroll in IoA's custom learning paths for Business People, Practitioners, Thought Leaders, and C-Suite, to overcome your biggest business and technology challenges.
Flexible online training for every role
Grow your skills with data-oriented assessments, courses, projects, and practice exercises in Python, R, SQL, Excel, Tableau, Oracle, Power BI, data engineering, and more.
Ready to learn? Visit My IoA to unlock your DataCamp access