メインコンテンツへスキップ

ホーム Databricks

無料コース

Data Transformation with Spark SQL in Databricks

中級スキルレベル

更新済み 2026/05

Build end-to-end data pipelines - from cleaning and aggregation to streaming and orchestration.

無料コースを始める

無料で含まれています

DatabricksData Engineering

3 時間

7 ビデオ

25 演習

1,750 XP

修了証明書

数千社の学習者に愛されています

チームをトレーニングしますか？

法人向けに試す

コースの説明

Ready to handle real-world data at scale? This course teaches you to transform large datasets using Spark SQL and PySpark in Databricks. Learn to shape and clean data, run aggregations with optimized joins, and apply window functions for advanced analytics. You'll also set up file-based streaming with fault-tolerant checkpoints and persist results as Delta tables. By the end, you'll be orchestrating multi-step production pipelines with Databricks Workflows and Lakeflow Declarative Pipelines.

前提条件

Introduction to Databricks SQL Introduction to PySpark

1

Loading and Shaping Data

In this chapter, you'll learn how to work with Databricks notebooks, load CSV data into Spark DataFrames, and shape data using PySpark and SQL.

Working with Databricks notebooks

Understanding Databricks notebooks

Loading your first dataset

Exploring driver logs

Shaping data with PySpark and SQL

Using PySpark to shape data

Analyzing data with SQL

Understanding temporary views

チャプターを開始

2

Data Cleaning and Optimization

Learn how to define explicit schemas, build a data cleaning pipeline, and optimize query performance with broadcast joins.

Data cleaning and quality checks

Why explicit schemas matter

Cleaning the online retail dataset

Choosing the right quality metric

Aggregating and joining data efficiently

Joining and aggregating retail data

Understanding the shuffle bottleneck

When to use a broadcast join

チャプターを開始

3

Analytics and Production Pipelines

Learn how to calculate running totals and rankings with window functions, build streaming pipelines, and deploy production workflows.

Window functions and streaming queries

Ranking customers with window functions

Streaming retail data into Delta Lake

Resuming after a restart

Production pipelines with workflows

Writing and reading a Delta table

Building a multi-task job pipeline

Why switch to Lakeflow?

Wrapping up

チャプターを開始

Data Transformation with Spark SQL in Databricks

コース完了

修了証明書を取得する

この資格をLinkedInプロフィール、履歴書、またはCVに追加する
SNSで共有し、評価面談でも活用しましょう今すぐ登録

19百万人の学習者に加わって、今日からData Transformation with Spark SQL in Databricksを始めましょう！

DataCamp for Mobileでデータスキルを磨きましょう

モバイルコースと毎日の 5 分間のコーディングチャレンジで、外出先でも進歩できます。