Data Transformation with Spark SQL in Databricks

Skill level: Intermediate

Updated 05/2026

Build end-to-end data pipelines - from cleaning and aggregation to streaming and orchestration.

Databricks, Data Engineering

3 hours

7 videos

25 exercises

1,750 XP

Statement of Accomplishment

Course Description

Ready to handle real-world data at scale? This course teaches you to transform large datasets using Spark SQL and PySpark in Databricks. Learn to shape and clean data, run aggregations with optimized joins, and apply window functions for advanced analytics. You'll also set up file-based streaming with fault-tolerant checkpoints and persist results as Delta tables. By the end, you'll be orchestrating multi-step production pipelines with Databricks Workflows and Lakeflow Declarative Pipelines.

Prerequisites

Introduction to Databricks SQL

Introduction to PySpark

1

Loading and Shaping Data

In this chapter, you'll learn how to work with Databricks notebooks, load CSV data into Spark DataFrames, and shape data using PySpark and SQL.

Working with Databricks notebooks (50 XP)

Understanding Databricks notebooks (50 XP)

Loading your first dataset (100 XP)

Exploring driver logs (100 XP)

Shaping data with PySpark and SQL (50 XP)

Using PySpark to shape data (100 XP)

Analyzing data with SQL (100 XP)

Understanding temporary views (50 XP)

2

Data Cleaning and Optimization

Learn how to define explicit schemas, build a data cleaning pipeline, and optimize query performance with broadcast joins.
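To make the idea behind a broadcast join concrete, here is a minimal pure-Python sketch. It is not Spark code and not taken from the course. It assumes a large table that is split across partitions and a small lookup table that fits in memory: the small table is turned into a dictionary and handed to every partition, so each partition can join locally and no rows of the large table have to be shuffled. The function name `broadcast_join` and all of the sample rows are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical row shape for the "large" table: (country_code, amount).
Sale = Tuple[str, float]


def broadcast_join(
    partitions: List[List[Sale]],
    lookup_rows: Iterable[Tuple[str, str]],
) -> List[List[Tuple[str, str, float]]]:
    """Inner-join each partition against a small table without moving sale rows."""
    # Build the hash table once; on a cluster, this copy is what gets "broadcast".
    lookup: Dict[str, str] = dict(lookup_rows)
    joined = []
    for part in partitions:
        # Each partition joins locally, so its rows never leave the partition.
        joined.append(
            [(code, lookup[code], amount) for code, amount in part if code in lookup]
        )
    return joined


# Invented sample data: two partitions of the large table and one small lookup table.
partitions = [
    [("GB", 10.0), ("FR", 5.5)],
    [("DE", 7.25), ("XX", 1.0)],  # "XX" has no match and is dropped (inner join)
]
lookup_rows = [("GB", "United Kingdom"), ("FR", "France"), ("DE", "Germany")]

result = broadcast_join(partitions, lookup_rows)
```

In PySpark, the corresponding hint is to wrap the small DataFrame in `pyspark.sql.functions.broadcast` before joining; whether that pays off depends on the small table actually fitting in executor memory.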

Data cleaning and quality checks (50 XP)

Why explicit schemas matter (50 XP)

Cleaning the online retail dataset (100 XP)

Choosing the right quality metric (50 XP)

Aggregating and joining data efficiently (50 XP)

Joining and aggregating retail data (100 XP)

Understanding the shuffle bottleneck (50 XP)

When to use a broadcast join (50 XP)

3

Analytics and Production Pipelines

Learn how to calculate running totals and rankings with window functions, build streaming pipelines, and deploy production workflows.
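As a rough illustration of running totals and rankings with window functions, the sketch below uses Python's built-in `sqlite3` module as a stand-in for a Spark SQL session, so it runs without a cluster. It assumes SQLite 3.25 or newer, which is when window functions were added. The `orders` table, its columns, and its rows are invented for the example; the `SUM(...) OVER` and `RANK() OVER` clauses are standard SQL that Spark SQL also accepts.

```python
import sqlite3

# In-memory database with a small, invented orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, order_day INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("alice", 1, 20.0),
        ("alice", 2, 30.0),
        ("bob", 1, 15.0),
        ("bob", 3, 5.0),
        ("cara", 2, 50.0),
    ],
)

# Running total of spend per customer, accumulated in day order.
running = conn.execute(
    """
    SELECT customer, order_day,
           SUM(amount) OVER (
               PARTITION BY customer
               ORDER BY order_day
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM orders
    ORDER BY customer, order_day
    """
).fetchall()

# Rank customers by total spend; RANK() gives tied customers the same rank
# and then skips ahead, so two customers at rank 1 are followed by rank 3.
ranking = conn.execute(
    """
    SELECT customer, total, RANK() OVER (ORDER BY total DESC) AS spend_rank
    FROM (
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
    ) AS totals
    ORDER BY spend_rank, customer
    """
).fetchall()

conn.close()
```

With this sample data, two customers tie on total spend, which is the case where `RANK()` and `DENSE_RANK()` start to differ.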

Window functions and streaming queries (50 XP)

Ranking customers with window functions (100 XP)

Streaming retail data into Delta Lake (100 XP)

Resuming after a restart (50 XP)

Production pipelines with workflows (50 XP)

Writing and reading a Delta table (100 XP)

Building a multi-task job pipeline (100 XP)

Why switch to Lakeflow? (50 XP)

Wrapping up (50 XP)

Earn a Statement of Accomplishment

Add this credential to your LinkedIn profile, resume, or CV, and share it on social media and in your performance review.

In the following tracks

Associate Data Engineer in Databricks

Instructors

Disha Mukherjee

Lead Data Engineer & Data Evangelist

Course resources

Datasets: online_retail, transactions, country_lookup
