
Course

Foundations of PySpark

Intermediate skill level
Updated March 2025
Learn to implement distributed data management and machine learning in Spark using the PySpark package.
Spark, Data Engineering
4 hours
45 exercises
3,850 XP
150K+ learners
Statement of Accomplishment

Course Description

In this course, you'll learn how to use Spark from Python! Spark is a tool for parallel computation on large datasets, and it integrates well with Python. PySpark is the Python package that makes the magic happen. You'll use this package to work with data about flights from Portland and Seattle. You'll learn to wrangle this data and build a complete machine learning pipeline to predict whether flights will be delayed. Get ready to put some Spark in your Python code and dive into the world of high-performance machine learning!

Prerequisites

Introduction to Python

1. Getting to know PySpark

In this chapter, you'll learn how Spark manages data and how you can read and write tables from Python.

2. Manipulating data

In this chapter, you'll learn about the pyspark.sql module, which provides optimized data queries to your Spark session.

3. Getting started with machine learning pipelines
