Feature Engineering with PySpark

Skill level: Advanced

Updated 01/2026

Learn the gritty details that data scientists spend 70-80% of their time on: data wrangling and feature engineering.

Spark, Data Manipulation

4 hours

16 videos

60 exercises

5,000 XP

17,764

Statement of Accomplishment

Course Description

The real world is messy, and your job is to make sense of it. Toy datasets like MTCars and Iris are the result of careful curation and cleaning, yet even they need to be transformed before machine learning algorithms can use them to extract meaning, forecast, classify, or cluster. This course covers the gritty details that data scientists spend 70-80% of their time on: data wrangling and feature engineering. With datasets growing ever larger, let's use PySpark to cut this Big Data problem down to size!

Prerequisites

Supervised Learning with scikit-learn

Introduction to PySpark

1

Exploratory Data Analysis

Get to know a bit about your problem before you dive in! Then learn how to statistically and visually inspect your dataset!

Where to Begin

Where to begin?

Check Version

Load in the data

Defining A Problem

What are we predicting?

Verifying Data Load

Verifying DataTypes

Visually Inspecting Data / EDA

Using Corr()

Using Visualizations: distplot

Using Visualizations: lmplot

2

Wrangling with Spark Functions

Real data is rarely clean and ready for analysis. In this chapter, learn to remove unneeded information, handle missing values, and add additional data to your analysis.

Dropping data

Dropping a list of columns

Using text filters to remove records

Filtering numeric fields conditionally

Adjusting Data

Custom Percentage Scaling

Scaling your scalers

Correcting Right Skew Data

Working with Missing Data

Visualizing Missing Data

Imputing Missing Data

Calculate Missing Percents

Getting More Data

A Dangerous Join

Spark SQL Join

Checking for Bad Joins

3

Feature Engineering

In this chapter, learn how to create new features for your machine learning model to learn from. We'll look at generating them by combining fields, extracting values from messy columns, or encoding them for better results.

Feature Generation

Differences

Deeper Features

Time Features

Time Components

Joining On Time Components

Extracting Features

Extracting Text to New Features

Splitting & Exploding

Pivot & Join

Binarizing, Bucketing & Encoding

Binarizing Day of Week

One Hot Encoding

4

Building a Model

In this chapter, we'll learn how to choose which type of model we want. Then we'll learn how to fit the model to our data and evaluate it. Lastly, we'll learn how to interpret the results and save the model for later!

Choosing the Algorithm

Which MLlib Module?

Creating Time Splits

Adjusting Time Features

Feature Engineering Assumptions for RFR

Feature Engineering For Random Forests

Dropping Columns with Low Observations

Naively Handling Missing and Categorical Values

Building a Model

Building a Regression Model

Evaluating & Comparing Algorithms

Understanding Metrics

Interpreting, Saving & Loading

Interpreting Results

Saving & Loading Models

Final Thoughts
