Dimensionality Reduction in Python

Skill level: Intermediate

Updated January 2023

Understand the concept of dimensionality reduction in your data and master the techniques for doing so in Python.

Python | Machine Learning | 4 hours | 16 videos | 58 exercises | 4,700 XP | 36,095 | Statement of Accomplishment

Course Description

High-dimensional datasets can be overwhelming and leave you unsure where to start. Typically, you would explore a new dataset visually first, but when there are too many dimensions the classical approaches fall short. Fortunately, there are visualization techniques designed specifically for high-dimensional data, and this course introduces them. After exploring the data, you will often find that many features carry almost no information, either because they show no variance or because they duplicate other features. You will learn how to detect these features and drop them from the dataset so you can focus on the informative ones. As a next step, you may want to build a model on these features, and it may turn out that some of them have no effect on the variable you are trying to predict. You will also learn how to detect and drop these irrelevant features to reduce dimensionality and therefore complexity. Finally, you will learn how feature extraction techniques can reduce dimensionality for you by calculating uncorrelated principal components.

Prerequisites

Supervised Learning with scikit-learn

1

Exploring High Dimensional Data

You'll be introduced to the concept of dimensionality reduction and will learn when and why it is important. You'll learn the difference between feature selection and feature extraction and will apply both techniques for data exploration. The chapter ends with a lesson on t-SNE, a powerful feature extraction technique that will allow you to visualize a high-dimensional dataset.

Introduction

Finding the number of dimensions in a dataset

Removing features without variance

Feature selection vs. feature extraction

Visually detecting redundant features

Advantage of feature selection

t-SNE visualization of high-dimensional data

t-SNE intuition

Fitting t-SNE to the ANSUR data

t-SNE visualisation of dimensionality
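
As a rough illustration of the t-SNE lessons listed above, the sketch below fits scikit-learn's `TSNE` to a small synthetic dataset. The data is randomly generated and only stands in for a real high-dimensional table (it is not the ANSUR data), and the `perplexity` value is an assumption chosen for this toy example rather than a setting taken from the course.

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic stand-in for a high-dimensional dataset (assumption, not ANSUR):
# two groups of 50 samples each, in 20 dimensions, shifted apart.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=(50, 20))
group_b = rng.normal(loc=5.0, scale=1.0, size=(50, 20))
X = np.vstack([group_a, group_b])

# Reduce the 20 features to 2 t-SNE dimensions for plotting.
# perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
embedding = tsne.fit_transform(X)

print(embedding.shape)  # one 2-D point per original sample
```

The two columns of `embedding` can then be passed to any scatter-plot function and colored by a categorical feature to look for structure.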

2

Feature Selection I - Selecting for Feature Information

In this first of two chapters on feature selection, you'll learn about the curse of dimensionality and how dimensionality reduction can help you overcome it. You'll be introduced to a number of techniques to detect and remove features that add little value to the dataset, either because they have little variance or too many missing values, or because they are strongly correlated with other features.

The curse of dimensionality

Train - test split

Fitting and testing the model

Accuracy after dimensionality reduction

Features with missing values or little variance

Finding a good variance threshold

Features with low variance

Removing features with many missing values

Pairwise correlation

Correlation intuition

Inspecting the correlation matrix

Visualizing the correlation matrix

Removing highly correlated features

Filtering out highly correlated features

Nuclear energy and pool drownings
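
The variance and correlation filters named in this chapter could look roughly like the following. This is a minimal sketch on an invented DataFrame: the column names, the zero-variance threshold, and the 0.95 correlation cutoff are all assumptions made for illustration, not values prescribed by the course.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Invented toy data (assumption): one constant column and one duplicate
# measurement expressed in different units.
rng = np.random.default_rng(0)
n = 200
height = rng.normal(170, 10, n)
df = pd.DataFrame({
    "height_cm": height,
    "height_in": height / 2.54,   # same information as height_cm
    "weight_kg": rng.normal(70, 12, n),
    "constant": np.ones(n),       # no variance at all
})

# Step 1: drop features whose variance does not exceed the threshold.
selector = VarianceThreshold(threshold=0.0)
selector.fit(df)
df_var = df.loc[:, selector.get_support()]

# Step 2: for each highly correlated pair, drop the second feature.
# Only the upper triangle is inspected so each pair is counted once.
corr = df_var.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
df_reduced = df_var.drop(columns=to_drop)

print(list(df_reduced.columns))
```

Features on very different scales usually need normalizing before a variance threshold is meaningful; that step is left out here to keep the sketch short.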

3

Feature Selection II - Selecting for Model Accuracy

In this second chapter on feature selection, you'll learn how to let models help you find the features in a dataset that matter most for predicting a particular target feature. In the final lesson of this chapter, you'll combine the advice of several different models to decide which features are worth keeping.

Selecting features for model performance

Building a diabetes classifier

Manual Recursive Feature Elimination

Automatic Recursive Feature Elimination

Tree-based feature selection

Building a random forest model

Random forest for feature selection

Recursive Feature Elimination with random forests

Regularized linear regression

Creating a LASSO regressor

Lasso model results

Adjusting the regularization strength

Combining feature selectors

Creating a LassoCV regressor

Ensemble models for extra votes

Combining 3 feature selectors
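
One way to picture the "combining feature selectors" idea from the lessons above is a simple vote between models. The sketch below is an assumption-laden illustration, not the course's own exercise code: the data comes from `make_regression`, each tree-based selector is asked for 5 features, and a feature is kept when at least 2 of the 3 selectors choose it.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.linear_model import LassoCV

# Synthetic regression data (assumption): 10 features, 3 of them informative.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

# Selector 1: LassoCV shrinks unhelpful coefficients to exactly zero.
lcv = LassoCV(cv=5, random_state=0).fit(X, y)
lcv_mask = lcv.coef_ != 0

# Selector 2: recursive feature elimination driven by a random forest.
rfe_rf = RFE(RandomForestRegressor(n_estimators=50, random_state=0),
             n_features_to_select=5, step=1).fit(X, y)
rf_mask = rfe_rf.support_

# Selector 3: recursive feature elimination driven by gradient boosting.
rfe_gb = RFE(GradientBoostingRegressor(n_estimators=50, random_state=0),
             n_features_to_select=5, step=1).fit(X, y)
gb_mask = rfe_gb.support_

# Count the votes per feature and keep features chosen by at least 2 models.
votes = np.sum([lcv_mask, rf_mask, gb_mask], axis=0)
keep_mask = votes >= 2
X_reduced = X[:, keep_mask]

print(votes)
print(X_reduced.shape)
```

Raising the vote threshold to 3 gives a stricter selection; which threshold is appropriate depends on how much accuracy you are willing to trade for a smaller feature set.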

4

Feature Extraction

This chapter is a deep dive into the most frequently used dimensionality reduction algorithm, Principal Component Analysis (PCA). You'll build intuition for how and why this algorithm is so powerful and will apply it both for data exploration and for data pre-processing in a modeling pipeline. You'll end with an image compression use case.

Feature extraction

Manual feature extraction I

Manual feature extraction II

Principal component intuition

Principal component analysis

Calculating Principal Components

PCA on a larger dataset

PCA explained variance

PCA applications

Understanding the components

PCA for feature exploration

PCA in a model pipeline

Principal Component selection

Selecting the proportion of variance to keep

Choosing the number of components

PCA for image compression

Congratulations!
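
The PCA lessons above (explained variance, PCA in a pipeline, selecting the proportion of variance to keep) can be sketched as follows. The data is synthetic and built from two hidden factors, and the 90% variance target is an arbitrary assumption for this example. Passing a float between 0 and 1 as `n_components` asks scikit-learn's `PCA` to keep just enough components to reach that share of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): 6 observed features that are noisy
# linear mixtures of only 2 underlying factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.05 * rng.normal(size=(300, 6))

# Scale first so no feature dominates because of its units,
# then keep enough components to explain at least 90% of the variance.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("reducer", PCA(n_components=0.9)),
])
components = pipe.fit_transform(X)

pca = pipe.named_steps["reducer"]
print(pca.n_components_)                       # number of components kept
print(pca.explained_variance_ratio_.cumsum())  # cumulative variance explained
```

In a modeling pipeline, an estimator would be appended as a final step after `"reducer"` so that scaling, PCA, and the model are all fitted together.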
