Course

Dimensionality Reduction in Python

Intermediate skill level

Updated January 2023

Understand the concept of dimensionality reduction and master the skills to perform it with Python.

Python, Machine Learning

4 hours

16 videos

58 exercises

4,700 XP

36,415

Statement of Accomplishment

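Course Description

High-dimensional datasets can be overwhelming, and it is often hard to know where to start. You would normally begin by exploring a new dataset visually, but when there are too many dimensions the usual methods can seem insufficient. Fortunately, there are visualization techniques designed specifically for high-dimensional data, and this course introduces them. While exploring the data, you will often find features that carry little information, either because they have almost no variance or because they duplicate other features. You'll learn how to detect these features and remove them from the dataset so you can focus on the informative ones. As a next step, when you build a model on these features, some of them may turn out to have no effect on the target you are trying to predict. You'll learn how to find and remove these irrelevant features as well, reducing dimensionality and therefore complexity. Finally, you'll learn feature extraction techniques that reduce dimensionality by calculating uncorrelated principal components.

Prerequisites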

Supervised Learning with scikit-learn

1

Exploring High Dimensional Data

You'll be introduced to the concept of dimensionality reduction and will learn when and why it is important. You'll learn the difference between feature selection and feature extraction and will apply both techniques for data exploration. The chapter ends with a lesson on t-SNE, a powerful feature extraction technique that will allow you to visualize a high-dimensional dataset.

Introduction

Finding the number of dimensions in a dataset

Removing features without variance

Feature selection vs. feature extraction

Visually detecting redundant features

Advantage of feature selection

t-SNE visualization of high-dimensional data

t-SNE intuition

Fitting t-SNE to the ANSUR data

t-SNE visualisation of dimensionality
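
As a rough illustration of the t-SNE lesson in this chapter, the sketch below projects a high-dimensional dataset down to two dimensions with scikit-learn's `TSNE`. The data is a synthetic stand-in (two well-separated groups in 20 dimensions), not the ANSUR data used in the course, and the `perplexity` value is an arbitrary choice for this example.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic stand-in for a high-dimensional dataset (assumption, not the
# course data): two groups of 50 samples, each with 20 features.
group_a = rng.normal(loc=0.0, scale=1.0, size=(50, 20))
group_b = rng.normal(loc=5.0, scale=1.0, size=(50, 20))
X = np.vstack([group_a, group_b])

# Reduce the 20 features to 2 t-SNE coordinates for plotting.
# perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=15, random_state=0)
embedding = tsne.fit_transform(X)

print(embedding.shape)  # (100, 2)
```

The two columns of `embedding` can be passed to any scatter plot function and colored by a categorical feature to look for group structure.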

2

Feature Selection I - Selecting for Feature Information

In this first of two chapters on feature selection, you'll learn about the curse of dimensionality and how dimensionality reduction can help you overcome it. You'll be introduced to a number of techniques to detect and remove features that add little value to the dataset, either because they have little variance, have too many missing values, or are strongly correlated with other features.

The curse of dimensionality

Train - test split

Fitting and testing the model

Accuracy after dimensionality reduction

Features with missing values or little variance

Finding a good variance threshold

Features with low variance

Removing features with many missing values

Pairwise correlation

Correlation intuition

Inspecting the correlation matrix

Visualizing the correlation matrix

Removing highly correlated features

Filtering out highly correlated features

Nuclear energy and pool drownings
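
The three filters this chapter describes (too many missing values, little variance, strong pairwise correlation) can be sketched with pandas as follows. The column names, the toy data, and all three thresholds are assumptions made up for this example; suitable thresholds depend on the dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
height = rng.normal(170, 10, n)

# Toy data (hypothetical columns) with one duplicate-like feature,
# one constant feature and one mostly-missing feature.
df = pd.DataFrame({
    "height_cm": height,
    "height_in": height / 2.54,   # same information, different unit
    "weight_kg": rng.normal(70, 12, n),
    "constant": np.ones(n),       # zero variance
    "mostly_missing": np.where(rng.random(n) < 0.8, np.nan, 1.0),
})

# 1. Drop features where more than half of the values are missing.
missing_share = df.isna().mean()
df = df.loc[:, missing_share <= 0.5]

# 2. Drop features with (near) zero variance.
variances = df.var()
df = df.loc[:, variances > 1e-8]

# 3. Drop one feature from each strongly correlated pair, looking only
#    at the upper triangle so each pair is considered once.
corr = df.corr().abs()
upper_mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
upper = corr.where(upper_mask)
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
reduced = df.drop(columns=to_drop)

print(list(reduced.columns))  # ['height_cm', 'weight_kg']
```

Which member of a correlated pair gets dropped here is simply a matter of column order; in practice you may prefer to keep the feature that is easier to interpret.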

3

Feature Selection II - Selecting for Model Accuracy

In this second chapter on feature selection, you'll learn how to let models help you find the most important features in a dataset for predicting a particular target feature. In the final lesson of this chapter, you'll combine the advice of multiple different models to decide which features are worth keeping.

Selecting features for model performance

Building a diabetes classifier

Manual Recursive Feature Elimination

Automatic Recursive Feature Elimination

Tree-based feature selection

Building a random forest model

Random forest for feature selection

Recursive Feature Elimination with random forests

Regularized linear regression

Creating a LASSO regressor

Lasso model results

Adjusting the regularization strength

Combining feature selectors

Creating a LassoCV regressor

Ensemble models for extra votes

Combining 3 feature selectors
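
One possible way to combine several feature selectors, as the final lesson of this chapter describes, is to let each model cast a vote per feature and keep the features that receive enough votes. The sketch below assumes a synthetic regression dataset from `make_regression`, a target of three features per recursive feature elimination run, and a two-vote cutoff; none of these values come from the course.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.linear_model import LassoCV

# Synthetic data (assumption): 10 features, of which 3 drive the target.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, shuffle=False, random_state=0)

# Selector 1: LassoCV tunes the regularization strength by cross-validation;
# features whose coefficient is shrunk to exactly zero get no vote.
lasso = LassoCV(cv=5, random_state=0).fit(X, y)
lasso_mask = lasso.coef_ != 0

# Selector 2: recursive feature elimination driven by a random forest.
rfe_rf = RFE(RandomForestRegressor(n_estimators=50, random_state=0),
             n_features_to_select=3).fit(X, y)
rf_mask = rfe_rf.support_

# Selector 3: recursive feature elimination driven by gradient boosting.
rfe_gb = RFE(GradientBoostingRegressor(random_state=0),
             n_features_to_select=3).fit(X, y)
gb_mask = rfe_gb.support_

# Count the votes per feature and keep features chosen by at least two models.
votes = np.sum([lasso_mask, rf_mask, gb_mask], axis=0)
keep = votes >= 2
X_reduced = X[:, keep]

print(votes)
print(X_reduced.shape)
```

The features here are already on a comparable scale; with real data, standardizing before fitting the Lasso model is usually needed for its coefficients to be comparable.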

4

Feature Extraction

This chapter is a deep dive into the most frequently used dimensionality reduction algorithm, Principal Component Analysis (PCA). You'll build intuition for how this algorithm works and why it is so powerful, and will apply it both for data exploration and for data pre-processing in a modeling pipeline. You'll end with an image compression use case.

Feature extraction

Manual feature extraction I

Manual feature extraction II

Principal component intuition

Principal component analysis

Calculating Principal Components

PCA on a larger dataset

PCA explained variance

PCA applications

Understanding the components

PCA for feature exploration

PCA in a model pipeline

Principal Component selection

Selecting the proportion of variance to keep

Choosing the number of components

PCA for image compression

Congratulations!
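
To make the PCA lessons in this chapter more concrete, here is a minimal sketch of PCA inside a scikit-learn pipeline, keeping just enough components to explain a chosen proportion of the variance. The synthetic data (10 features generated from 3 hidden factors) and the 90% cutoff are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic data (assumption): 10 correlated features that are noisy
# linear mixes of 3 hidden factors.
latent = rng.normal(size=(300, 3))
loadings = rng.normal(size=(3, 10))
X = latent @ loadings + 0.1 * rng.normal(size=(300, 10))

# Scale first, because PCA is sensitive to feature variance.
# A float n_components between 0 and 1 tells PCA to keep the smallest
# number of components whose cumulative explained variance exceeds it.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("reducer", PCA(n_components=0.9)),
])
components = pipe.fit_transform(X)

pca = pipe.named_steps["reducer"]
print(pca.n_components_)
print(pca.explained_variance_ratio_.cumsum())
```

Passing an integer to `n_components` instead fixes the number of components directly, which is the other selection approach named in the lesson list.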
