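Course

Dimensionality Reduction in R

Skill level: Basic

Updated Dec 2024

Learn dimensionality reduction techniques in R and master feature selection and feature extraction for your own data and models.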

R | Machine Learning | 4 hours | 16 videos | 56 exercises | 4,600 XP | 2,696 | Statement of Accomplishment

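Course Description

Have you ever worked with a dataset that has too many features? In this course, you will learn dimensionality reduction techniques that make your data and models simpler while preserving predictive performance. Dimensionality reduction is like Occam's razor for data science. Using R, you will learn how to identify and remove unimportant features, how to extract multiple features into compressed components that carry as much information as possible, and how to use real data to build models with fewer features without a large loss in performance.

Prerequisites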

Modeling with tidymodels in R

1

Foundations of Dimensionality Reduction

Prepare to simplify large data sets! You will learn about information, how to assess feature importance, and practice identifying low-information features. By the end of the chapter, you will understand the difference between feature selection and feature extraction—the two approaches to dimensionality reduction.

Introduction to dimensionality reduction

Dimensionality and feature information

Mutual information features

Information and feature importance

Calculating root entropy

Calculating child entropies

Calculating information gain of color

The Importance of Dimensionality Reduction in Data and Model Building

Calculate possible combinations

Curse of dimensionality, overfitting, and bias
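
The entropy and information gain lessons in this chapter can be illustrated with a minimal base R sketch. The toy `color` and `label` vectors and the helper names `entropy()` and `information_gain()` are assumptions made for illustration; they are not taken from the course exercises.

```r
# Shannon entropy (in bits) of a vector of class labels.
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]                      # drop empty classes to avoid log2(0)
  -sum(p * log2(p))
}

# Information gain from splitting `labels` by a categorical `feature`:
# root entropy minus the size-weighted average of the child entropies.
information_gain <- function(labels, feature) {
  groups  <- split(labels, feature)
  child   <- sapply(groups, entropy)
  weights <- sapply(groups, length) / length(labels)
  entropy(labels) - sum(weights * child)
}

# Hypothetical toy data: 4 apples and 4 pears described by color.
color <- c("red", "red", "red", "green", "green", "yellow", "yellow", "yellow")
label <- c("apple", "apple", "apple", "apple", "pear", "pear", "pear", "pear")

root_entropy <- entropy(label)                 # 1 bit: classes are balanced
gain_color   <- information_gain(label, color) # 0.75 bits
```

Here the red and yellow groups are pure (entropy 0) and only the green group is mixed, so splitting on color removes three quarters of the root uncertainty.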

2

Feature Selection for Feature Importance

Learn how to identify information-rich and information-poor features using missing value ratios, variance, and correlation. Then you'll discover how to build tidymodels recipes to select features using these information indicators.

Feature selection vs. feature extraction

Create a zero-variance filter

Create a missing values filter

Feature selection with the combined filter

Selecting based on missing values

Create a missing value ratio filter

Apply a missing value ratio filter

Create a missing values recipe

Selecting based on variance

Create a low-variance filter

Create a low-variance recipe

Selecting based on correlation with other features

Identify highly correlated features

Select correlated feature to remove

Create a high-correlation recipe
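
The three filters covered in this chapter (missing value ratio, variance, and correlation) can be sketched in base R as follows. This is only an illustration under stated assumptions: the toy data frame and the thresholds of 0.5 and 0.9 are hypothetical, and the course itself builds these filters as tidymodels recipes rather than by hand.

```r
# Hypothetical toy data with five candidate features.
df <- data.frame(
  a = c(1, 2, 3, 4, 5, 6),
  b = c(2, 4, 6, 8, 10, 12.5),   # almost perfectly correlated with a
  c = c(5, 5, 5, 5, 5, 5),       # zero variance
  d = c(NA, NA, NA, NA, 1, 2),   # mostly missing
  e = c(3, 1, 4, 1, 5, 9)
)

# 1. Missing value ratio filter: keep columns with at most 50% missing
#    (the 0.5 threshold is an assumption).
missing_ratio <- colMeans(is.na(df))
df1 <- df[, missing_ratio <= 0.5, drop = FALSE]

# 2. Zero-variance filter: drop columns that never change.
variances <- sapply(df1, var, na.rm = TRUE)
df2 <- df1[, variances > 0, drop = FALSE]

# 3. High-correlation filter: for each pair with |r| > 0.9 (assumed
#    threshold), drop the feature that appears later in the data frame.
cor_mat <- abs(cor(df2, use = "pairwise.complete.obs"))
cor_mat[upper.tri(cor_mat, diag = TRUE)] <- 0   # look at each pair once
to_drop <- rownames(cor_mat)[apply(cor_mat, 1, function(r) any(r > 0.9))]
df3 <- df2[, setdiff(names(df2), to_drop), drop = FALSE]

names(df3)   # "a" "e"
```

Which member of a correlated pair to drop is a design choice; this sketch simply keeps the earlier column.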

3

Feature Selection for Model Performance

Chapter three introduces the difference between unsupervised and supervised feature selection approaches. You'll review how to use tidymodels workflows to build models. Then, you'll perform supervised feature selection using lasso regression and random forest models.

Supervised feature selection

Supervised vs. unsupervised feature selection

Decision tree feature selection type

Model Building and Evaluation with tidymodels

Split out the train and test sets

Create a recipe-model workflow

Fit, explore, and evaluate the model

Lasso Regression

Scale the data for lasso regression

Explore lasso regression penalty values

Tune the penalty hyperparameter

Fit the best model

Random forest models

Create full random forest model

Reduce data using feature importances

Create reduced random forest

4

Feature Extraction and Model Performance

In this final chapter, you'll gain a strong intuition of feature extraction by understanding how principal components extract and combine the most important information from different features. Then you'll learn about and apply three types of feature extraction: principal component analysis (PCA), t-SNE, and UMAP. Finally, you'll discover how to use these feature extraction methods as a preprocessing step in the tidymodels model-building process.

Foundations of feature extraction - principal components

Understanding principal components

Naming principal components

Principal Component Analysis (PCA)

PCA: variance explained

Mapping features to principal components

PCA in tidymodels

t-Distributed Stochastic Neighbor Embedding (t-SNE)

Separating house prices with PCA

Separating house prices with t-SNE

Uniform Manifold Approximation and Projection (UMAP)

Separating house prices with UMAP

UMAP reduction in a decision tree model

Evaluate the UMAP decision tree model
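
The PCA lessons on variance explained and on mapping features to principal components can be sketched with base R's `prcomp()`. This is a minimal illustration under stated assumptions: it uses the built-in `mtcars` data instead of the course's data, the 90% cutoff is hypothetical, and it calls `prcomp()` directly rather than going through the tidymodels steps taught in the chapter.

```r
# PCA on the built-in mtcars data (all columns are numeric).
# Centering and scaling keep large-valued features from dominating.
pca <- prcomp(mtcars, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cumulative    <- cumsum(var_explained)

# Smallest number of components reaching 90% of the variance
# (the 0.9 cutoff is an assumption for illustration).
n_keep <- which(cumulative >= 0.9)[1]

# Loadings: how each original feature maps onto the retained components.
loadings <- pca$rotation[, seq_len(n_keep), drop = FALSE]

# Scores: the reduced data set, one row per car, one column per component.
reduced <- pca$x[, seq_len(n_keep), drop = FALSE]
```

The `reduced` matrix can then stand in for the original features in a downstream model, which is the role feature extraction plays as a preprocessing step.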
