본문으로 바로가기

강의

R로 배우는 트리 기반 Machine Learning

기초기술 수준

업데이트됨 2023. 8.

tidymodels로 트리 기반 모델과 앙상블을 활용해 분류·회귀 예측 방법을 학습합니다.

무료로 강의 시작

RMachine Learning

4시간

16 동영상

58 연습 문제

4,850 XP

10,639

성취 증명서

수천 개 기업의 학습자들이 사랑하는

팀을 교육하시나요?

비즈니스용으로 체험해 보세요

강의 설명

트리 기반 Machine Learning 모델은 데이터의 복잡한 비선형 관계를 드러내며, Machine Learning 대회에서 자주 두각을 나타내요. 이 강의에서는 tidymodels 패키지를 사용해 단순한 의사결정나무부터 복잡한 랜덤 포레스트까지 다양한 트리 기반 모델을 탐색하고 구축해 봅니다. 또한 앙상블 학습을 활용해 예측 성능을 높이는 강력한 기법인 부스팅 트리도 배웁니다. 과정 전반에 걸쳐 당뇨병 발생과 고객 이탈을 예측하기 위해 건강 및 신용 위험 데이터를 다뤄 볼 거예요.

선수 조건

Modeling with tidymodels in R

1

Classification Trees

Ready to build a real machine learning pipeline? Complete step-by-step exercises to learn how to create decision trees, split your data, and predict which patients are most likely to suffer from diabetes. Last but not least, you’ll build performance measures to assess your models and judge your predictions.

Welcome to the course!

Why tree-based methods?

Specify that tree

Train that model

How to grow your tree

Train/test split

Avoiding class imbalances

From zero to hero

Predict and evaluate

Make predictions

Crack the matrix

Are you predicting correctly?

2

Regression Trees and Cross-Validation

Ready for some candy? Use a chocolate rating dataset to build regression trees and assess their performance using suitable error measures. You’ll overcome statistical insecurities of single train/test splits by applying sweet techniques like cross-validation and then dive even deeper by mastering the bias-variance tradeoff.

Continuous outcomes

Train a regression tree

Predict new values

Inspect model output

Performance metrics for regression trees

In-sample performance

Out-of-sample performance

Bigger mistakes, bigger penalty

Cross-validation

Create the folds

Fit the folds

Evaluate the folds

Bias-variance tradeoff

Call things by their names

Adjust model complexity

In-sample and out-of-sample performance

3

Hyperparameters and Ensemble Models

Time to get serious with tuning your hyperparameters and interpreting receiver operating characteristic (ROC) curves. In this chapter, you’ll leverage the wisdom of the crowd with ensemble models like bagging or random forests and build ensembles that forecast which credit card customers are most likely to churn.

Tuning hyperparameters

Generate a tuning grid

Tune along the grid

Pick the winner

More model measures

Calculate specificity

Draw the ROC curve

Area under the ROC curve

Bagged trees

Create bagged trees

In-sample ROC and AUC

Check for overfitting

Random forest

Bagged trees vs. random forest

Variable importance

4

Boosted Trees

Ready for the high society of tree-based models? Apply gradient boosting to create powerful ensembles that perform better than anything that you have seen or built. Learn about their fine-tuning and how to compare different models to pick a winner for production.

Introduction to boosting

Bagging vs. boosting

Specify a boosted ensemble

Gradient boosting

Train a boosted ensemble

Evaluate the ensemble

Compare to a single classifier

Optimize the boosted ensemble

Tuning preparation

The actual tuning

Finalize the model

Model comparison

Compare AUC

Plot ROC curves

R로 배우는 트리 기반 Machine Learning

강의
완료

수료증 획득

LinkedIn 프로필, 이력서 또는 CV에 이 인증서를 추가하세요
소셜 미디어와 성과 평가에서 공유하세요지금 등록

19백만 명 이상의 학습자와 함께 R로 배우는 트리 기반 Machine Learning을(를) 시작하세요!

DataCamp for Mobile을 통해 데이터 분석 능력을 향상시키세요.

모바일 강좌와 매일 5분 코딩 챌린지를 통해 이동 중에도 학습 효과를 높이세요.