Course

Machine Learning with Tree-Based Models in Python

Intermediate skill level

Updated December 2025

In this course, you'll learn how to use tree-based models and ensembles for regression and classification using scikit-learn.


Python · Machine Learning · 5 hours · 15 videos · 57 exercises · 4,650 XP · 110K+ Statements of Accomplishment


Course Description

Decision trees are supervised learning models used for classification and regression problems. Their high flexibility comes at a price: trees can capture complex non-linear relationships, but they are also prone to memorizing the noise present in a dataset. By aggregating the predictions of trees that are trained differently, ensemble methods take advantage of this flexibility while reducing the tendency to memorize noise. Ensemble methods are used across a variety of fields and have a proven track record of winning machine learning competitions. In this course, you'll learn how to use Python to train decision trees and tree-based models with the scikit-learn machine learning library. You'll understand the advantages and shortcomings of trees and demonstrate how ensembling can alleviate these shortcomings, all while practicing on real-world datasets. Finally, you'll learn how to tune the most influential hyperparameters to get the most out of your models.

Prerequisites

Supervised Learning with scikit-learn

1

Classification and Regression Trees

Classification and Regression Trees (CART) are a set of supervised learning models used for problems involving classification and regression. In this chapter, you'll be introduced to the CART algorithm.
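As a rough illustration of the classification side of CART, the sketch below trains two scikit-learn classification trees that differ only in their split criterion (Gini index vs. entropy) and compares their test accuracy. The dataset (scikit-learn's bundled breast cancer data), the depth limit, and the random seeds are assumptions chosen for the example, not values taken from the course.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: the bundled breast cancer dataset stands in for the course data.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)

accuracies = {}
for criterion in ("gini", "entropy"):
    # max_depth=4 is an illustrative cap that keeps the tree small.
    tree = DecisionTreeClassifier(criterion=criterion, max_depth=4, random_state=1)
    tree.fit(X_train, y_train)
    accuracies[criterion] = accuracy_score(y_test, tree.predict(X_test))
    print(f"{criterion}: test accuracy = {accuracies[criterion]:.3f}")
```

Which criterion scores higher depends on the data and the split, so the comparison is best read as a starting point rather than a general result.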

Decision tree for classification

50 XP

Train your first classification tree

100 XP

Evaluate the classification tree

100 XP

Logistic regression vs classification tree

100 XP

Classification tree Learning

50 XP

Growing a classification tree

50 XP

Using entropy as a criterion

100 XP

Entropy vs Gini index

100 XP

Decision tree for regression

50 XP

Train your first regression tree

100 XP

Evaluate the regression tree

100 XP

Linear regression vs regression tree

100 XP
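The regression lessons in this chapter can be sketched in the same way. The example below fits a regression tree and a linear regression on scikit-learn's bundled diabetes dataset and reports the test-set RMSE of each. The dataset and the hyperparameter values are assumptions made for illustration only.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Assumption: the bundled diabetes dataset stands in for the course data.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=3
)

models = {
    # Illustrative settings: a shallow tree with at least 10 samples per leaf.
    "regression tree": DecisionTreeRegressor(
        max_depth=4, min_samples_leaf=10, random_state=3
    ),
    "linear regression": LinearRegression(),
}

rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # RMSE is the square root of the mean squared error on the test set.
    rmse[name] = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
    print(f"{name}: test RMSE = {rmse[name]:.2f}")
```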

2

The Bias-Variance Tradeoff

The bias-variance tradeoff is one of the fundamental concepts in supervised machine learning. In this chapter, you'll understand how to diagnose the problems of overfitting and underfitting. You'll also be introduced to ensembling, in which the predictions of several models are aggregated to produce more robust predictions.
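One common way to diagnose overfitting is to compare a model's cross-validation error with its training error. The sketch below does this for an unconstrained regression tree using 10-fold cross-validation. The dataset and seeds are assumptions for the example.

```python
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Assumption: the bundled diabetes dataset stands in for the course data.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

# With no depth limit, the tree can keep splitting until it fits the training set.
tree = DecisionTreeRegressor(random_state=1)

# scikit-learn reports negated MSE so that higher is better; flip the sign back.
cv_mse = -cross_val_score(
    tree, X_train, y_train, cv=10, scoring="neg_mean_squared_error"
)
cv_rmse = cv_mse.mean() ** 0.5

tree.fit(X_train, y_train)
train_rmse = mean_squared_error(y_train, tree.predict(X_train)) ** 0.5

print(f"10-fold CV RMSE: {cv_rmse:.2f}")
print(f"Training RMSE:   {train_rmse:.2f}")
```

A cross-validation error far above the training error points to high variance (overfitting). If both errors are high and close to each other, high bias (underfitting) is the more likely problem.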

Generalization Error

50 XP

Complexity, bias and variance

50 XP

Overfitting and underfitting

50 XP

Diagnose bias and variance problems

50 XP

Instantiate the model

100 XP

Evaluate the 10-fold CV error

100 XP

Evaluate the training error

100 XP

High bias or high variance?

50 XP

Ensemble Learning

50 XP

Define the ensemble

100 XP

Evaluate individual classifiers

100 XP

Better performance with a Voting Classifier

100 XP
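The ensembling idea from this chapter can be sketched with scikit-learn's `VotingClassifier`, which aggregates the predictions of several different classifiers by majority vote. The choice of base models, their settings, and the dataset below are assumptions for illustration. The scaling pipelines are added only so that logistic regression and k-nearest neighbors behave sensibly on unscaled features.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumption: the bundled breast cancer dataset stands in for the course data.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Three different model families, each paired with a short name.
classifiers = [
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(random_state=1))),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=9))),
    ("tree", DecisionTreeClassifier(min_samples_leaf=5, random_state=1)),
]

scores = {}
for name, clf in classifiers:
    clf.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, clf.predict(X_test))

# Hard (majority) voting is the default; the ensemble refits clones of each model.
voter = VotingClassifier(estimators=classifiers)
voter.fit(X_train, y_train)
scores["voting"] = accuracy_score(y_test, voter.predict(X_test))

for name, acc in scores.items():
    print(f"{name}: test accuracy = {acc:.3f}")
```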

3

Bagging and Random Forests

Bagging is an ensemble method that trains the same algorithm many times on different subsets sampled from the training data. In this chapter, you'll understand how bagging can be used to create a tree ensemble. You'll also learn how the random forests algorithm can lead to further ensemble diversity through randomization at the level of each split in the trees forming the ensemble.
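A minimal sketch of bagging, assuming scikit-learn's `BaggingClassifier` with a decision tree as the base model: each tree is trained on a bootstrap sample, and the samples left out of each bootstrap provide an out-of-bag (OOB) accuracy estimate that can be compared with the test-set accuracy. The dataset and hyperparameter values are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: the bundled breast cancer dataset stands in for the course data.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# The base model is passed positionally because its keyword name
# differs between scikit-learn versions.
bag = BaggingClassifier(
    DecisionTreeClassifier(min_samples_leaf=5, random_state=1),
    n_estimators=100,
    oob_score=True,  # score each sample using only trees that did not see it
    random_state=1,
)
bag.fit(X_train, y_train)

test_acc = accuracy_score(y_test, bag.predict(X_test))
print(f"OOB accuracy:  {bag.oob_score_:.3f}")
print(f"Test accuracy: {test_acc:.3f}")
```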

50 XP

Define the bagging classifier

100 XP

Evaluate Bagging performance

100 XP

Out of Bag Evaluation

50 XP

Prepare the ground

100 XP

OOB Score vs Test Set Score

100 XP

Random Forests (RF)

50 XP

Train an RF regressor

100 XP

Evaluate the RF regressor

100 XP

Visualizing features importances

100 XP
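The random forest lessons above can be sketched as follows: train a `RandomForestRegressor`, evaluate its test RMSE, and rank the features by importance. The example prints the ranking as text rather than plotting it, and the dataset and hyperparameters are assumptions for illustration.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Assumption: the bundled diabetes dataset stands in for the course data.
data = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=2
)

rf = RandomForestRegressor(n_estimators=200, min_samples_leaf=5, random_state=2)
rf.fit(X_train, y_train)

rmse = mean_squared_error(y_test, rf.predict(X_test)) ** 0.5
print(f"Test RMSE: {rmse:.2f}")

# Impurity-based importances are normalized to sum to 1 across features.
ranked = sorted(
    zip(data.feature_names, rf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name:>4}: {importance:.3f}")
```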

4

Boosting

Boosting refers to an ensemble method in which several models are trained sequentially, with each model learning from the errors of its predecessors. In this chapter, you'll be introduced to two boosting methods: AdaBoost and Gradient Boosting.
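As a hedged sketch of the first boosting method, the example below trains scikit-learn's `AdaBoostClassifier` on decision stumps (trees of depth 1) and evaluates it with ROC AUC. The dataset, the number of estimators, and the choice of metric are assumptions for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: the bundled breast cancer dataset stands in for the course data.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)

# Decision stumps as weak learners; the base model is passed positionally
# because its keyword name differs between scikit-learn versions.
ada = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1, random_state=1),
    n_estimators=100,
    random_state=1,
)
ada.fit(X_train, y_train)

# ROC AUC needs a score for the positive class, not hard labels.
positive_proba = ada.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, positive_proba)
print(f"Test ROC AUC: {auc:.3f}")
```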

50 XP

Define the AdaBoost classifier

100 XP

Train the AdaBoost classifier

100 XP

Evaluate the AdaBoost classifier

100 XP

Gradient Boosting (GB)

50 XP

Define the GB regressor

100 XP

Train the GB regressor

100 XP

Evaluate the GB regressor

100 XP

Stochastic Gradient Boosting (SGB)

50 XP

Regression with SGB

100 XP

Train the SGB regressor

100 XP

Evaluate the SGB regressor

100 XP
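Gradient boosting and its stochastic variant can be sketched with the same scikit-learn class. In this example, stochastic gradient boosting is obtained by setting `subsample` below 1.0 (each tree sees a random fraction of the rows) and `max_features` below 1.0 (each split considers a random fraction of the features). The dataset and all hyperparameter values are assumptions for illustration.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Assumption: the bundled diabetes dataset stands in for the course data.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2
)

models = {
    # Plain gradient boosting: every tree sees all rows and all features.
    "GB": GradientBoostingRegressor(
        n_estimators=200, max_depth=1, random_state=2
    ),
    # Stochastic variant: 80% of rows per tree, 20% of features per split.
    "SGB": GradientBoostingRegressor(
        n_estimators=200, max_depth=1, subsample=0.8, max_features=0.2,
        random_state=2,
    ),
}

rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    rmse[name] = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
    print(f"{name}: test RMSE = {rmse[name]:.2f}")
```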

5

Model Tuning

The hyperparameters of a machine learning model are parameters that are not learned from data. They should be set prior to fitting the model to the training set. In this chapter, you'll learn how to tune the hyperparameters of a tree-based model using grid search cross-validation.
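Grid search cross-validation can be sketched with scikit-learn's `GridSearchCV`: every combination in a hyperparameter grid is scored by cross-validation, and the best combination is refit on the full training set. The grid values, the scoring metric, and the dataset below are assumptions chosen for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: the bundled breast cancer dataset stands in for the course data.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)

# Illustrative grid: 4 depths x 3 leaf sizes = 12 candidate trees.
param_grid = {
    "max_depth": [2, 3, 4, 5],
    "min_samples_leaf": [1, 5, 10],
}

grid = GridSearchCV(
    DecisionTreeClassifier(random_state=1),
    param_grid,
    scoring="roc_auc",
    cv=5,
)
grid.fit(X_train, y_train)

# best_estimator_ is the winning tree, already refit on all training data.
best_tree = grid.best_estimator_
test_auc = roc_auc_score(y_test, best_tree.predict_proba(X_test)[:, 1])

print(f"Best hyperparameters: {grid.best_params_}")
print(f"Best CV ROC AUC:      {grid.best_score_:.3f}")
print(f"Test ROC AUC:         {test_auc:.3f}")
```

The same pattern applies to a random forest by swapping in `RandomForestClassifier` or `RandomForestRegressor` and a grid over its own hyperparameters, at the cost of a longer search.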

Tuning a CART's Hyperparameters

50 XP

Tree hyperparameters

50 XP

Set the tree's hyperparameter grid

100 XP

Search for the optimal tree

100 XP

Evaluate the optimal tree

100 XP

Tuning a RF's Hyperparameters

50 XP

Random forests hyperparameters

50 XP

Set the hyperparameter grid of RF

100 XP

Search for the optimal forest

100 XP

Evaluate the optimal forest

100 XP

Congratulations!

50 XP

