Machine Learning with Tree-Based Models in Python

In this course, you'll learn how to use tree-based models and ensembles for regression and classification using scikit-learn.
5 Hours, 15 Videos, 57 Exercises, 46,924 Learners
4650 XP

Course Description

Decision trees are supervised learning models used for classification and regression problems. Tree models are highly flexible, and that flexibility comes at a price: trees can capture complex non-linear relationships, but they are also prone to memorizing the noise present in a dataset. By aggregating the predictions of trees that are trained differently, ensemble methods take advantage of the flexibility of trees while reducing their tendency to memorize noise. Ensemble methods are used across a variety of fields and have a proven track record of winning machine learning competitions. In this course, you'll learn how to use Python to train decision trees and tree-based models with the user-friendly scikit-learn machine learning library. You'll understand the advantages and shortcomings of trees and see how ensembling can alleviate these shortcomings, all while practicing on real-world datasets. Finally, you'll learn how to tune the most influential hyperparameters to get the most out of your models.

  1. Classification and Regression Trees

    Classification and Regression Trees (CART) are a set of supervised learning models used for problems involving classification and regression. In this chapter, you'll be introduced to the CART algorithm.
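
As a rough illustration of the kind of model this chapter introduces, the sketch below fits a shallow classification tree with scikit-learn. The dataset (scikit-learn's built-in breast cancer data), the split sizes, and the hyperparameter values are assumptions chosen for the example, not taken from the course.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Built-in dataset used purely for illustration (an assumption, not course data).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)

# A shallow tree: each internal node tests one feature against a threshold,
# chosen here to reduce Gini impurity.
dt = DecisionTreeClassifier(max_depth=3, criterion="gini", random_state=1)
dt.fit(X_train, y_train)

test_accuracy = accuracy_score(y_test, dt.predict(X_test))
print(f"Test accuracy: {test_accuracy:.3f}")
```

Swapping in `DecisionTreeRegressor` gives the regression counterpart, with a regression metric such as mean squared error in place of accuracy.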
  2. The Bias-Variance Tradeoff

    The bias-variance tradeoff is one of the fundamental concepts in supervised machine learning. In this chapter, you'll understand how to diagnose the problems of overfitting and underfitting. You'll also be introduced to the concept of ensembling where the predictions of several models are aggregated to produce predictions that are more robust.
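
One common way to diagnose overfitting and underfitting is to compare a model's training accuracy with its cross-validated accuracy. The sketch below does this for an unconstrained tree and a one-split tree; the dataset, the helper name `train_and_cv_accuracy`, and the settings are assumptions made for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Built-in dataset used purely for illustration (an assumption, not course data).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)


def train_and_cv_accuracy(model):
    """Hypothetical helper: return (training accuracy, mean 10-fold CV accuracy)."""
    cv_acc = cross_val_score(
        model, X_train, y_train, cv=10, scoring="accuracy"
    ).mean()
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)
    return train_acc, cv_acc


# Unconstrained tree: can grow until it fits the training set almost perfectly.
deep_train, deep_cv = train_and_cv_accuracy(DecisionTreeClassifier(random_state=1))

# One-split tree: very limited capacity.
stump_train, stump_cv = train_and_cv_accuracy(
    DecisionTreeClassifier(max_depth=1, random_state=1)
)

# A large gap between training and CV accuracy suggests high variance (overfitting);
# training and CV accuracy that are both low suggest high bias (underfitting).
print(f"deep tree: train={deep_train:.3f}  cv={deep_cv:.3f}")
print(f"stump:     train={stump_train:.3f}  cv={stump_cv:.3f}")
```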
  3. Bagging and Random Forests

    Bagging is an ensemble method involving training the same algorithm many times using different subsets sampled from the training data. In this chapter, you'll understand how bagging can be used to create a tree ensemble. You'll also learn how the random forests algorithm can lead to further ensemble diversity through randomization at the level of each split in the trees forming the ensemble.
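
A minimal sketch of both ideas with scikit-learn follows: `BaggingClassifier` trains many trees on bootstrap samples of the training set, and `RandomForestClassifier` additionally considers only a random subset of features at each split. The dataset and the hyperparameter values are assumptions for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Built-in dataset used purely for illustration (an assumption, not course data).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Bagging: 100 trees, each fit on a bootstrap sample of the training data.
# The base estimator is passed positionally because its keyword name differs
# between scikit-learn versions.
bag = BaggingClassifier(
    DecisionTreeClassifier(random_state=1),
    n_estimators=100,
    oob_score=True,  # estimate accuracy on the samples each tree did not see
    random_state=1,
)
bag.fit(X_train, y_train)
bag_acc = bag.score(X_test, y_test)

# Random forest: bootstrap samples plus a random feature subset at every split.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=1)
rf.fit(X_train, y_train)
rf_acc = rf.score(X_test, y_test)

print(f"Bagging:       test={bag_acc:.3f}  oob={bag.oob_score_:.3f}")
print(f"Random forest: test={rf_acc:.3f}")
```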
  4. Boosting

    Boosting refers to an ensemble method in which several models are trained sequentially, with each model learning from the errors of its predecessors. In this chapter, you'll be introduced to two boosting methods: AdaBoost and Gradient Boosting.
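
The two methods can be tried side by side with scikit-learn's `AdaBoostClassifier` and `GradientBoostingClassifier`, as in the sketch below. The dataset, the use of one-split trees as weak learners, and the other settings are assumptions chosen for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Built-in dataset used purely for illustration (an assumption, not course data).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# AdaBoost: each new weak learner gives more weight to the samples
# that the previous learners misclassified.
ada = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1, random_state=1),
    n_estimators=100,
    random_state=1,
)
ada.fit(X_train, y_train)
ada_acc = ada.score(X_test, y_test)

# Gradient boosting: each new tree is fit to the errors (loss gradients)
# of the ensemble built so far, scaled by the learning rate.
gb = GradientBoostingClassifier(
    n_estimators=100, max_depth=1, learning_rate=0.1, random_state=1
)
gb.fit(X_train, y_train)
gb_acc = gb.score(X_test, y_test)

print(f"AdaBoost test accuracy:          {ada_acc:.3f}")
print(f"Gradient boosting test accuracy: {gb_acc:.3f}")
```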
  5. Model Tuning

    The hyperparameters of a machine learning model are parameters that are not learned from data. They must be set before fitting the model to the training set. In this chapter, you'll learn how to tune the hyperparameters of a tree-based model using grid search with cross-validation.
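
A grid search of this kind might look like the following sketch, which uses scikit-learn's `GridSearchCV` to try every combination in a small grid and score each one with 5-fold cross-validation. The dataset and the candidate values in `param_grid` are assumptions made for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Built-in dataset used purely for illustration (an assumption, not course data).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)

# Candidate hyperparameter values (illustrative choices).
# A float min_samples_leaf is read as a fraction of the training samples.
param_grid = {
    "max_depth": [2, 3, 4],
    "min_samples_leaf": [0.02, 0.05, 0.1],
}

# Every combination in the grid is evaluated with 5-fold cross-validation;
# the best one is then refit on the whole training set.
grid = GridSearchCV(
    estimator=DecisionTreeClassifier(random_state=1),
    param_grid=param_grid,
    scoring="accuracy",
    cv=5,
)
grid.fit(X_train, y_train)

test_accuracy = grid.best_estimator_.score(X_test, y_test)
print("Best hyperparameters:", grid.best_params_)
print(f"Best CV accuracy: {grid.best_score_:.3f}")
print(f"Test accuracy:    {test_accuracy:.3f}")
```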
In the following tracks
Data Scientist, Machine Learning Scientist
Sumedh Panchadhar, Kara Woo, Eunkyung Park

Elie Kawerk

Data Scientist at Mirum Agency
Elie is a Data Scientist with a love for Python and open source. He currently works at Majid Al Futtaim Holding and has previously worked at Mirum Agency. Elie holds a PhD in computational physics from the University of Paris VI, France, and has held a research fellowship at the University of Trieste, Italy. Feel free to connect with him on LinkedIn (elie-kawerk-data-scientist) and to follow him on Twitter at @ElieKawerk.
