
Supervised Learning with scikit-learn

Learn how to build and tune predictive models and evaluate how well they'll perform on unseen data.

4 hours | 17 videos | 54 exercises | 305,450 learners | 4,200 XP


Course Description

Machine learning is the field that teaches computers to learn from existing data in order to make predictions on new data: Will a tumor be benign or malignant? Which of your customers will take their business elsewhere? Is a particular email spam? In this course, you'll learn how to use Python to perform supervised learning, an essential component of machine learning. You'll learn how to build predictive models, tune their parameters, and determine how well they will perform on unseen data, all while using real-world datasets. You'll be using scikit-learn, one of the most popular and user-friendly machine learning libraries for Python.

  1. Classification



    In this chapter, you will be introduced to classification problems and learn how to solve them using supervised learning techniques. You'll then apply what you learn to a political dataset, classifying the party affiliation of United States congressmen based on their voting records.

    Supervised learning (50 XP)
    Which of these is a classification problem? (50 XP)
    Exploratory data analysis (50 XP)
    Numerical EDA (50 XP)
    Visual EDA (50 XP)
    The classification challenge (50 XP)
    k-Nearest Neighbors: Fit (100 XP)
    k-Nearest Neighbors: Predict (100 XP)
    Measuring model performance (50 XP)
    The digits recognition dataset (100 XP)
    Train/Test Split + Fit/Predict/Accuracy (100 XP)
    Overfitting and underfitting (100 XP)
  2. Regression


    In the previous chapter, you used image and political datasets to predict binary and multiclass outcomes. But what if your problem requires a continuous outcome? Regression is best suited to solving such problems. You will learn about fundamental concepts in regression and apply them to predict the life expectancy in a given country using Gapminder data.

  3. Fine-tuning your model

    Having trained your model, your next task is to evaluate its performance. In this chapter, you will learn about other metrics available in scikit-learn that let you assess your model's performance in a more nuanced way. You will then learn to optimize your classification and regression models using hyperparameter tuning.

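The Chapter 1 workflow (fit a k-NN classifier, predict on held-out data, measure accuracy, and compare train and test accuracy across values of k) can be sketched in scikit-learn roughly as follows. This is a minimal illustration, not the course's own exercise code: it assumes scikit-learn's bundled digits dataset via `load_digits`, and the 80/20 split, `random_state=42`, and the range of `k` values are arbitrary illustrative choices.

```python
# Minimal k-NN classification sketch. ASSUMPTIONS: bundled digits dataset,
# illustrative split ratio, random_state, and k values.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load 8x8 handwritten-digit images as flattened 64-feature vectors.
X, y = load_digits(return_X_y=True)

# Hold out 20% of the data as "unseen" test data; stratify keeps class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit a k-NN classifier and predict labels for the held-out images.
knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# For classifiers, score() returns mean accuracy on the given data.
test_accuracy = knn.score(X_test, y_test)
print(f"Test accuracy with k=7: {test_accuracy:.3f}")

# Compare train vs. test accuracy across k to look for over/underfitting:
# small k follows the training data very closely (risk of overfitting),
# large k smooths the decision boundary (risk of underfitting).
neighbors = range(1, 9)
train_scores, test_scores = [], []
for k in neighbors:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    train_scores.append(model.score(X_train, y_train))
    test_scores.append(model.score(X_test, y_test))

for k, tr, te in zip(neighbors, train_scores, test_scores):
    print(f"k={k}: train={tr:.3f}, test={te:.3f}")
```

Training accuracy at `k=1` is typically at or near 1.0 because each training point is its own nearest neighbor; the value of `k` where test accuracy peaks is a common heuristic for balancing overfitting against underfitting.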
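The regression concepts from Chapter 2 (fitting a linear model, scoring it with R² and RMSE, and checking stability with k-fold cross-validation) might look like the sketch below. The course itself uses Gapminder data; because that file is not bundled with scikit-learn, this example assumes the bundled diabetes dataset (`load_diabetes`) as a stand-in, and the split size and number of folds are illustrative.

```python
# Minimal regression sketch. ASSUMPTION: the bundled diabetes dataset stands in
# for the Gapminder life-expectancy data used in the course.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score, train_test_split

# 10 numeric features; the target is a continuous measure of disease progression.
X, y = load_diabetes(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Ordinary least squares linear regression.
reg = LinearRegression()
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)

# For regressors, score() returns R^2 (coefficient of determination).
r2 = reg.score(X_test, y_test)
# RMSE is expressed in the same units as the target.
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"Test R^2: {r2:.3f}, test RMSE: {rmse:.1f}")

# 5-fold cross-validation; with no scoring argument, R^2 is used for each fold.
cv_r2 = cross_val_score(LinearRegression(), X, y, cv=5)
print("R^2 per fold:", np.round(cv_r2, 3))
print(f"Mean CV R^2: {cv_r2.mean():.3f}")
```

Reporting the spread of per-fold R² alongside a single train/test score gives a rough sense of how much the estimate depends on one particular split.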
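For Chapter 3, a sketch combining richer classification metrics with hyperparameter tuning follows. It assumes scikit-learn's bundled breast cancer dataset (`load_breast_cancer`), which echoes the benign-versus-malignant question from the course description, and it tunes the number of neighbors of a k-NN classifier with `GridSearchCV`. The grid of 1 to 20 neighbors, 5-fold cross-validation, and the split settings are illustrative choices, not the course's own settings.

```python
# Minimal metrics + hyperparameter-tuning sketch. ASSUMPTIONS: bundled breast
# cancer dataset, k-NN as the tuned model, illustrative grid and split.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# In this dataset, label 0 is "malignant" and label 1 is "benign".
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=21, stratify=y
)

# Exhaustively try each n_neighbors value with 5-fold CV on the training set only.
# Features are left unscaled for brevity; scaling would usually help k-NN.
param_grid = {"n_neighbors": list(range(1, 21))}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X_train, y_train)
print("Best params:", search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.3f}")

# With the default refit=True, best_estimator_ is refit on all training data.
best_model = search.best_estimator_
y_pred = best_model.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)
# Per-class precision, recall, and F1 score.
print(classification_report(y_test, y_pred))

# ROC AUC uses predicted probabilities for the positive class (label 1).
y_prob = best_model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, y_prob)
print(f"Test ROC AUC: {auc:.3f}")
```

Tuning only on the training split and touching the test set once at the end keeps the final metrics an estimate of performance on genuinely unseen data.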

In the following tracks

Data Scientist, Machine Learning Fundamentals, Machine Learning Scientist


Yashas Roy

Hugo Bowne-Anderson

Data Scientist at DataCamp

Hugo is a data scientist, educator, writer, and podcaster at DataCamp. His main interests are promoting data and AI literacy, helping to spread data skills through organizations and society, and doing amateur stand-up comedy in NYC. If you want to know what he likes to talk about, check out DataFramed, the DataCamp podcast, which he hosts and produces.

What do other learners have to say?

I've used other sites—Coursera, Udacity, things like that—but DataCamp's been the one that I've stuck with.

Devon Edwards Joseph
Lloyds Banking Group

DataCamp is the top resource I recommend for learning data science.

Louis Maiden
Harvard Business School

DataCamp is by far my favorite website to learn from.

Ronald Bowers
Decision Science Analytics, USAA