Interactive Course

Supervised Learning with scikit-learn

Learn how to build and tune predictive models and evaluate how well they will perform on unseen data.

  • 4 hours
  • 17 Videos
  • 54 Exercises
  • 107,747 Participants
  • 4,200 XP

Loved by learners at thousands of top companies:

mercedes-grey.svg
axa-grey.svg
whole-foods-grey.svg
rei-grey.svg
deloitte-grey.svg
t-mobile-grey.svg

Course Description

At the end of day, the value of Data Scientists rests on their ability to describe the world and to make predictions. Machine Learning is the field of teaching machines and computers to learn from existing data to make predictions on new data - will a given tumor be benign or malignant? Which of your customers will take their business elsewhere? Is a particular email spam or not? In this course, you'll learn how to use Python to perform supervised learning, an essential component of Machine Learning. You'll learn how to build predictive models, how to tune their parameters and how to tell how well they will perform on unseen data, all the while using real world datasets. You'll do so using scikit-learn, one of the most popular and user-friendly machine learning libraries for Python.

  1. 1

    Classification

    Free

    In this chapter, you will be introduced to classification problems and learn how to solve them using supervised learning techniques. Classification problems are prevalent in a variety of domains, ranging from finance to healthcare. Here, you will have the chance to apply what you are learning to a political dataset, where you classify the party affiliation of United States Congressmen based on their voting records.

  2. Fine-tuning your model

    Having trained your model, your next task is to evaluate its performance. What metrics can you use to gauge how good your model is? So far, you have used accuracy for classification and R-squared for regression. In this chapter, you will learn about some of the other metrics available in scikit-learn that will allow you to assess your model's performance in a more nuanced manner. You will then learn to optimize both your classification as well as regression models using hyperparameter tuning.

  3. Regression

    In the previous chapter, you made use of image and political datasets to predict binary as well as multiclass outcomes. But what if your problem requires a continuous outcome? Regression, which is the focus of this chapter, is best suited to solving such problems. You will learn about fundamental concepts in regression and apply them to predict the life expectancy in a given country using Gapminder data.

  4. Preprocessing and pipelines

    This chapter will introduce the notion of pipelines and how scikit-learn allows for transformers and estimators to be chained together and used as a single unit. Pre-processing techniques will be then be introduced as a way to enhance model performance and pipelines will be the glue that ties together concepts in the prior chapters.

  1. 1

    Classification

    Free

    In this chapter, you will be introduced to classification problems and learn how to solve them using supervised learning techniques. Classification problems are prevalent in a variety of domains, ranging from finance to healthcare. Here, you will have the chance to apply what you are learning to a political dataset, where you classify the party affiliation of United States Congressmen based on their voting records.

  2. Regression

    In the previous chapter, you made use of image and political datasets to predict binary as well as multiclass outcomes. But what if your problem requires a continuous outcome? Regression, which is the focus of this chapter, is best suited to solving such problems. You will learn about fundamental concepts in regression and apply them to predict the life expectancy in a given country using Gapminder data.

  3. Fine-tuning your model

    Having trained your model, your next task is to evaluate its performance. What metrics can you use to gauge how good your model is? So far, you have used accuracy for classification and R-squared for regression. In this chapter, you will learn about some of the other metrics available in scikit-learn that will allow you to assess your model's performance in a more nuanced manner. You will then learn to optimize both your classification as well as regression models using hyperparameter tuning.

  4. Preprocessing and pipelines

    This chapter will introduce the notion of pipelines and how scikit-learn allows for transformers and estimators to be chained together and used as a single unit. Pre-processing techniques will be then be introduced as a way to enhance model performance and pipelines will be the glue that ties together concepts in the prior chapters.

What do other learners have to say?

Devon

“I've used other sites, but DataCamp's been the one that I've stuck with.”

Devon Edwards Joseph

Lloyd's Banking Group

Louis

“DataCamp is the top resource I recommend for learning data science.”

Louis Maiden

Harvard Business School

Ronbowers

“DataCamp is by far my favorite website to learn from.”

Ronald Bowers

Decision Science Analytics @ USAA

Hugo Bowne-Anderson
Hugo Bowne-Anderson

Data Scientist at DataCamp

Hugo is a data scientist, educator, writer and podcaster and DataCamp. His main interests are promoting data & AI literacy, helping to spread data skills through organizations and society and doing amateur stand up comedy in NYC. If you want to know what he likes to talk about, definitely check out DataFramed, the DataCamp podcast, which he hosts and produces: https://www.datacamp.com/community/podcast

See More
DataCamp Content Creator
DataCamp Content Creator

Course Instructor

See More
Icon Icon Icon professional info