Ben Bolstad has completed

Machine Learning with scikit-learn

4 hr

4,200 XP

Course Description

Machine learning is the field that teaches machines and computers to learn from existing data to make predictions on new data: Will a tumor be benign or malignant? Which of your customers will take their business elsewhere? Is a particular email spam? In this course, you'll learn how to use Python to perform supervised learning, an essential component of machine learning. You'll learn how to build predictive models, tune their parameters, and determine how well they will perform on unseen data, all while using real-world datasets. You'll be using scikit-learn, one of the most popular and user-friendly machine learning libraries for Python.

1
Classification
Free
In this chapter, you will be introduced to classification problems and learn how to solve them using supervised learning techniques. You’ll then apply what you learn to a political dataset, classifying the party affiliation of United States congressmen based on their voting records.
Supervised learning
50 xp
Which of these is a classification problem?
50 xp
Exploratory data analysis
50 xp
Numerical EDA
50 xp
Visual EDA
50 xp
The classification challenge
50 xp
k-Nearest Neighbors: Fit
100 xp
k-Nearest Neighbors: Predict
100 xp
Measuring model performance
50 xp
The digits recognition dataset
100 xp
Train/Test Split + Fit/Predict/Accuracy
100 xp
Overfitting and underfitting
100 xp
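
The fit, predict, and train/test split workflow named in this chapter's lessons can be sketched as follows. This is a minimal illustration, not the course's own exercise code: it assumes the copy of the digits dataset bundled with scikit-learn (`load_digits`), and the choice of `n_neighbors=7`, the 80/20 split, and `random_state=42` are arbitrary values picked for the example.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: scikit-learn's bundled 8x8 digits images stand in for the
# course's digits recognition dataset.
digits = load_digits()
X, y = digits.data, digits.target

# Hold back 20% of the samples so accuracy is measured on unseen data.
# stratify=y keeps the class proportions the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit a k-nearest neighbors classifier (k=7 is an illustrative choice).
knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)

# Predict labels for the held-out images and report mean accuracy.
y_pred = knn.predict(X_test)
accuracy = knn.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Repeating the fit with a range of `n_neighbors` values and comparing training accuracy with test accuracy is one simple way to see overfitting (very small k) and underfitting (very large k).
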
2
Regression
In the previous chapter, you used image and political datasets to predict binary and multiclass outcomes. But what if your problem requires a continuous outcome? Regression is best suited to solving such problems. You will learn about fundamental concepts in regression and apply them to predict the life expectancy in a given country using Gapminder data.
Introduction to regression
50 xp
Which of the following is a regression problem?
50 xp
Importing data for supervised learning
100 xp
Exploring the Gapminder data
50 xp
The basics of linear regression
50 xp
Fit & predict for regression
100 xp
Train/test split for regression
100 xp
Cross-validation
50 xp
5-fold cross-validation
100 xp
K-Fold CV comparison
100 xp
Regularized regression
50 xp
Regularization I: Lasso
100 xp
Regularization II: Ridge
100 xp
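
As a rough sketch of the cross-validation and regularization lessons listed above, the example below scores a linear regression with 5-fold cross-validation and then fits ridge and lasso variants. It assumes scikit-learn's bundled diabetes dataset (`load_diabetes`) rather than the Gapminder data used in the course, and the `alpha=0.1` values are illustrative, not tuned.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Assumption: the bundled diabetes data replaces the course's Gapminder data.
X, y = load_diabetes(return_X_y=True)

# 5-fold cross-validation; for regressors the default score is R^2.
linreg = LinearRegression()
cv_scores = cross_val_score(linreg, X, y, cv=5)
print("Linear regression R^2 per fold:", cv_scores.round(3))
print("Mean R^2:", round(cv_scores.mean(), 3))

# Ridge adds an L2 penalty that shrinks coefficients toward zero.
ridge = Ridge(alpha=0.1)
ridge_scores = cross_val_score(ridge, X, y, cv=5)
print("Ridge mean R^2:", round(ridge_scores.mean(), 3))

# Lasso adds an L1 penalty, which can set some coefficients exactly to zero
# and so doubles as a rough feature-selection tool.
lasso = Lasso(alpha=0.1).fit(X, y)
n_selected = int((lasso.coef_ != 0).sum())
print("Features with non-zero lasso coefficients:", n_selected)
```

Averaging the per-fold scores gives a performance estimate that depends less on any single train/test split.
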
3
Fine-tuning your model
Having trained your model, your next task is to evaluate its performance. In this chapter, you will learn about some of the other metrics available in scikit-learn that will allow you to assess your model's performance in a more nuanced manner. You will then learn to optimize your classification and regression models using hyperparameter tuning.
How good is your model?
50 xp
Metrics for classification
100 xp
Logistic regression and the ROC curve
50 xp
Building a logistic regression model
100 xp
Plotting an ROC curve
100 xp
Precision-recall Curve
50 xp
Area under the ROC curve
50 xp
AUC computation
100 xp
Hyperparameter tuning
50 xp
Hyperparameter tuning with GridSearchCV
100 xp
Hyperparameter tuning with RandomizedSearchCV
100 xp
Hold-out set for final evaluation
50 xp
Hold-out set reasoning
50 xp
Hold-out set in practice I: Classification
100 xp
Hold-out set in practice II: Regression
100 xp
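
The combination of hyperparameter tuning, ROC AUC, and a hold-out set covered in this chapter might look like the sketch below. It is an illustration under stated assumptions: the breast cancer dataset bundled with scikit-learn is not among the course's listed datasets, and the grid of `C` values, the 70/30 split, and `max_iter=10000` are arbitrary choices for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumption: a bundled binary classification dataset used for illustration.
X, y = load_breast_cancer(return_X_y=True)

# Set aside a hold-out set that the grid search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Search over the inverse regularization strength C with 5-fold CV,
# selecting the value with the best cross-validated ROC AUC.
param_grid = {"C": [0.01, 0.1, 1, 10]}
grid = GridSearchCV(
    LogisticRegression(max_iter=10000),
    param_grid,
    cv=5,
    scoring="roc_auc",
)
grid.fit(X_train, y_train)
print("Best C:", grid.best_params_["C"])
print("Best cross-validated AUC:", round(grid.best_score_, 3))

# Final evaluation on the hold-out set, using predicted probabilities
# for the positive class.
y_prob = grid.predict_proba(X_test)[:, 1]
holdout_auc = roc_auc_score(y_test, y_prob)
print("Hold-out AUC:", round(holdout_auc, 3))
```

Keeping the hold-out set out of the search means the final AUC is not biased by the choice of `C`. `RandomizedSearchCV` takes a similar form but samples a fixed number of parameter settings instead of trying every combination.
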
4
Preprocessing and pipelines
This chapter introduces pipelines, which let scikit-learn chain transformers and estimators together and use them as a single unit. Preprocessing techniques will be introduced as a way to enhance model performance, and pipelines will tie together concepts from previous chapters.
Preprocessing data
50 xp
Exploring categorical features
100 xp
Creating dummy variables
100 xp
Regression with categorical features
100 xp
Handling missing data
50 xp
Dropping missing data
100 xp
Imputing missing data in a ML Pipeline I
100 xp
Imputing missing data in a ML Pipeline II
100 xp
Centering and scaling
50 xp
Centering and scaling your data
100 xp
Centering and scaling in a pipeline
100 xp
Bringing it all together I: Pipeline for classification
100 xp
Bringing it all together II: Pipeline for regression
100 xp
Final thoughts
50 xp
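
To make the idea of chaining transformers and an estimator concrete, here is a small sketch of a pipeline that imputes missing values, scales the features, and then classifies. The setup is hypothetical: it uses scikit-learn's bundled wine recognition data (`load_wine`, which is not the wine quality data listed for the course) and blanks out roughly 5% of the entries at random so the imputer has something to do.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: bundled wine data with artificially injected missing values.
X, y = load_wine(return_X_y=True)
X = X.copy()
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Each step is a (name, object) pair; every step except the last must be a
# transformer, and the last is the estimator.
pipeline = Pipeline(
    steps=[
        ("imputer", SimpleImputer(strategy="mean")),  # fill NaNs with column means
        ("scaler", StandardScaler()),                 # center and scale features
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ]
)

# fit() learns the imputation means and scaling statistics from the training
# data only, then trains the classifier on the transformed features.
pipeline.fit(X_train, y_train)
accuracy = pipeline.score(X_test, y_test)
print(f"Pipeline test accuracy: {accuracy:.3f}")
```

Because the whole pipeline behaves like one estimator, it can be passed directly to `cross_val_score` or `GridSearchCV`, which re-fits the preprocessing steps inside each fold and avoids leaking information from the validation data.
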

datasets

Automobile miles per gallon, Boston housing, Diabetes, Gapminder, US Congressional Voting Records (1984), White wine quality, Red wine quality

collaborators

Yashas Roy

prerequisites

Statistical Thinking in Python (Part 1)

Hugo Bowne-Anderson

Data Scientist

Hugo is a data scientist, educator, writer and podcaster formerly at DataCamp. His main interests are promoting data & AI literacy, helping to spread data skills through organizations and society, and doing amateur stand-up comedy in NYC. If you want to know what he likes to talk about, check out DataFramed, the DataCamp podcast, which he hosted and produced.
