Course Description
Machine learning is the field that teaches machines and computers to learn from existing data to make predictions on new data: Will a tumor be benign or malignant? Which of your customers will take their business elsewhere? Is a particular email spam? In this course, you'll learn how to use Python to perform supervised learning, an essential component of machine learning. You'll learn how to build predictive models, tune their parameters, and determine how well they will perform on unseen data, all while working with real-world datasets. You'll be using scikit-learn, one of the most popular and user-friendly machine learning libraries for Python.
1. Classification
In this chapter, you will be introduced to classification problems and learn how to solve them using supervised learning techniques. You'll then apply what you learn to a political dataset, classifying the party affiliation of members of the United States Congress based on their voting records.
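The fit/predict workflow for k-Nearest Neighbors can be sketched as follows. This is a minimal illustration, not the course's exercise code: it assumes scikit-learn is installed and uses the digits dataset bundled with scikit-learn instead of the voting-records data, and the split size, `random_state`, and the choices of `n_neighbors` are arbitrary assumptions.

```python
# Minimal k-NN classification sketch (assumption: bundled digits data, not the voting records)
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# X: 8x8 pixel images flattened to 64 features; y: digit labels 0-9
X, y = load_digits(return_X_y=True)

# Hold out 20% of the rows as unseen test data; stratify keeps class proportions similar
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# k-NN predicts a label by majority vote among the k closest training points
knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)

y_pred = knn.predict(X_test)
accuracy = knn.score(X_test, y_test)  # fraction of test labels predicted correctly
print(f"Test accuracy: {accuracy:.3f}")

# Over/underfitting check: compare train vs. test accuracy for a few values of k
for k in (1, 7, 25):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(k, round(model.score(X_train, y_train), 3), round(model.score(X_test, y_test), 3))
```

A training accuracy far above the test accuracy (typical for very small k) suggests overfitting; both scores dropping as k grows suggests underfitting.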
- Supervised learning (50 xp)
- Which of these is a classification problem? (50 xp)
- Exploratory data analysis (50 xp)
- Numerical EDA (50 xp)
- Visual EDA (50 xp)
- The classification challenge (50 xp)
- k-Nearest Neighbors: Fit (100 xp)
- k-Nearest Neighbors: Predict (100 xp)
- Measuring model performance (50 xp)
- The digits recognition dataset (100 xp)
- Train/Test Split + Fit/Predict/Accuracy (100 xp)
- Overfitting and underfitting (100 xp)
2. Regression
In the previous chapter, you used image and political datasets to predict binary and multiclass outcomes. But what if your problem requires a continuous outcome? Regression is best suited to solving such problems. You will learn about fundamental concepts in regression and apply them to predict the life expectancy in a given country using Gapminder data.
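A regression workflow along these lines might look like the sketch below. It assumes scikit-learn and NumPy are installed and substitutes the bundled diabetes dataset for the Gapminder data; the test-split size, `random_state`, and the `alpha` values for Lasso and Ridge are illustrative assumptions, not tuned choices.

```python
# Regression sketch (assumption: bundled diabetes data instead of Gapminder)
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score, train_test_split

X, y = load_diabetes(return_X_y=True)  # 10 numeric features, continuous target

# Fit ordinary least squares on a training split and evaluate on held-out rows
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
reg = LinearRegression().fit(X_train, y_train)
r2_test = reg.score(X_test, y_test)  # R^2 on unseen data
rmse = np.sqrt(np.mean((reg.predict(X_test) - y_test) ** 2))
print(f"Test R^2: {r2_test:.3f}, RMSE: {rmse:.1f}")

# 5-fold cross-validation; a regressor's default score is R^2
kf = KFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(LinearRegression(), X, y, cv=kf)
print("CV R^2 per fold:", np.round(cv_scores, 3), "mean:", round(cv_scores.mean(), 3))

# Regularized regression: Lasso (L1) can shrink some coefficients to exactly zero,
# Ridge (L2) shrinks them toward zero without eliminating them
lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
print("Lasso coefficients:", np.round(lasso.coef_, 1))
print("Ridge coefficients:", np.round(ridge.coef_, 1))
```

Inspecting which Lasso coefficients are zero is one common way to gauge feature importance.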
- Introduction to regression (50 xp)
- Which of the following is a regression problem? (50 xp)
- Importing data for supervised learning (100 xp)
- Exploring the Gapminder data (50 xp)
- The basics of linear regression (50 xp)
- Fit & predict for regression (100 xp)
- Train/test split for regression (100 xp)
- Cross-validation (50 xp)
- 5-fold cross-validation (100 xp)
- K-Fold CV comparison (100 xp)
- Regularized regression (50 xp)
- Regularization I: Lasso (100 xp)
- Regularization II: Ridge (100 xp)
3. Fine-tuning your model
Having trained your model, your next task is to evaluate its performance. In this chapter, you will learn about some of the other metrics available in scikit-learn that let you assess your model's performance in a more nuanced manner. You will then learn to optimize your classification and regression models using hyperparameter tuning.
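Evaluation metrics, ROC AUC, and grid-search tuning with a hold-out set could be combined as in the sketch below. It assumes scikit-learn is installed and uses the bundled breast cancer dataset (benign vs. malignant tumors) as a stand-in for the course data; the candidate values of `C`, `max_iter`, and the split settings are assumptions chosen for illustration.

```python
# Metrics + hyperparameter tuning sketch (assumption: bundled breast cancer data)
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)  # target: 0 = malignant, 1 = benign

# The test split acts as a hold-out set that tuning never sees
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

logreg = LogisticRegression(max_iter=5000)  # higher max_iter since features are unscaled
logreg.fit(X_train, y_train)
y_pred = logreg.predict(X_test)

cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class
print(cm)
print(classification_report(y_test, y_pred))  # precision, recall, F1 per class

# ROC AUC uses predicted probabilities of the positive class (here: benign)
y_prob = logreg.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, y_prob)
print(f"Test AUC: {auc:.3f}")

# Grid search over the inverse regularization strength C with 5-fold CV
param_grid = {"C": [0.01, 0.1, 1, 10, 100]}
grid = GridSearchCV(LogisticRegression(max_iter=5000), param_grid, cv=5, scoring="roc_auc")
grid.fit(X_train, y_train)
print("Best params:", grid.best_params_, "CV AUC:", round(grid.best_score_, 3))

# Final evaluation of the tuned model on the hold-out set
held_out_auc = grid.score(X_test, y_test)
print(f"Hold-out AUC: {held_out_auc:.3f}")
```

`RandomizedSearchCV` follows the same pattern but samples a fixed number of settings from the grid or from distributions instead of trying every combination.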
- How good is your model? (50 xp)
- Metrics for classification (100 xp)
- Logistic regression and the ROC curve (50 xp)
- Building a logistic regression model (100 xp)
- Plotting an ROC curve (100 xp)
- Precision-recall Curve (50 xp)
- Area under the ROC curve (50 xp)
- AUC computation (100 xp)
- Hyperparameter tuning (50 xp)
- Hyperparameter tuning with GridSearchCV (100 xp)
- Hyperparameter tuning with RandomizedSearchCV (100 xp)
- Hold-out set for final evaluation (50 xp)
- Hold-out set reasoning (50 xp)
- Hold-out set in practice I: Classification (100 xp)
- Hold-out set in practice II: Regression (100 xp)
4. Preprocessing and pipelines
This chapter introduces pipelines and shows how scikit-learn allows transformers and estimators to be chained together and used as a single unit. Preprocessing techniques are introduced as a way to improve model performance, and pipelines tie together concepts from previous chapters.
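Chaining imputation, scaling, and a classifier into one unit might look like the sketch below. It assumes scikit-learn and NumPy are installed; the missing values are injected artificially (about 5% of entries) purely so the imputer has something to fill, and the step names `"imputer"`, `"scaler"`, and `"logreg"` are arbitrary labels chosen for this example.

```python
# Preprocessing pipeline sketch (assumption: bundled breast cancer data with artificial NaNs)
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Illustration only: blank out roughly 5% of entries to simulate missing data
rng = np.random.default_rng(0)
X = X.copy()
X[rng.random(X.shape) < 0.05] = np.nan

# Each step's output feeds the next; the final step is the estimator
pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),  # replace NaNs with column means
    ("scaler", StandardScaler()),                 # center to mean 0, scale to unit variance
    ("logreg", LogisticRegression()),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fitting the pipeline fits the imputer and scaler on training data only,
# which avoids leaking test-set statistics into preprocessing
pipeline.fit(X_train, y_train)
accuracy = pipeline.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Because the whole chain behaves like a single estimator, it can be passed directly to `cross_val_score` or `GridSearchCV`, where step parameters are addressed as `step__param` (for example, `logreg__C`).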
- Preprocessing data (50 xp)
- Exploring categorical features (100 xp)
- Creating dummy variables (100 xp)
- Regression with categorical features (100 xp)
- Handling missing data (50 xp)
- Dropping missing data (100 xp)
- Imputing missing data in a ML Pipeline I (100 xp)
- Imputing missing data in a ML Pipeline II (100 xp)
- Centering and scaling (50 xp)
- Centering and scaling your data (100 xp)
- Centering and scaling in a pipeline (100 xp)
- Bringing it all together I: Pipeline for classification (100 xp)
- Bringing it all together II: Pipeline for regression (100 xp)
- Final thoughts (50 xp)
Datasets
- Automobile miles per gallon
- Boston housing
- Diabetes
- Gapminder
- US Congressional Voting Records (1984)
- White wine quality
- Red wine quality
Prerequisites
Statistical Thinking in Python (Part 1)
Hugo Bowne-Anderson
Data Scientist
Hugo is a data scientist, educator, writer and podcaster formerly at DataCamp. His main interests are promoting data and AI literacy, helping to spread data skills through organizations and society, and doing amateur stand-up comedy in NYC. If you want to know what he likes to talk about, check out DataFramed, the DataCamp podcast, which he hosted and produced.
Join over 14 million learners and start Machine Learning with scikit-learn today!