Grow your machine learning skills with scikit-learn and discover how to use this popular Python library to train models using labeled data. In this course, you'll learn how to make powerful predictions, such as whether a customer will churn from your business, whether an individual has diabetes, and even how to classify the genre of a song. Using real-world datasets, you'll find out how to build predictive models, tune their parameters, and determine how well they will perform with unseen data.
In this chapter, you'll be introduced to classification problems and learn how to solve them using supervised learning techniques. You'll learn how to split data into training and test sets, fit a model, make predictions, and evaluate accuracy. You’ll discover the relationship between model complexity and performance, applying what you learn to a churn dataset, where you will classify the churn status of a telecom company's customers.

Machine learning with scikit-learn (50 xp)
Binary classification (50 xp)
The supervised learning workflow (100 xp)
The classification challenge (50 xp)
k-Nearest Neighbors: Fit (100 xp)
k-Nearest Neighbors: Predict (100 xp)
Measuring model performance (50 xp)
Train/test split + computing accuracy (100 xp)
Overfitting and underfitting (100 xp)
Visualizing model complexity (100 xp)
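The classification workflow described in this chapter (split, fit, predict, evaluate, then vary model complexity) can be sketched roughly as follows. This is a minimal illustration, not the course's own exercise code: it assumes a synthetic dataset from `make_classification` in place of the churn data, and the choices of `n_neighbors`, `test_size`, and `random_state` are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: synthetic binary-classification data stands in for the churn dataset.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=42
)

# Hold out 30% of the rows; stratify keeps class proportions similar in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fit a k-Nearest Neighbors classifier, predict, and compute test accuracy.
knn = KNeighborsClassifier(n_neighbors=6)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
test_accuracy = knn.score(X_test, y_test)
print(f"Test accuracy with k=6: {test_accuracy:.3f}")

# Model complexity: a small k gives a more complex model, a large k a simpler one.
# Comparing train and test accuracy over k helps spot over- and underfitting.
train_accuracies = {}
test_accuracies = {}
for k in range(1, 13):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    train_accuracies[k] = model.score(X_train, y_train)
    test_accuracies[k] = model.score(X_test, y_test)
    print(f"k={k:2d}  train={train_accuracies[k]:.3f}  test={test_accuracies[k]:.3f}")
```

A large gap between training and test accuracy at small `k` suggests overfitting, while low accuracy on both at large `k` suggests underfitting.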
In this chapter, you will be introduced to regression, and build models to predict sales values using a dataset on advertising expenditure. You will learn about the mechanics of linear regression and common performance metrics such as R-squared and root mean squared error. You will perform k-fold cross-validation, and apply regularization to regression models to reduce the risk of overfitting.

Introduction to regression (50 xp)
Creating features (100 xp)
Building a linear regression model (100 xp)
Visualizing a linear regression model (100 xp)
The basics of linear regression (50 xp)
Fit and predict for regression (100 xp)
Regression performance (100 xp)
Cross-validation (50 xp)
Cross-validation for R-squared (100 xp)
Analyzing cross-validation metrics (100 xp)
Regularized regression (50 xp)
Regularized regression: Ridge (100 xp)
Lasso regression for feature importance (100 xp)
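As a rough sketch of the regression topics in this chapter, the example below fits a linear model, reports R-squared and RMSE, runs k-fold cross-validation, and fits Ridge and Lasso models. It assumes synthetic data from `make_regression` rather than the advertising dataset, and the fold count and `alpha` values are illustrative choices only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Assumption: synthetic data stands in for the advertising expenditure dataset.
X, y = make_regression(
    n_samples=300, n_features=5, n_informative=3, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Ordinary least squares: fit, predict, and score on held-out data.
reg = LinearRegression().fit(X_train, y_train)
y_pred = reg.predict(X_test)
r_squared = reg.score(X_test, y_test)               # R-squared on the test set
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # root mean squared error
print(f"R^2: {r_squared:.3f}  RMSE: {rmse:.3f}")

# k-fold cross-validation: one R-squared value per fold (the default regressor score).
kf = KFold(n_splits=6, shuffle=True, random_state=0)
cv_scores = cross_val_score(LinearRegression(), X, y, cv=kf)
print(f"CV R^2 mean: {cv_scores.mean():.3f}  std: {cv_scores.std():.3f}")

# Regularization: Ridge penalizes squared coefficients, Lasso penalizes absolute
# values and can shrink some coefficients exactly to zero.
ridge = Ridge(alpha=1.0).fit(X_train, y_train)
lasso = Lasso(alpha=1.0).fit(X_train, y_train)
print("Ridge coefficients:", np.round(ridge.coef_, 2))
print("Lasso coefficients:", np.round(lasso.coef_, 2))
```

Inspecting which Lasso coefficients remain non-zero is one way to gauge feature importance, which is the idea behind the last lesson in this chapter.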
Fine-tuning your model
Now that you have trained models, you will learn how to evaluate them. In this chapter, you will be introduced to several metrics along with a visualization technique for analyzing classification model performance using scikit-learn. You will also learn how to optimize classification and regression models through the use of hyperparameter tuning.

How good is your model? (50 xp)
Deciding on a primary metric (50 xp)
Assessing a diabetes prediction classifier (100 xp)
Logistic regression and the ROC curve (50 xp)
Building a logistic regression model (100 xp)
The ROC curve (100 xp)
ROC AUC (100 xp)
Hyperparameter tuning (50 xp)
Hyperparameter tuning with GridSearchCV (100 xp)
Hyperparameter tuning with RandomizedSearchCV (100 xp)
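The evaluation and tuning steps named in this chapter might look something like the sketch below. It assumes a synthetic, mildly imbalanced dataset in place of the diabetes data, and the candidate values for the `C` hyperparameter are arbitrary examples rather than recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    classification_report,
    confusion_matrix,
    roc_auc_score,
    roc_curve,
)
from sklearn.model_selection import (
    GridSearchCV,
    KFold,
    RandomizedSearchCV,
    train_test_split,
)

# Assumption: synthetic data with a roughly 70/30 class split stands in for the diabetes data.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, weights=[0.7, 0.3], random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# Fit logistic regression; keep both hard predictions and positive-class probabilities.
logreg = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = logreg.predict(X_test)
y_prob = logreg.predict_proba(X_test)[:, 1]

# Confusion matrix and per-class precision, recall, and F1.
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred))

# ROC curve points (one per threshold) and the area under the curve.
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
auc = roc_auc_score(y_test, y_prob)
print(f"ROC AUC: {auc:.3f}")

# Grid search tries every candidate value of C with cross-validation.
kf = KFold(n_splits=5, shuffle=True, random_state=1)
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}
grid = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=kf)
grid.fit(X_train, y_train)
print("Grid search best:", grid.best_params_, round(grid.best_score_, 3))

# Randomized search samples only n_iter candidates, which is cheaper on large grids.
param_dist = {"C": [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]}
rand = RandomizedSearchCV(
    LogisticRegression(max_iter=1000), param_dist, n_iter=4, cv=kf, random_state=1
)
rand.fit(X_train, y_train)
print("Randomized search best:", rand.best_params_, round(rand.best_score_, 3))
```

Note that both searches are fitted on the training split only, so the test set remains available for a final, unbiased check of the tuned model.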
Preprocessing and pipelines
Learn how to impute missing values, convert categorical data to numeric values, scale data, evaluate multiple supervised learning models simultaneously, and build pipelines to streamline your workflow!

Preprocessing data (50 xp)
Creating dummy variables (100 xp)
Regression with categorical features (100 xp)
Handling missing data (50 xp)
Dropping missing data (100 xp)
Pipeline for song genre prediction: I (100 xp)
Pipeline for song genre prediction: II (100 xp)
Centering and scaling (50 xp)
Centering and scaling for regression (100 xp)
Centering and scaling for classification (100 xp)
Evaluating multiple models (50 xp)
Visualizing regression model performance (100 xp)
Predicting on the test set (100 xp)
Visualizing classification model performance (100 xp)
Pipeline for predicting song popularity (100 xp)
Congratulations (50 xp)
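To make the preprocessing steps above concrete, here is a small sketch that creates dummy variables, then chains imputation, scaling, and a classifier in a single `Pipeline`. The data frame is generated on the spot, and its column names (`tempo`, `loudness`, `genre`, `popular`) are hypothetical stand-ins, not the course's music dataset.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: a made-up frame with hypothetical columns, generated for illustration.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame(
    {
        "tempo": rng.normal(120, 20, n),
        "loudness": rng.normal(-8, 3, n),
        "genre": rng.choice(["rock", "jazz", "pop"], n),
    }
)
# Binary target loosely tied to the numeric features, plus noise.
df["popular"] = (
    df["tempo"] + 5 * df["loudness"] + rng.normal(0, 10, n) > 80
).astype(int)
# Knock out 20 loudness values to simulate missing data.
df.loc[rng.choice(n, 20, replace=False), "loudness"] = np.nan

# Convert the categorical column to dummy variables; drop_first avoids a redundant column.
X = pd.get_dummies(
    df.drop(columns="popular"), columns=["genre"], drop_first=True, dtype=float
)
y = df["popular"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Each step is fitted on the training data only, then applied to the test data,
# which avoids leaking test-set statistics into the imputer or scaler.
pipeline = Pipeline(
    steps=[
        ("imputer", SimpleImputer(strategy="mean")),
        ("scaler", StandardScaler()),
        ("logreg", LogisticRegression()),
    ]
)
pipeline.fit(X_train, y_train)
test_preds = pipeline.predict(X_test)
accuracy = pipeline.score(X_test, y_test)
print(f"Pipeline test accuracy: {accuracy:.3f}")
```

Swapping the final step for another estimator, or looping over several estimators with the same preprocessing steps, is one way to compare multiple models on equal footing.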
Prerequisites: Introduction to Statistics in Python
Core Curriculum Manager, DataCamp
George is a Core Curriculum Manager at DataCamp. He has experience in project management across public health, applied research, and not-for-profit sectors. George is passionate about health technologies and all things data science.