# Machine Learning with caret in R

This course teaches the big ideas in machine learning like how to build and evaluate predictive models.

Start Course for Free4 Hours24 Videos88 Exercises

## Create Your Free Account

or

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.Training 2 or more people?Try DataCamp For Business

## Loved by learners at thousands of companies

## Course Description

Machine learning is the study and application of algorithms that learn from and make predictions on data. From search results to self-driving cars, it has manifested itself in all areas of our lives and is one of the most exciting and fast growing fields of research in the world of data science. This course teaches the big ideas in machine learning: how to build and evaluate predictive models, how to tune them for optimal performance, how to preprocess data for better results, and much more. The popular caret R package, which provides a consistent interface to all of R's most powerful machine learning facilities, is used throughout the course.

For Business

### Training 2 or more people?

Get your team access to the full DataCamp library, with centralized reporting, assignments, projects and more- 1
### Regression Models: Fitting and Evaluating Their Performance

**Free**In the first chapter of this course, you'll fit regression models with

`train()`

and evaluate their out-of-sample performance using cross-validation and root-mean-square error (RMSE).Welcome to the course50 xpIn-sample RMSE for linear regression50 xpIn-sample RMSE for linear regression on diamonds100 xpOut-of-sample error measures50 xpOut-of-sample RMSE for linear regression50 xpRandomly order the data frame100 xpTry an 80/20 split100 xpPredict on test set100 xpCalculate test set RMSE by hand100 xpComparing out-of-sample RMSE to in-sample RMSE50 xpCross-validation50 xpAdvantage of cross-validation50 xp10-fold cross-validation100 xp5-fold cross-validation100 xp5 x 5-fold cross-validation100 xpMaking predictions on new data100 xp - 2
### Classification Models: Fitting and Evaluating Their Performance

In this chapter, you'll fit classification models with

`train()`

and evaluate their out-of-sample performance using cross-validation and area under the curve (AUC).Logistic regression on sonar50 xpWhy a train/test split?50 xpTry a 60/40 split100 xpFit a logistic regression model100 xpConfusion matrix50 xpConfusion matrix takeaways50 xpCalculate a confusion matrix100 xpCalculating accuracy50 xpCalculating true positive rate50 xpCalculating true negative rate50 xpClass probabilities and predictions50 xpProbabilities and classes50 xpTry another threshold100 xpFrom probabilites to confusion matrix100 xpIntroducing the ROC curve50 xpWhat's the value of a ROC curve?50 xpPlot an ROC curve100 xpArea under the curve (AUC)50 xpModel, ROC, and AUC50 xpCustomizing trainControl100 xpUsing custom trainControl100 xp - 3
### Tuning Model Parameters to Improve Performance

In this chapter, you will use the

`train()`

function to tweak model parameters through cross-validation and grid search.Random forests and wine50 xpRandom forests vs. linear models50 xpFit a random forest100 xpExplore a wider model space50 xpAdvantage of a longer tune length50 xpTry a longer tune length100 xpCustom tuning grids50 xpAdvantages of a custom tuning grid50 xpFit a random forest with custom tuning100 xpIntroducing glmnet50 xpAdvantage of glmnet50 xpMake a custom trainControl100 xpFit glmnet with custom trainControl100 xpglmnet with custom tuning grid50 xpWhy a custom tuning grid?50 xpglmnet with custom trainControl and tuning100 xpInterpreting glmnet plots50 xp - 4
### Preprocessing Data

In this chapter, you will practice using

`train()`

to preprocess data before fitting models, improving your ability to making accurate predictions.Median imputation50 xpMedian imputation vs. omitting rows50 xpApply median imputation100 xpKNN imputation50 xpComparing KNN imputation to median imputation50 xpUse KNN imputation100 xpCompare KNN and median imputation50 xpMultiple preprocessing methods50 xpOrder of operations50 xpCombining preprocessing methods100 xpHandling low-information predictors50 xpWhy remove near zero variance predictors?50 xpRemove near zero variance predictors100 xppreProcess() and nearZeroVar()50 xpFit model on reduced blood-brain data100 xpPrinciple components analysis (PCA)50 xpUsing PCA as an alternative to nearZeroVar()100 xp - 5
### Selecting Models: A Case Study in Churn Prediction

In the final chapter of this course, you'll learn how to use

`resamples()`

to compare multiple models and select (or ensemble) the best one(s).Reusing a trainControl50 xpWhy reuse a trainControl?50 xpMake custom train/test indices100 xpReintroducing glmnet50 xpglmnet as a baseline model50 xpFit the baseline model100 xpReintroducing random forest50 xpRandom forest drawback50 xpRandom forest with custom trainControl100 xpComparing models50 xpMatching train/test indices50 xpCreate a resamples object100 xpMore on resamples50 xpCreate a box-and-whisker plot100 xpCreate a scatterplot100 xpEnsembling models100 xpSummary50 xp

Collaborators

Prerequisites

Introduction to Regression in RZachary Deane-Mayer

See MoreVP, Data Science at DataRobot

Max Kuhn

See MoreSoftware Engineer at RStudio and creator of caret

Dr. Max Kuhn is a Software Engineer at RStudio. He is the author or maintainer of several R packages for predictive modeling including caret, AppliedPredictiveModeling, Cubist, C50 and SparseLDA. He routinely teaches classes in predictive modeling at Predictive Analytics World and UseR! and his publications include work on neuroscience biomarkers, drug discovery, molecular diagnostics and response surface methodology.

## Join over 13 million learners and start Machine Learning with caret in R today!

## Create Your Free Account

or

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.