Course

Machine Learning with caret in R

Skill Level: Intermediate

4.8+

Updated 11/2023

This course teaches the big ideas in machine learning, such as how to build and evaluate predictive models.

R, Machine Learning

4 hr

24 videos

88 Exercises

6,200 XP

60,676

Statement of Accomplishment

Course Description

Machine learning is the study and application of algorithms that learn from and make predictions on data. From search results to self-driving cars, it now touches many areas of our lives and is one of the most exciting and fastest-growing fields of research in data science. This course teaches the big ideas in machine learning: how to build and evaluate predictive models, how to tune them for optimal performance, how to preprocess data for better results, and much more. The popular caret R package, which provides a consistent interface to many of R's most powerful machine learning facilities, is used throughout the course.

Prerequisites

Introduction to Regression in R

1

Regression Models: Fitting and Evaluating Their Performance

In the first chapter of this course, you'll fit regression models with train() and evaluate their out-of-sample performance using cross-validation and root-mean-square error (RMSE).

Welcome to the course

In-sample RMSE for linear regression

In-sample RMSE for linear regression on diamonds

Out-of-sample error measures

Out-of-sample RMSE for linear regression

Randomly order the data frame

Try an 80/20 split

Predict on test set

Calculate test set RMSE by hand

Comparing out-of-sample RMSE to in-sample RMSE

Cross-validation

Advantage of cross-validation

10-fold cross-validation

5-fold cross-validation

5 x 5-fold cross-validation

Making predictions on new data
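
As a rough illustration of the workflow this chapter describes, the sketch below fits a linear regression with train() under 5-fold cross-validation and compares the cross-validated RMSE with an in-sample RMSE computed by hand. It assumes the caret package is installed and uses the built-in mtcars data as a stand-in (the course's own datasets are not reproduced here); the variable names are illustrative only.

```r
library(caret)

set.seed(42)  # make the fold assignment reproducible

# Fit a linear model, evaluated with 5-fold cross-validation
model <- train(
  mpg ~ hp + wt,
  data = mtcars,
  method = "lm",
  trControl = trainControl(method = "cv", number = 5)
)

# Out-of-sample estimate: RMSE averaged over the held-out folds
cv_rmse <- model$results$RMSE

# In-sample RMSE, calculated by hand on the training data itself
pred <- predict(model, newdata = mtcars)
in_sample_rmse <- sqrt(mean((mtcars$mpg - pred)^2))

print(c(cv = cv_rmse, in_sample = in_sample_rmse))
```

The in-sample figure is usually the more optimistic of the two, which is the reason for preferring an out-of-sample estimate.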

2

Classification Models: Fitting and Evaluating Their Performance

3

Tuning Model Parameters to Improve Performance

In this chapter, you will use the train() function to tweak model parameters through cross-validation and grid search.

Random forests and wine

Random forests vs. linear models

Fit a random forest

Explore a wider model space

Advantage of a longer tune length

Try a longer tune length

Custom tuning grids

Advantages of a custom tuning grid

Fit a random forest with custom tuning

Introducing glmnet

Advantage of glmnet

Make a custom trainControl

Fit glmnet with custom trainControl

glmnet with custom tuning grid

Why a custom tuning grid?

glmnet with custom trainControl and tuning

Interpreting glmnet plots
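
A custom tuning grid of the kind covered in this chapter might look like the sketch below. It assumes both caret and glmnet are installed, again uses mtcars as a stand-in dataset, and picks grid values purely for illustration rather than as recommended settings.

```r
library(caret)  # method = "glmnet" additionally requires the glmnet package

set.seed(42)

# Candidate hyperparameters: ridge (alpha = 0) vs. lasso (alpha = 1),
# each tried over a range of penalty strengths (illustrative values)
grid <- expand.grid(
  alpha = c(0, 1),
  lambda = seq(0.001, 1, length.out = 10)
)

model <- train(
  mpg ~ .,
  data = mtcars,
  method = "glmnet",
  tuneGrid = grid,
  trControl = trainControl(method = "cv", number = 5)
)

# The combination with the best cross-validated RMSE
print(model$bestTune)
```

Calling plot(model) on the fitted object shows how the cross-validated error changes across the grid.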

4

Preprocessing Data

In this chapter, you will practice using train() to preprocess data before fitting models, improving your ability to make accurate predictions.

Median imputation

Median imputation vs. omitting rows

Apply median imputation

KNN imputation

Comparing KNN imputation to median imputation

Use KNN imputation

Compare KNN and median imputation

Multiple preprocessing methods

Order of operations

Combining preprocessing methods

Handling low-information predictors

Why remove near zero variance predictors?

Remove near zero variance predictors

preProcess() and nearZeroVar()

Fit model on reduced blood-brain data

Principal component analysis (PCA)

Using PCA as an alternative to nearZeroVar()
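
The sketch below shows one way preprocessing can be chained inside train(): median imputation followed by centering and scaling. It assumes caret is installed; the missing values are injected artificially into mtcars for illustration, and the x/y interface is used because the formula interface would reject rows with missing values by default.

```r
library(caret)

set.seed(42)

# Predictors and response passed separately (x/y interface)
x <- mtcars[, c("hp", "wt", "disp")]
y <- mtcars$mpg

# Artificially blank out five values so there is something to impute
x[sample(nrow(x), 5), "hp"] <- NA

model <- train(
  x = x,
  y = y,
  method = "lm",
  preProcess = c("medianImpute", "center", "scale"),
  trControl = trainControl(method = "cv", number = 5)
)

print(model$results$RMSE)
```

Swapping "medianImpute" for "knnImpute" switches to KNN imputation, and adding "pca" to the vector applies principal component analysis after scaling.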

5

Selecting Models: A Case Study in Churn Prediction

In the final chapter of this course, you'll learn how to use resamples() to compare multiple models and select (or ensemble) the best one(s).

Reusing a trainControl

Why reuse a trainControl?

Make custom train/test indices

Reintroducing glmnet

glmnet as a baseline model

Fit the baseline model

Reintroducing random forest

Random forest drawback

Random forest with custom trainControl

Comparing models

Matching train/test indices

Create a resamples object

More on resamples

Create a box-and-whisker plot

Create a scatterplot

Ensembling models
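
To make the model comparison idea concrete, the sketch below trains two models on identical cross-validation folds and collects them with resamples(). It assumes caret is installed and uses two small linear models on mtcars as stand-ins for the glmnet and random forest models named above; the model names are hypothetical.

```r
library(caret)

set.seed(42)

# Create the training indices once so both models see the same folds
folds <- createFolds(mtcars$mpg, k = 5, returnTrain = TRUE)
ctrl <- trainControl(method = "cv", index = folds)

small_model <- train(mpg ~ wt, data = mtcars, method = "lm", trControl = ctrl)
large_model <- train(mpg ~ wt + hp, data = mtcars, method = "lm", trControl = ctrl)

# Collect the per-fold results for a like-for-like comparison
res <- resamples(list(small = small_model, large = large_model))
print(summary(res))
```

Because the folds match, bwplot(res, metric = "RMSE") or xyplot(res, metric = "RMSE") can then compare the models fold by fold.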

Earn Statement of Accomplishment

Add this credential to your LinkedIn profile, resume, or CV
Share it on social media and in your performance review

Rated 4.8 from 42 reviews (rating breakdown: 88%, 12%, 0%, 0%, 0%)

FAQs

What is the caret package and why is it useful for machine learning in R?

The caret package provides a single consistent interface to hundreds of machine learning algorithms in R, simplifying model training, tuning, and evaluation workflows.

What types of models will I build in this course?

You will build regression models and classification models, including logistic regression and random forests, and learn to tune hyperparameters for optimal performance.

Does the course cover data preprocessing techniques?

Yes. You will learn how to preprocess data for better model results, including handling missing values and transforming features, all within the caret framework.

How does the course evaluate model performance?

You will use cross-validation, RMSE for regression, AUC and ROC curves for classification, and learn to compare multiple models to select the best performer.

How large is this course compared to other DataCamp courses?

It is one of the larger courses with 88 exercises across five chapters and 6,200 XP, typically taking three to four hours to complete.
