- 24 Videos
- 88 Exercises
- 4 hours
- 6,781 Participants
- 6,250 XP

**Instructor(s):**

Zach is a Data Scientist at DataRobot and co-author of the caret R package. He's fascinated by predicting the future and spends his free time competing in predictive modeling competitions. He's currently one of the top 500 data scientists on Kaggle and took 9th place in the Heritage Health Prize as part of the Analytics Inside team.

Dr. Max Kuhn is a Director of Non-Clinical Statistics at Pfizer Global R&D in Groton, Connecticut. He is the author or maintainer of several R packages for predictive modeling, including caret, AppliedPredictiveModeling, Cubist, C50 and SparseLDA. He routinely teaches classes in predictive modeling at Predictive Analytics World and useR!, and his publications include work on neuroscience biomarkers, drug discovery, molecular diagnostics and response surface methodology.

Nick Carchedi

Tom Jeon

Machine learning is the study and application of algorithms that learn from data and make predictions on it. From search results to self-driving cars, it now touches many areas of everyday life and is one of the most exciting and fastest-growing fields of research in data science. This course teaches the big ideas in machine learning: how to build and evaluate predictive models, how to tune them for optimal performance, how to preprocess data for better results, and much more. The popular `caret` R package, which provides a consistent interface to many of R's most powerful machine learning facilities, is used throughout the course.

In the first chapter of this course, you'll fit regression models with `train()` and evaluate their out-of-sample performance using cross-validation and root-mean-square error (RMSE).

- Welcome to the course 50 xp
- In-sample RMSE for linear regression 50 xp
- In-sample RMSE for linear regression on diamonds 100 xp
- Introducing out-of-sample error measures 50 xp
- Out-of-sample RMSE for linear regression 50 xp
- Randomly order the data frame 100 xp
- Try an 80/20 split 100 xp
- Predict on test set 100 xp
- Calculate test set RMSE by hand 100 xp
- Comparing out-of-sample RMSE to in-sample RMSE 50 xp
- Cross-validation 50 xp
- Advantage of cross-validation 50 xp
- 10-fold cross-validation 100 xp
- 5-fold cross-validation 100 xp
- 5 x 5-fold cross-validation 100 xp
- Making predictions on new data 100 xp
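
The ideas behind these exercises (shuffling rows, an 80/20 split, RMSE computed by hand, and k-fold cross-validation) can be sketched in base R. This is only an illustrative sketch, not the course's own code: it assumes the built-in `mtcars` data set and an arbitrary formula (`mpg ~ hp + wt`), and all variable names are hypothetical.

```r
# Assumption: built-in mtcars data and an arbitrary formula, chosen for illustration.
set.seed(42)

# Randomly order the rows so the split does not depend on row order.
shuffled <- mtcars[sample(nrow(mtcars)), ]

# 80/20 split: the first 80% of rows train the model, the rest test it.
split <- round(nrow(shuffled) * 0.80)
train_set <- shuffled[1:split, ]
test_set <- shuffled[(split + 1):nrow(shuffled), ]

# Fit a linear regression on the training rows only.
model <- lm(mpg ~ hp + wt, data = train_set)

# RMSE by hand: square root of the mean squared prediction error.
rmse <- function(predicted, actual) sqrt(mean((predicted - actual)^2))
rmse_in <- rmse(predict(model, train_set), train_set$mpg)   # in-sample
rmse_out <- rmse(predict(model, test_set), test_set$mpg)    # out-of-sample

# Manual 5-fold cross-validation: each row is held out exactly once.
k <- 5
fold_id <- sample(rep(seq_len(k), length.out = nrow(mtcars)))
fold_rmse <- sapply(seq_len(k), function(i) {
  fit <- lm(mpg ~ hp + wt, data = mtcars[fold_id != i, ])
  held_out <- mtcars[fold_id == i, ]
  rmse(predict(fit, held_out), held_out$mpg)
})
cv_rmse <- mean(fold_rmse)
```

A single 80/20 split yields one error estimate that depends on which rows land in the test set; averaging over several folds usually gives a more stable estimate.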

In this chapter, you'll fit classification models with `train()` and evaluate their out-of-sample performance using cross-validation and area under the ROC curve (AUC).
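
AUC can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The base R sketch below computes it with the rank-based (Mann-Whitney) formula. The function name `auc_by_hand` and the toy scores are hypothetical and only meant to illustrate the metric, not how the course computes it.

```r
# Rank-based AUC: hypothetical helper, written for illustration only.
# scores: predicted probabilities or scores; is_positive: logical class labels.
auc_by_hand <- function(scores, is_positive) {
  ranks <- rank(scores)            # ties receive average ranks
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  (sum(ranks[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy example: two negatives and two positives.
scores <- c(0.10, 0.40, 0.35, 0.80)
is_positive <- c(FALSE, FALSE, TRUE, TRUE)
auc <- auc_by_hand(scores, is_positive)   # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two classes.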

In this chapter, you will use the `train()` function to tune model hyperparameters through cross-validation and grid search.
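
A minimal sketch of grid search with `train()`, assuming the `caret` package is installed. The data set (`mtcars`), the k-nearest-neighbors model, and the candidate values of `k` are assumptions chosen for illustration, not the course's own example.

```r
library(caret)

set.seed(42)

# 5-fold cross-validation is used to score every candidate in the grid.
ctrl <- trainControl(method = "cv", number = 5)

# Candidate values for the single tuning parameter of method = "knn".
grid <- expand.grid(k = c(3, 5, 7))

knn_fit <- train(
  mpg ~ hp + wt,
  data = mtcars,          # assumption: built-in data, for illustration only
  method = "knn",
  tuneGrid = grid,
  trControl = ctrl
)

# Cross-validated RMSE for each k, and the value train() selected.
tuning_results <- knn_fit$results[, c("k", "RMSE")]
best_k <- knn_fit$bestTune$k
```

`train()` refits the model for every row of the grid on each fold, then keeps the setting with the best resampled metric (RMSE by default for regression).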

In this chapter, you will practice using `train()` to preprocess data before fitting models, improving your ability to make accurate predictions.
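
As a rough sketch of preprocessing inside `train()`, the example below imputes missing values with the median and then centers and scales the predictors before fitting. It assumes `caret` is installed; the `mtcars` data, the artificially removed values, and the choice of preprocessing steps are illustrative assumptions.

```r
library(caret)

set.seed(42)

# Assumption: built-in mtcars data with a few values removed to mimic missing data.
predictors <- mtcars[, c("hp", "wt", "disp")]
outcome <- mtcars$mpg
predictors$hp[sample(nrow(predictors), 4)] <- NA

# Preprocessing steps are passed via preProcess and applied before the model fit.
fit <- train(
  x = predictors,
  y = outcome,
  method = "lm",
  preProcess = c("medianImpute", "center", "scale"),
  trControl = trainControl(method = "cv", number = 5)
)

cv_rmse <- fit$results$RMSE
```

Because the preprocessing is specified inside `train()`, it is re-estimated within each resample, so the cross-validated error also reflects the imputation and scaling steps.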

In the final chapter of this course, you'll learn how to use `resamples()` to compare multiple models and select (or ensemble) the best one(s).
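
A minimal sketch of comparing two models with `resamples()`, assuming `caret` is installed. The data set, the two model types, and the object names are hypothetical choices for illustration. Both models share the same folds so that their resampled errors are directly comparable.

```r
library(caret)

set.seed(42)

# Build the folds once and reuse them, so both models see identical splits.
folds <- createFolds(mtcars$mpg, k = 5, returnTrain = TRUE)
ctrl <- trainControl(method = "cv", index = folds)

# Assumption: two arbitrary candidate models on the built-in mtcars data.
lm_fit <- train(mpg ~ hp + wt, data = mtcars, method = "lm", trControl = ctrl)
knn_fit <- train(mpg ~ hp + wt, data = mtcars, method = "knn",
                 tuneGrid = expand.grid(k = 5), trControl = ctrl)

# Collect the per-fold metrics of both models side by side.
comparison <- resamples(list(lm = lm_fit, knn = knn_fit))
summary(comparison)
```

`summary(comparison)` reports the distribution of each metric across folds for every model, which is usually more informative than comparing single averages.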