
Course Description
Machine learning is the study and application of algorithms that learn from data and make predictions on it. From search results to self-driving cars, it now touches most areas of our lives and is one of the most exciting and fastest-growing fields in data science. This course teaches the big ideas in machine learning: how to build and evaluate predictive models, how to tune them for optimal performance, how to preprocess data for better results, and much more. The popular caret R package, which provides a consistent interface to R's most powerful machine learning facilities, is used throughout the course.

1. Regression Models: Fitting and Evaluating Their Performance (Free)
In the first chapter of this course, you'll fit regression models with train() and evaluate their out-of-sample performance using cross-validation and root-mean-square error (RMSE).
- Welcome to the course (50 xp)
- In-sample RMSE for linear regression (50 xp)
- In-sample RMSE for linear regression on diamonds (100 xp)
- Out-of-sample error measures (50 xp)
- Out-of-sample RMSE for linear regression (50 xp)
- Randomly order the data frame (100 xp)
- Try an 80/20 split (100 xp)
- Predict on test set (100 xp)
- Calculate test set RMSE by hand (100 xp)
- Comparing out-of-sample RMSE to in-sample RMSE (50 xp)
- Cross-validation (50 xp)
- Advantage of cross-validation (50 xp)
- 10-fold cross-validation (100 xp)
- 5-fold cross-validation (100 xp)
- 5 x 5-fold cross-validation (100 xp)
- Making predictions on new data (100 xp)
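
The workflow this chapter walks through (shuffle, split 80/20, predict on the test set, compute RMSE by hand, then cross-validate with train()) might look roughly like the sketch below. It is not taken from the course materials: the built-in mtcars data set and the formula mpg ~ wt + hp are stand-ins chosen only so the example runs without extra data, and the caret package is assumed to be installed.

```r
library(caret)  # assumed installed; provides train() and trainControl()

set.seed(42)

# Randomly order the data frame, then take an 80/20 split
# (mtcars is a stand-in data set, not the one used in the course)
rows <- sample(nrow(mtcars))
shuffled <- mtcars[rows, ]
split <- round(nrow(shuffled) * 0.80)
train_set <- shuffled[1:split, ]
test_set <- shuffled[(split + 1):nrow(shuffled), ]

# Fit on the training rows only, then predict on the held-out rows
model <- lm(mpg ~ wt + hp, data = train_set)
p <- predict(model, test_set)

# RMSE by hand: square root of the mean squared error
rmse_out <- sqrt(mean((p - test_set$mpg)^2))   # out-of-sample
rmse_in <- sqrt(mean(residuals(model)^2))      # in-sample

# 10-fold cross-validation with caret instead of a single split
model_cv <- train(
  mpg ~ wt + hp,
  data = mtcars,
  method = "lm",
  trControl = trainControl(method = "cv", number = 10)
)
print(model_cv$results$RMSE)
```

A single split gives one noisy estimate of out-of-sample error; cross-validation averages over several splits, which is why the chapter moves from the hand-rolled split to trainControl().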

2. Classification Models: Fitting and Evaluating Their Performance
In this chapter, you'll fit classification models with train() and evaluate their out-of-sample performance using cross-validation and area under the curve (AUC).
- Logistic regression on sonar (50 xp)
- Why a train/test split? (50 xp)
- Try a 60/40 split (100 xp)
- Fit a logistic regression model (100 xp)
- Confusion matrix (50 xp)
- Confusion matrix takeaways (50 xp)
- Calculate a confusion matrix (100 xp)
- Calculating accuracy (50 xp)
- Calculating true positive rate (50 xp)
- Calculating true negative rate (50 xp)
- Class probabilities and predictions (50 xp)
- Probabilities and classes (50 xp)
- Try another threshold (100 xp)
- From probabilities to confusion matrix (100 xp)
- Introducing the ROC curve (50 xp)
- What's the value of a ROC curve? (50 xp)
- Plot an ROC curve (100 xp)
- Area under the curve (AUC) (50 xp)
- Model, ROC, and AUC (50 xp)
- Customizing trainControl (100 xp)
- Using custom trainControl (100 xp)
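
The quantities named in this chapter (confusion matrix, accuracy, true positive rate, true negative rate, AUC) can all be computed by hand in base R, as in the sketch below. The data here is synthetic and generated inside the example, not the sonar data the course uses, and the 0.5 threshold is just an illustrative choice. In caret, confusionMatrix() and train() with summaryFunction = twoClassSummary cover the same ground.

```r
set.seed(1)

# Synthetic two-class data (an assumption for illustration only)
n <- 200
x <- rnorm(n)
dat <- data.frame(x = x, y = rbinom(n, 1, plogis(1.5 * x)))

# 60/40 split; the rows are already in random order
train_idx <- seq_len(round(n * 0.60))
train_set <- dat[train_idx, ]
test_set <- dat[-train_idx, ]

# Logistic regression, then predicted probabilities on the test set
model <- glm(y ~ x, family = binomial, data = train_set)
p <- predict(model, test_set, type = "response")

# Turn probabilities into classes with a threshold
threshold <- 0.5
lv <- c("neg", "pos")
pred_class <- factor(ifelse(p > threshold, "pos", "neg"), levels = lv)
actual <- factor(ifelse(test_set$y == 1, "pos", "neg"), levels = lv)

# Confusion matrix: rows are predictions, columns are the truth
cm <- table(predicted = pred_class, actual = actual)
accuracy <- sum(diag(cm)) / sum(cm)
tpr <- cm["pos", "pos"] / sum(cm[, "pos"])  # true positive rate
tnr <- cm["neg", "neg"] / sum(cm[, "neg"])  # true negative rate

# AUC via the rank (Mann-Whitney) formulation: the probability that a
# random positive gets a higher score than a random negative
r <- rank(p)
n_pos <- sum(actual == "pos")
n_neg <- sum(actual == "neg")
auc <- (sum(r[actual == "pos"]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(cm)
print(c(accuracy = accuracy, tpr = tpr, tnr = tnr, auc = auc))
```

Changing the threshold trades true positives against true negatives, while AUC summarizes performance across all thresholds at once, which is why the chapter ends on it.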

3. Tuning Model Parameters to Improve Performance
In this chapter, you will use the train() function to tweak model parameters through cross-validation and grid search.
- Random forests and wine (50 xp)
- Random forests vs. linear models (50 xp)
- Fit a random forest (100 xp)
- Explore a wider model space (50 xp)
- Advantage of a longer tune length (50 xp)
- Try a longer tune length (100 xp)
- Custom tuning grids (50 xp)
- Advantages of a custom tuning grid (50 xp)
- Fit a random forest with custom tuning (100 xp)
- Introducing glmnet (50 xp)
- Advantage of glmnet (50 xp)
- Make a custom trainControl (100 xp)
- Fit glmnet with custom trainControl (100 xp)
- glmnet with custom tuning grid (50 xp)
- Why a custom tuning grid? (50 xp)
- glmnet with custom trainControl and tuning (100 xp)
- Interpreting glmnet plots (50 xp)
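
As a rough illustration of grid search through train(), the sketch below tunes a glmnet model over a custom grid of alpha and lambda values. It assumes the caret and glmnet packages are installed; the mtcars data set and the particular grid values are placeholders chosen for the example, not the course's own.

```r
library(caret)  # assumed installed; method = "glmnet" also needs the glmnet package

set.seed(42)

# Custom tuning grid: caret's glmnet method tunes alpha (ridge/lasso mix)
# and lambda (penalty strength). The values below are illustrative.
my_grid <- expand.grid(
  alpha = c(0, 0.5, 1),
  lambda = seq(0.01, 1, length.out = 10)
)

# 5-fold cross-validation evaluates every row of the grid
my_control <- trainControl(method = "cv", number = 5)

model <- train(
  mpg ~ .,
  data = mtcars,          # stand-in data set
  method = "glmnet",
  tuneGrid = my_grid,
  trControl = my_control
)

# One row of results per grid point, plus the winning combination
print(nrow(model$results))
print(model$bestTune)
```

Passing tuneLength instead of tuneGrid lets caret pick the candidate values itself; a custom grid gives direct control over which combinations are tried and how long the search takes.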

4. Preprocessing Data
In this chapter, you will practice using train() to preprocess data before fitting models, improving your ability to make accurate predictions.
- Median imputation (50 xp)
- Median imputation vs. omitting rows (50 xp)
- Apply median imputation (100 xp)
- KNN imputation (50 xp)
- Comparing KNN imputation to median imputation (50 xp)
- Use KNN imputation (100 xp)
- Compare KNN and median imputation (50 xp)
- Multiple preprocessing methods (50 xp)
- Order of operations (50 xp)
- Combining preprocessing methods (100 xp)
- Handling low-information predictors (50 xp)
- Why remove near zero variance predictors? (50 xp)
- Remove near zero variance predictors (100 xp)
- preProcess() and nearZeroVar() (50 xp)
- Fit model on reduced blood-brain data (100 xp)
- Principal components analysis (PCA) (50 xp)
- Using PCA as an alternative to nearZeroVar() (100 xp)
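
A minimal sketch of preprocessing inside train() follows, assuming caret is installed. Missing values are injected into mtcars on purpose so that median imputation has something to do; the data set, the chosen columns, and the added constant column are all illustrative assumptions rather than course material.

```r
library(caret)  # assumed installed

set.seed(42)

# Predictors and outcome from a stand-in data set
X <- mtcars[, c("wt", "hp", "disp", "qsec")]
Y <- mtcars$mpg

# Inject a few missing values so imputation is actually exercised
X[sample(nrow(X), 5), "hp"] <- NA

# Impute with the column median, then center and scale, all inside train()
# so the preprocessing is re-estimated within each cross-validation fold
model <- train(
  x = X,
  y = Y,
  method = "lm",
  preProcess = c("medianImpute", "center", "scale"),
  trControl = trainControl(method = "cv", number = 5)
)
print(model$results$RMSE)

# Flag low-information predictors: a constant column carries no signal
X_nzv <- cbind(mtcars[, c("wt", "hp")], constant = 1)
low_info <- nearZeroVar(X_nzv, names = TRUE)
print(low_info)
```

Swapping "medianImpute" for "knnImpute" or adding "pca" to the preProcess vector follows the same pattern; KNN imputation may need additional packages depending on the caret version.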

5. Selecting Models: A Case Study in Churn Prediction
In the final chapter of this course, you'll learn how to use resamples() to compare multiple models and select (or ensemble) the best one(s).
- Reusing a trainControl (50 xp)
- Why reuse a trainControl? (50 xp)
- Make custom train/test indices (100 xp)
- Reintroducing glmnet (50 xp)
- glmnet as a baseline model (50 xp)
- Fit the baseline model (100 xp)
- Reintroducing random forest (50 xp)
- Random forest drawback (50 xp)
- Random forest with custom trainControl (100 xp)
- Comparing models (50 xp)
- Matching train/test indices (50 xp)
- Create a resamples object (100 xp)
- More on resamples (50 xp)
- Create a box-and-whisker plot (100 xp)
- Create a scatterplot (100 xp)
- Ensembling models (100 xp)
- Summary (50 xp)
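
Comparing models with resamples() only makes sense when every model was evaluated on the same folds, which is why the chapter starts by reusing one trainControl. The sketch below shows that pattern under a few assumptions: caret is installed, mtcars stands in for the churn data, and "lm" and "knn" are substituted for the chapter's glmnet and random forest models purely to avoid extra package dependencies.

```r
library(caret)  # assumed installed

set.seed(42)

# Custom train indices, created once and shared by every model
my_folds <- createFolds(mtcars$mpg, k = 5, returnTrain = TRUE)
my_control <- trainControl(method = "cv", index = my_folds)

# Two stand-in models fitted with the SAME trainControl
model_lm <- train(mpg ~ ., data = mtcars, method = "lm",
                  trControl = my_control)
model_knn <- train(mpg ~ ., data = mtcars, method = "knn",
                   trControl = my_control)

# Collect the fold-by-fold results for a like-for-like comparison
resamps <- resamples(list(lm = model_lm, knn = model_knn))
print(summary(resamps))
```

Because both models share the same folds, differences in their fold-level RMSE reflect the models rather than the luck of the split. The resulting object can also be passed to bwplot() for the box-and-whisker comparison the chapter mentions.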

Prerequisites
Introduction to Regression in R
Max Kuhn
Software Engineer at RStudio and creator of caret
Dr. Max Kuhn is a Software Engineer at RStudio. He is the author or maintainer of several R packages for predictive modeling, including caret, AppliedPredictiveModeling, Cubist, C50 and SparseLDA. He routinely teaches classes in predictive modeling at Predictive Analytics World and useR!, and his publications include work on neuroscience biomarkers, drug discovery, molecular diagnostics and response surface methodology.
Join over 18 million learners and start Machine Learning with caret in R today!
