Loved by learners at thousands of companies
Machine learning is the study and application of algorithms that learn from and make predictions on data. From search results to self-driving cars, it has manifested itself in all areas of our lives and is one of the most exciting and fast growing fields of research in the world of data science. This course teaches the big ideas in machine learning: how to build and evaluate predictive models, how to tune them for optimal performance, how to preprocess data for better results, and much more. The popular
caret R package, which provides a consistent interface to all of R's most powerful machine learning facilities, is used throughout the course.
Regression models: fitting them and evaluating their performanceFree
In the first chapter of this course, you'll fit regression models with
train()and evaluate their out-of-sample performance using cross-validation and root-mean-square error (RMSE).Welcome to the Toolbox50 xpIn-sample RMSE for linear regression50 xpIn-sample RMSE for linear regression on diamonds100 xpOut-of-sample error measures50 xpOut-of-sample RMSE for linear regression50 xpRandomly order the data frame100 xpTry an 80/20 split100 xpPredict on test set100 xpCalculate test set RMSE by hand100 xpComparing out-of-sample RMSE to in-sample RMSE50 xpCross-validation50 xpAdvantage of cross-validation50 xp10-fold cross-validation100 xp5-fold cross-validation100 xp5 x 5-fold cross-validation100 xpMaking predictions on new data100 xp
Classification models: fitting them and evaluating their performance
In this chapter, you'll fit classification models with
train()and evaluate their out-of-sample performance using cross-validation and area under the curve (AUC).Logistic regression on sonar50 xpWhy a train/test split?50 xpTry a 60/40 split100 xpFit a logistic regression model100 xpConfusion matrix50 xpConfusion matrix takeaways50 xpCalculate a confusion matrix100 xpCalculating accuracy50 xpCalculating true positive rate50 xpCalculating true negative rate50 xpClass probabilities and predictions50 xpProbabilities and classes50 xpTry another threshold100 xpFrom probabilites to confusion matrix100 xpIntroducing the ROC curve50 xpWhat's the value of a ROC curve?50 xpPlot an ROC curve100 xpArea under the curve (AUC)50 xpModel, ROC, and AUC50 xpCustomizing trainControl100 xpUsing custom trainControl100 xp
Tuning model parameters to improve performance
In this chapter, you will use the
train()function to tweak model parameters through cross-validation and grid search.Random forests and wine50 xpRandom forests vs. linear models50 xpFit a random forest100 xpExplore a wider model space50 xpAdvantage of a longer tune length50 xpTry a longer tune length100 xpCustom tuning grids50 xpAdvantages of a custom tuning grid50 xpFit a random forest with custom tuning100 xpIntroducing glmnet50 xpAdvantage of glmnet50 xpMake a custom trainControl100 xpFit glmnet with custom trainControl100 xpglmnet with custom tuning grid50 xpWhy a custom tuning grid?50 xpglmnet with custom trainControl and tuning100 xpInterpreting glmnet plots50 xp
Preprocessing your data
In this chapter, you will practice using
train()to preprocess data before fitting models, improving your ability to making accurate predictions.Median imputation50 xpMedian imputation vs. omitting rows50 xpApply median imputation100 xpKNN imputation50 xpComparing KNN imputation to median imputation50 xpUse KNN imputation100 xpCompare KNN and median imputation50 xpMultiple preprocessing methods50 xpOrder of operations50 xpCombining preprocessing methods100 xpHandling low-information predictors50 xpWhy remove near zero variance predictors?50 xpRemove near zero variance predictors100 xppreProcess() and nearZeroVar()50 xpFit model on reduced blood-brain data100 xpPrinciple components analysis (PCA)50 xpUsing PCA as an alternative to nearZeroVar()100 xp
Selecting models: a case study in churn prediction
In the final chapter of this course, you'll learn how to use
resamples()to compare multiple models and select (or ensemble) the best one(s).Reusing a trainControl50 xpWhy reuse a trainControl?50 xpMake custom train/test indices100 xpReintroducing glmnet50 xpglmnet as a baseline model50 xpFit the baseline model100 xpReintroducing random forest50 xpRandom forest drawback50 xpRandom forest with custom trainControl100 xpComparing models50 xpMatching train/test indices50 xpCreate a resamples object100 xpMore on resamples50 xpCreate a box-and-whisker plot100 xpCreate a scatterplot100 xpEnsembling models100 xpSummary50 xp
PrerequisitesIntroduction to Regression in R
VP, Data Science at DataRobot
Zach is a Data Scientist at DataRobot and co-author of the caret R package. He's fascinated by predicting the future and spends his free time competing in predictive modeling competitions. He's currently one of top 500 data scientists on Kaggle and took 9th place in the Heritage Health Prize as part of the Analytics Inside team.
Software Engineer at RStudio and creator of caret
Dr. Max Kuhn is a Software Engineer at RStudio. He is the author or maintainer of several R packages for predictive modeling including caret, AppliedPredictiveModeling, Cubist, C50 and SparseLDA. He routinely teaches classes in predictive modeling at Predictive Analytics World and UseR! and his publications include work on neuroscience biomarkers, drug discovery, molecular diagnostics and response surface methodology.
What do other learners have to say?
I've used other sites—Coursera, Udacity, things like that—but DataCamp's been the one that I've stuck with.
Devon Edwards Joseph
Lloyds Banking Group
DataCamp is the top resource I recommend for learning data science.
Harvard Business School
DataCamp is by far my favorite website to learn from.
Decision Science Analytics, USAA