This beginner-level introduction to machine learning covers four of the most common classification algorithms. You will come away with a basic understanding of how each algorithm approaches a learning task and learn the R functions needed to apply these tools to your own work.
k-Nearest Neighbors (kNN)
Because the kNN algorithm literally "learns by example," it is a natural starting point for understanding supervised machine learning. This chapter introduces classification by applying kNN to road sign recognition for self-driving vehicles.

Lessons:
Classification with Nearest Neighbors
Recognizing a road sign with kNN
Thinking like kNN
Exploring the traffic sign dataset
Classifying a collection of road signs
What about the 'k' in kNN?
Understanding the impact of 'k'
Testing other 'k' values
Seeing how the neighbors voted
Data preparation for kNN
Why normalize data?
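The course applies kNN with R functions. As a language-neutral illustration only, the Python sketch below shows the two ideas the chapter names. First, it classifies a new sign by a majority vote of its k nearest labeled examples. Second, it rescales features to a common 0 to 1 range so that no single feature dominates the distance. The colour features, sign labels, and helper names (`fit_min_max`, `apply_min_max`, `knn_predict`) are hypothetical and not taken from the course.

```python
import math
from collections import Counter


def fit_min_max(rows):
    """Learn the per-feature minimum and maximum from the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(row, mins, maxs):
    """Rescale one row with the training min/max so features share a 0-1 range."""
    return [
        (x - lo) / (hi - lo) if hi > lo else 0.0
        for x, lo, hi in zip(row, mins, maxs)
    ]


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training rows."""
    # Pair each training label with its Euclidean distance to the query, nearest first.
    neighbors = sorted(zip((math.dist(x, query) for x in train_x), train_y))
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical mean (red, green, blue) values for six labeled sign images.
train_x = [(204, 36, 32), (190, 40, 45), (40, 120, 210),
           (55, 110, 190), (230, 210, 40), (220, 200, 55)]
train_y = ["stop", "stop", "speed", "speed", "pedestrian", "pedestrian"]

mins, maxs = fit_min_max(train_x)
scaled_train = [apply_min_max(row, mins, maxs) for row in train_x]
scaled_query = apply_min_max((200, 45, 40), mins, maxs)
print(knn_predict(scaled_train, train_y, scaled_query, k=3))  # stop
```

The query is rescaled with the training minimums and maximums rather than its own, so it lands on the same scale as the examples it is compared with. Changing `k` changes how many neighbors vote.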

Naive Bayes
Naive Bayes uses principles from probability and statistics, chiefly Bayes' theorem, to make predictions. This chapter introduces the basics of Bayesian methods and applies them to destination suggestions similar to those an iPhone offers.

Lessons:
Understanding Bayesian methods
Computing probabilities
Understanding dependent events
A simple Naive Bayes location model
Examining "raw" probabilities
Understanding independence
Understanding NB's "naivety"
Who are you calling naive?
A more sophisticated location model
Preparing for unforeseen circumstances
Understanding the Laplace correction
Applying Naive Bayes to other problems
Handling numeric predictors
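The Python sketch below shows the mechanics behind a simple location model with categorical features. It multiplies a class prior by per-feature conditional probabilities, treating features as independent given the class, which is the "naive" assumption. It also adds a Laplace correction so that a combination never seen in training does not zero out a class. The location history and function names are hypothetical assumptions, and the course itself works in R.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count classes and, per feature, how often each value occurs within each class."""
    n_features = len(rows[0])
    class_counts = Counter(labels)
    value_counts = [defaultdict(Counter) for _ in range(n_features)]
    seen_values = [set() for _ in range(n_features)]
    for row, cls in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[i][cls][value] += 1
            seen_values[i].add(value)
    return class_counts, value_counts, seen_values


def predict_proba(model, row, laplace=1):
    """Posterior P(class | row), treating features as independent given the class."""
    class_counts, value_counts, seen_values = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / total  # prior P(class)
        for i, value in enumerate(row):
            # Laplace correction: adding `laplace` to each count keeps a value never
            # seen with this class from forcing the whole product to zero.
            numerator = value_counts[i][cls][value] + laplace
            denominator = n_cls + laplace * len(seen_values[i])
            score *= numerator / denominator
        scores[cls] = score
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}


# Hypothetical (day type, time of day) -> location history.
rows = [("weekday", "morning"), ("weekday", "afternoon"), ("weekday", "morning"),
        ("weekend", "afternoon"), ("weekend", "evening"), ("weekday", "evening")]
labels = ["office", "office", "office", "store", "home", "home"]

model = train_naive_bayes(rows, labels)
probs = predict_proba(model, ("weekend", "morning"))
print(max(probs, key=probs.get))  # office
```

This toy history contains no weekend morning. With `laplace=0`, every location would therefore score zero, and this sketch would divide by zero, which is the situation the Laplace correction is meant to handle.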

Logistic Regression
Logistic regression, arguably one of the most widely used machine learning methods, fits a curve to numeric data to make predictions about binary events. This chapter provides an overview of the technique and illustrates how to apply it to fundraising data.

Lessons:
Making binary predictions with regression
Building simple logistic regression models
Making a binary prediction
The limitations of accuracy
Model performance tradeoffs
Calculating ROC Curves and AUC
Comparing ROC curves
Dummy variables, missing data, and interactions
Coding categorical features
Handling missing data
Understanding missing value indicators
Building a more sophisticated model
Automatic feature selection
The dangers of stepwise regression
Building a stepwise regression model
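The course builds these models with R functions. The hand-rolled Python sketch below only illustrates the idea. It fits the S-shaped curve P(y = 1 | x) = sigmoid(b0 + b1 * x) to one predictor by gradient descent on log loss. It then scores the fitted model with AUC, computed as the share of (positive, negative) pairs that the model ranks correctly. The donor data, learning rate, and epoch count are made-up assumptions, and a real analysis would use a statistics library instead.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by batch gradient descent on log loss."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log loss with respect to b0 and b1.
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def auc(scores, ys):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, ys) if y == 1]
    neg = [s for s, y in zip(scores, ys) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical donors: number of past gifts and whether they gave to this campaign.
gifts = [0, 1, 1, 2, 3, 4, 5, 6]
donated = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(gifts, donated)
probs = [sigmoid(b0 + b1 * x) for x in gifts]
print(auc(probs, donated))  # 0.9375
```

AUC depends only on how the predicted probabilities rank donors, not on any single cut-off such as 0.5. This connects to the chapter's point about the limitations of accuracy.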

Classification Trees
Classification trees use flowchart-like structures to make decisions. Because humans can readily understand these tree structures, classification trees are useful when transparency is needed, such as in loan approval. We'll use the Lending Club dataset to simulate this scenario.

Lessons:
Making decisions with trees
Building a simple decision tree
Visualizing classification trees
Understanding the tree's decisions
Growing larger classification trees
Why do some branches split?
Creating random test datasets
Building and evaluating a larger tree
Conducting a fair performance evaluation
Tending to classification trees
Preventing overgrown trees
Creating a nicely pruned tree
Why do trees benefit from pruning?
Seeing the forest from the trees
Understanding random forests
Building a random forest model
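Each branch in a tree comes from choosing a split that makes the resulting groups purer. The Python sketch below shows one common scoring rule, Gini impurity, applied to every candidate threshold on a single numeric feature. It is not the course's R code. The credit scores and outcomes are invented, and a full tree would repeat this search across all features and recursively within each branch.

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(xs, ys):
    """Return (threshold, weighted impurity) for the best 'x <= threshold' split."""
    best = (None, gini(ys))  # no threshold if no split beats the unsplit node
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold halfway between adjacent values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if weighted < best[1]:
            best = (t, weighted)
    return best


# Hypothetical loan applicants: credit score and loan outcome.
scores = [580, 600, 620, 650, 700, 720, 750, 780]
outcomes = ["default", "default", "default", "default",
            "repaid", "default", "repaid", "repaid"]
print(best_split(scores, outcomes))  # (675.0, 0.1875)
```

Here the rule "credit score <= 675" sends all four applicants on the left to default, which produces a perfectly pure branch. That kind of readable rule is what makes trees attractive when decisions must be explained.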

Instructor: Brett Lantz, Data Scientist at the University of Michigan
Brett Lantz is a data scientist at the University of Michigan and the author of Machine Learning with R. After training as a sociologist, Brett has applied his endless thirst for data to projects that involve understanding and predicting human behavior.