Often in machine learning, the goal is to find patterns in data without trying to make predictions. This is called unsupervised learning. One common use case is grouping consumers by demographics and purchasing history to deploy targeted marketing campaigns. Another is describing the unmeasured factors that most influence differences in crime between cities. This course provides a basic introduction to clustering and dimensionality reduction in R from a machine learning perspective, so that you can get from data to insights as quickly as possible.
Unsupervised learning in R (Free)
The k-means algorithm is one common approach to clustering. Learn how the algorithm works under the hood, implement k-means clustering in R, visualize and interpret the results, and select the number of clusters when it's not known ahead of time. By the end of the chapter, you'll have applied k-means clustering to a fun "real-world" dataset!
Welcome to the course! (50 xp)
Identify clustering problems (50 xp)
Introduction to k-means clustering (50 xp)
k-means clustering (100 xp)
Results of kmeans() (100 xp)
Visualizing and interpreting results of kmeans() (100 xp)
How k-means works and practical matters (50 xp)
Handling random algorithms (100 xp)
Selecting number of clusters (100 xp)
Introduction to the Pokemon data (50 xp)
Practical matters: working with real data (100 xp)
Review of k-means clustering (50 xp)
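As a rough illustration of the workflow this chapter covers, the sketch below fits k-means with base R's `kmeans()` and computes the total within-cluster sum of squares for several values of k, which is one common heuristic for choosing the number of clusters. The simulated matrix `x` and the choice of three clusters are assumptions made for the example; they are not the course's data.

```r
# Hypothetical data: 150 two-dimensional points in three loose groups.
set.seed(42)
x <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 3, sd = 0.5), ncol = 2),
  cbind(rnorm(50, mean = 0, sd = 0.5), rnorm(50, mean = 3, sd = 0.5))
)

# k-means starts from random assignments, so nstart reruns it several
# times and keeps the run with the lowest total within-cluster SS.
km <- kmeans(x, centers = 3, nstart = 20, iter.max = 50)

# Total within-cluster sum of squares for k = 1..8; an "elbow" in this
# curve is a common heuristic for picking k when it is not known.
wss <- sapply(1:8, function(k) {
  kmeans(x, centers = k, nstart = 20, iter.max = 50)$tot.withinss
})

print(table(km$cluster))  # cluster sizes
print(round(wss, 1))      # within-cluster SS by number of clusters
```

Passing `km$cluster` as the `col` argument to `plot(x, ...)` colors each point by its assigned cluster, and `plot(1:8, wss, type = "b")` draws the elbow curve.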
Hierarchical clustering
Hierarchical clustering is another popular method for clustering. The goal of this chapter is to go over how it works, how to use it, and how it compares to k-means clustering.
Introduction to hierarchical clustering (50 xp)
Hierarchical clustering with results (100 xp)
Selecting number of clusters (50 xp)
Interpreting dendrogram (50 xp)
Cutting the tree (100 xp)
Clustering linkage and practical matters (50 xp)
Linkage methods (100 xp)
Comparing linkage methods (50 xp)
Practical matters: scaling (100 xp)
Comparing kmeans() and hclust() (100 xp)
Review of hierarchical clustering (50 xp)
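A minimal sketch of the steps this chapter names (scaling, choosing a linkage method, cutting the tree, and comparing against k-means) might look like the following. The simulated data and the choice of two clusters are assumptions for illustration only.

```r
# Hypothetical data: 40 two-dimensional points in two groups.
set.seed(1)
x <- rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 4), ncol = 2)
)

# Scale features so that no single variable dominates the distances.
x_scaled <- scale(x)

# hclust() works on a distance matrix rather than on the raw data.
d <- dist(x_scaled)

# Two linkage methods; they can produce differently shaped trees.
hc_complete <- hclust(d, method = "complete")
hc_average  <- hclust(d, method = "average")

# Cut the complete-linkage tree into 2 clusters.
clusters_hc <- cutree(hc_complete, k = 2)

# Fit k-means on the same scaled data and cross-tabulate assignments.
km <- kmeans(x_scaled, centers = 2, nstart = 20)
comparison <- table(hclust = clusters_hc, kmeans = km$cluster)
print(comparison)
```

Cluster labels are arbitrary in both methods, so agreement shows up as large counts concentrated in one cell per row of the table rather than along the diagonal. Calling `plot(hc_complete)` draws the dendrogram.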
Dimensionality reduction with PCA
Principal component analysis, or PCA, is a common approach to dimensionality reduction. Learn exactly what PCA does, visualize the results of PCA with biplots and scree plots, and deal with practical issues such as centering and scaling the data before performing PCA.
Introduction to PCA (50 xp)
PCA using prcomp() (100 xp)
Results of PCA (50 xp)
Additional results of PCA (50 xp)
Visualizing and interpreting PCA results (50 xp)
Interpreting biplots (1) (50 xp)
Interpreting biplots (2) (50 xp)
Variance explained (100 xp)
Visualize variance explained (100 xp)
Practical issues with PCA (50 xp)
Practical issues: scaling (100 xp)
Additional uses of PCA and wrap-up (50 xp)
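To make the chapter's terms concrete, here is a small sketch of PCA with `prcomp()`, including centering, scaling, and the proportion of variance explained that a scree plot displays. It uses R's built-in `iris` data purely as a stand-in; that dataset choice is an assumption and is not taken from the course.

```r
# Use the four numeric columns of the built-in iris data (stand-in data).
pr <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Variance of each principal component is the squared standard deviation.
pr_var <- pr$sdev^2

# Proportion of variance explained by each component, and its running total.
pve <- pr_var / sum(pr_var)
cum_pve <- cumsum(pve)

print(round(pve, 3))
print(round(cum_pve, 3))
```

From here, `biplot(pr)` shows observations and variable loadings on the first two components, and `plot(pve, type = "b")` gives a scree plot.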
Putting it all together with a case study
The goal of this chapter is to guide you through a complete analysis using the unsupervised learning techniques covered in the first three chapters. You'll extend what you've learned by using PCA as a preprocessing step for clustering, working with data that consist of measurements of cell nuclei from human breast masses.
Introduction to the case study (50 xp)
Preparing the data (100 xp)
Exploratory data analysis (50 xp)
Performing PCA (100 xp)
Interpreting PCA results (100 xp)
Variance explained (100 xp)
PCA review and next steps (50 xp)
Communicating PCA results (50 xp)
Hierarchical clustering of case data (100 xp)
Results of hierarchical clustering (50 xp)
Selecting number of clusters (100 xp)
k-means clustering and comparing results (100 xp)
Clustering on PCA results (100 xp)
Wrap-up and review (50 xp)
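The idea of using PCA as a preprocessing step for clustering can be sketched as follows. The breast mass data are not bundled with base R, so this example substitutes the built-in `USArrests` data; the 90% variance threshold and the choice of four clusters are likewise assumptions for illustration.

```r
# Stand-in data: USArrests ships with R (not the case-study data).
pr <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Cumulative proportion of variance explained by the components.
cum_pve <- cumsum(pr$sdev^2) / sum(pr$sdev^2)

# Keep the fewest components that explain at least 90% of the variance
# (the 90% cutoff is an arbitrary choice for this sketch).
n_pc <- which(cum_pve >= 0.9)[1]
scores <- pr$x[, 1:n_pc, drop = FALSE]

# Cluster on the retained component scores instead of the raw variables.
hc <- hclust(dist(scores), method = "complete")
clusters <- cutree(hc, k = 4)

print(n_pc)
print(table(clusters))
```

Clustering on a few leading components discards low-variance directions, which may or may not help depending on whether those directions carry the group structure; comparing the result with clustering on the full scaled data is a reasonable check.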
In the following tracks: Data Engineer, Data Scientist, Machine Learning Fundamentals, Machine Learning Scientist, SQL Fundamentals, Unsupervised Machine Learning
Prerequisites: Introduction to R
Senior Data Scientist, Boeing
Hank is a Senior Data Scientist at Boeing and a longtime user of the R language. Prior to his current role, he led the Customer Data Science team at H2O.ai, a leading provider of machine learning and predictive analytics services.