
Cluster Analysis in R

Develop a strong intuition for how hierarchical and k-means clustering work and learn how to apply them to extract insights from your data.

4 Hours | 16 Videos | 52 Exercises | 32,931 Learners | 3800 XP


Course Description

Cluster analysis is a powerful toolkit in the data science workbench. It is used to find groups of observations (clusters) that share similar characteristics. These similarities can inform all kinds of business decisions; in marketing, for example, cluster analysis is used to identify distinct groups of customers for which advertisements can be tailored. In this course, you will learn about two commonly used clustering methods: hierarchical clustering and k-means clustering. You won't just learn how to use these methods; you'll build a strong intuition for how they work and how to interpret their results. You'll develop this intuition by exploring three different datasets: soccer player positions, wholesale customer spending data, and longitudinal occupational wage data.

  1. Calculating distance between observations


    Cluster analysis seeks groups of observations that are similar to one another within a group and different from the observations in other groups. This similarity or difference is quantified by a distance metric. In this chapter, you will learn how to calculate the distance between observations for both continuous and categorical features. You will also develop an intuition for how the scales of your features can affect distance.

    What is cluster analysis? (50 XP)
    When to cluster? (50 XP)
    Distance between two observations (50 XP)
    Calculate & plot the distance between two players (100 XP)
    Using the dist() function (100 XP)
    Who are the closest players? (50 XP)
    The importance of scale (50 XP)
    Effects of scale (100 XP)
    When to scale data? (50 XP)
    Measuring distance for categorical data (50 XP)
    Calculating distance between categorical variables (100 XP)
    The closest observation to a pair (50 XP)
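
The distance ideas from Chapter 1 can be sketched in base R. This is a minimal illustration, not the course's exercise code: the data values and the names `players`, `people`, and `survey` are hypothetical. It uses `dist()` for Euclidean distance, `scale()` to standardize features, and `dist(method = "binary")` as one common choice of distance (Jaccard) for 0/1-encoded categorical features.

```r
# Hypothetical positions of three players on a pitch (illustrative values only).
players <- data.frame(
  x = c(0, 3, 0),
  y = c(0, 4, 10),
  row.names = c("A", "B", "C")
)

# Pairwise Euclidean distances between all observations.
d_players <- dist(players, method = "euclidean")
print(d_players)

# The same A-B distance by hand: sqrt((0 - 3)^2 + (0 - 4)^2) = 5.
d_ab <- sqrt(sum((players["A", ] - players["B", ])^2))

# Features on very different scales (hypothetical heights and weights).
people <- data.frame(
  height_m  = c(1.60, 1.90, 1.61),
  weight_kg = c(60, 61, 70)
)
d_raw    <- dist(people)         # driven almost entirely by weight_kg
d_scaled <- dist(scale(people))  # each feature standardized to mean 0, sd 1

# Categorical features encoded as 0/1 indicators (hypothetical survey answers).
survey <- matrix(
  c(1, 0, 1,
    1, 1, 0,
    0, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("p1", "p2", "p3"), c("wine", "beer", "tea"))
)
d_cat <- dist(survey, method = "binary")  # Jaccard distance on the indicators
print(d_cat)
```

Comparing `d_raw` with `d_scaled` shows why scale matters: before standardizing, the feature with the larger numeric range dominates the distance.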
  2. Hierarchical clustering

    This chapter will help you answer the last question from Chapter 1: how do you find groups of similar observations (clusters) in your data using the distances that you have calculated? You will learn about the fundamental principles of hierarchical clustering, namely the linkage criteria and the dendrogram plot, and how both are used to build clusters. You will also explore data from a wholesale distributor to segment its clients by their spending habits.
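
As a rough sketch of the workflow this chapter describes, the base R snippet below builds a hierarchy from a distance matrix, cuts it into clusters, and draws the dendrogram. The `spend` data frame and its values are hypothetical, not the course's wholesale dataset. Complete linkage and `k = 2` are arbitrary choices made for illustration.

```r
# Hypothetical annual spending of six clients (illustrative values only).
spend <- data.frame(
  milk    = c(10, 12, 11, 50, 52, 49),
  grocery = c(20, 19, 22, 80, 78, 83),
  row.names = paste0("client", 1:6)
)

# Pairwise Euclidean distances between clients.
d_spend <- dist(spend)

# Build the hierarchy; the linkage criterion decides how the distance
# between two clusters is measured ("single" and "average" are alternatives).
hc <- hclust(d_spend, method = "complete")

# Cut the tree into two clusters and inspect the assignments.
clusters <- cutree(hc, k = 2)
print(clusters)

# The dendrogram shows the order and height at which observations merge.
plot(hc, main = "Dendrogram (complete linkage)")
```

Changing `method` or `k` and re-running is a quick way to see how the linkage criterion and the cut affect the resulting segments.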


In the following tracks

Data Scientist, Machine Learning Scientist, Unsupervised Machine Learning


Richie Cotton, Yashas Roy


Intermediate R

Dmitriy Gorenshteyn

Lead Data Scientist at Memorial Sloan Kettering Cancer Center

Dmitriy is a Principal Data Scientist at Interos Inc. Previously, he worked in the Strategy & Innovation department at Memorial Sloan Kettering Cancer Center where he developed predictive models for programs aimed at improving patient care. Dmitriy completed his Doctorate in Quantitative & Computational Biology at Princeton University. His core teaching philosophy is centered on building intuition and understanding for the methods and tools available.
