Md. Saif Kabir Asif has completed

Cluster Analysis in R

4 hr

3,800 XP

Statement of Accomplishment Badge

Course Description

Learn How to Perform Cluster Analysis

Cluster analysis is a powerful toolkit in the data science workbench. It is used to find groups of observations (clusters) that share similar characteristics. These similarities can inform all kinds of business decisions; for example, in marketing, it is used to identify distinct groups of customers for which advertisements can be tailored.

Explore Hierarchical and K-Means Clustering Techniques

In this course, you will learn about two commonly used clustering methods: hierarchical clustering and k-means clustering. You won't just learn how to use these methods; you'll also build a strong intuition for how they work and how to interpret their results. You'll develop this intuition by exploring three different datasets: soccer player positions, wholesale customer spending data, and longitudinal occupational wage data.

Hone Your Skills with a Hands-On Case Study

You’ll finish the course by applying your new skills to a case study on average salaries and how they have changed over time. The case study combines hierarchical clustering steps (building occupation trees, preparing for exploration, and plotting occupational clusters) with k-means techniques, including elbow analysis and average silhouette widths.

DataCamp courses combine videos, articles, and practice exercises, giving you the chance to test and cement your new-found skills so that you feel confident applying them outside a course setting.

For Business

Training 2 or more people?

Get your team access to the full DataCamp platform, including all the features.

1
Calculating Distance Between Observations
Free
Cluster analysis seeks to find groups of observations that are similar to one another, while the groups themselves differ from each other. This similarity or difference is captured by a metric called distance. In this chapter, you will learn how to calculate the distance between observations for both continuous and categorical features. You will also develop an intuition for how the scales of your features can affect distance.
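
As a rough illustration of the scale effect described above, the following minimal sketch uses base R's `dist()` and `scale()` on a made-up data frame. The `players` data and its column names are hypothetical and are not taken from the course datasets.

```r
# Hypothetical toy data: two numeric features on very different scales.
players <- data.frame(
  height_m  = c(1.70, 1.80, 1.90),
  weight_kg = c(60, 90, 62)
)

# Euclidean distance on the raw features: weight_kg dominates the result
# because its values span a much wider range than height_m.
d_raw <- dist(players, method = "euclidean")

# Standardize each column to mean 0 and standard deviation 1, then recompute.
d_scaled <- dist(scale(players))

print(d_raw)
print(d_scaled)
```

Comparing the two printed distance matrices shows that the closest pair of rows can change once the features are put on a common scale.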
What is cluster analysis?
50 xp
When to cluster?
50 xp
Distance between two observations
50 xp
Calculate & plot the distance between two players
100 xp
Using the dist() function
100 xp
Who are the closest players?
50 xp
The importance of scale
50 xp
Effects of scale
100 xp
When to scale data?
50 xp
Measuring distance for categorical data
50 xp
Calculating distance between categorical variables
100 xp
The closest observation to a pair
50 xp
2
Hierarchical Clustering
This chapter will help you answer the last question from chapter 1: how do you find groups of similar observations (clusters) in your data using the distances that you have calculated? You will learn about the fundamental principles of hierarchical clustering, namely the linkage criteria and the dendrogram plot, and how both are used to build clusters. You will also explore data from a wholesale distributor in order to perform market segmentation of clients using their spending habits.
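
The workflow outlined above (distances, then a linkage criterion, then cutting the tree) might look like the sketch below, which uses base R's `hclust()` and `cutree()`. The `spend` data frame is invented for illustration and is not the wholesale dataset from the course.

```r
# Hypothetical toy data: six customers, two spending features (made up).
spend <- data.frame(
  grocery = c(10, 12, 11, 50, 52, 49),
  frozen  = c(5, 6, 4, 30, 31, 29)
)

# Pairwise Euclidean distances between customers.
d <- dist(spend)

# Build the hierarchy; method sets the linkage criterion
# ("complete", "single", and "average" are among the options).
hc <- hclust(d, method = "complete")

# Cut the tree into k = 2 clusters; cutree(hc, h = ...) would cut
# at a chosen height instead.
clusters <- cutree(hc, k = 2)
print(clusters)
```

Calling `plot(hc)` on the fitted object draws the dendrogram, which helps when choosing a value for `k` or a cut height.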
Comparing more than two observations
50 xp
Calculating linkage
100 xp
Revisited: The closest observation to a pair
50 xp
Capturing K clusters
50 xp
Assign cluster membership
100 xp
Exploring the clusters
100 xp
Validating the clusters
50 xp
Visualizing the dendrogram
50 xp
Comparing average, single & complete linkage
100 xp
Height of the tree
50 xp
Cutting the tree
50 xp
Clusters based on height
100 xp
Exploring the branches cut from the tree
100 xp
What do we know about our clusters?
50 xp
Making sense of the clusters
50 xp
Segment wholesale customers
100 xp
Explore wholesale customer clusters
100 xp
Interpreting the wholesale customer clusters
50 xp
3
K-means Clustering
In this chapter, you will build an understanding of the principles behind the k-means algorithm, learn how to select a suitable k when it isn't known in advance, and revisit the wholesale data from a different perspective.
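
One common way to choose k is an elbow analysis, sketched below with base R's `kmeans()`. The simulated `xy` matrix, the seed, and the range of k values are assumptions made for this example only.

```r
set.seed(42)  # k-means starts from random centers, so fix the seed

# Hypothetical toy data: two well-separated groups of points (simulated).
xy <- rbind(
  cbind(x = rnorm(20, mean = 0),  y = rnorm(20, mean = 0)),
  cbind(x = rnorm(20, mean = 10), y = rnorm(20, mean = 10))
)

# Fit k-means for k = 1..6 and record the total within-cluster
# sum of squares for each fit.
ks <- 1:6
tot_withinss <- sapply(ks, function(k) {
  kmeans(xy, centers = k, nstart = 10)$tot.withinss
})

# The "elbow" is the k after which the decrease flattens out.
print(data.frame(k = ks, tot_withinss = tot_withinss))
```

Plotting `tot_withinss` against `ks` gives the elbow plot; with two clearly separated groups, the sharpest drop should occur between k = 1 and k = 2.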
Introduction to K-means
50 xp
K-means on a soccer field
100 xp
K-means on a soccer field (part 2)
100 xp
Evaluating different values of K by eye
50 xp
Many K's many models
100 xp
Elbow (Scree) plot
100 xp
Interpreting the elbow plot
50 xp
Silhouette analysis: observation level performance
50 xp
Silhouette analysis
100 xp
Making sense of the K-means clusters
50 xp
Revisiting wholesale data: "Best" k
100 xp
Revisiting wholesale data: Exploration
100 xp
4
Case Study: National Occupational Mean Wage
In this chapter, you will apply the skills you have learned to explore how the average salary across professions has changed over time.
Occupational wage data
50 xp
Initial exploration of the data
50 xp
Hierarchical clustering: Occupation trees
100 xp
Hierarchical clustering: Preparing for exploration
100 xp
Hierarchical clustering: Plotting occupational clusters
100 xp
Reviewing the HC results
50 xp
K-means: Elbow analysis
100 xp
K-means: Average Silhouette Widths
100 xp
The "best" number of clusters
50 xp
Review K-means results
50 xp

resources

Occupational Employment Statistics (OES)

Soccer player positions

Wholesale customer spending

collaborators

Yashas Roy

Richie Cotton

prerequisites

Dmitriy Gorenshteyn

Lead Data Scientist at Memorial Sloan Kettering Cancer Center
