Course

Cluster Analysis in R

Skill level: Intermediate

Updated 2024-11

Develop a strong intuition for how hierarchical and k-means clustering work and learn how to apply them to extract insights from your data.

R, Machine Learning

4 hours

16 videos

52 exercises

3,800 XP

44,070

Statement of Accomplishment

Course Description

Learn How to Perform Cluster Analysis

Cluster analysis is a powerful toolkit in the data science workbench. It is used to find groups of observations (clusters) that share similar characteristics. These similarities can inform all kinds of business decisions; for example, in marketing, it is used to identify distinct groups of customers for which advertisements can be tailored.

Explore Hierarchical and K-Means Clustering Techniques

In this course, you will learn about two commonly used clustering methods: hierarchical clustering and k-means clustering. You won't just learn how to use these methods; you'll also build a strong intuition for how they work and how to interpret their results. You'll develop this intuition by exploring three different datasets: soccer player positions, wholesale customer spending data, and longitudinal occupational wage data.

Hone Your Skills with a Hands-On Case Study

You’ll finish the course by applying your new skills to a case study based around average salaries and how they have changed over time. This will combine hierarchical clustering techniques such as occupation trees, preparing for exploration, and plotting occupational clusters, with k-means techniques including elbow analysis and average silhouette widths.

DataCamp courses consist of a mixture of videos, articles, and practice exercises, giving you the chance to test and cement your new-found skills so that you feel confident applying them outside a course setting.

Prerequisites

1

Calculating Distance Between Observations

Cluster analysis seeks to find groups of observations that are similar to one another, while the groups themselves differ from each other. This similarity or difference is captured by a metric called distance. In this chapter, you will learn how to calculate the distance between observations for both continuous and categorical features. You will also develop an intuition for how the scales of your features can affect distance.

What is cluster analysis?

When to cluster?

Distance between two observations

Calculate & plot the distance between two players

Using the dist() function

Who are the closest players?

The importance of scale

Effects of scale

When to scale data?

Measuring distance for categorical data

Calculating distance between categorical variables

The closest observation to a pair
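
As a rough illustration of the ideas in this chapter, the sketch below uses base R's `dist()` to compute Euclidean distance, shows how scaling changes which observations look close, and computes a binary (Jaccard-style) distance for dummy-coded categorical features. All data values and variable names here are made up for illustration and are not taken from the course datasets.

```r
# Two hypothetical players, each described by an (x, y) position
players <- rbind(c(0, 0), c(9, 12))
d_players <- dist(players)            # Euclidean by default: sqrt(9^2 + 12^2)

# Three hypothetical people measured on very different scales
people <- data.frame(
  height_m  = c(1.60, 1.90, 1.62),
  weight_kg = c(60, 62, 75)
)
d_raw    <- as.matrix(dist(people))         # dominated by weight_kg
d_scaled <- as.matrix(dist(scale(people)))  # each feature: mean 0, sd 1

# Dummy-coded categorical features (1 = category present)
survey <- rbind(a = c(1, 0, 1), b = c(1, 1, 0))
d_binary <- dist(survey, method = "binary")

print(d_players)
print(round(d_raw, 2))
print(round(d_scaled, 2))
print(d_binary)
```

Before scaling, persons 1 and 2 look far closer than persons 1 and 3 simply because weight spans a larger numeric range than height; after `scale()`, both features contribute comparably.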

2

Hierarchical Clustering

This chapter will help you answer the last question from chapter 1: how do you find groups of similar observations (clusters) in your data using the distances that you have calculated? You will learn about the fundamental principles of hierarchical clustering, namely the linkage criteria and the dendrogram plot, and how both are used to build clusters. You will also explore data from a wholesale distributor to segment its clients by their spending habits.

Comparing more than two observations

Calculating linkage

Revisited: The closest observation to a pair

Capturing K clusters

Assign cluster membership

Exploring the clusters

Validating the clusters

Visualizing the dendrogram

Comparing average, single & complete linkage

Height of the tree

Cutting the tree

Clusters based on height

Exploring the branches cut from the tree

What do we know about our clusters?

Making sense of the clusters

Segment wholesale customers

Explore wholesale customer clusters

Interpreting the wholesale customer clusters
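
The workflow described in this chapter (distance matrix, linkage, cutting the tree, then exploring the clusters) might look roughly like the sketch below in base R. The `spend` data frame is a hypothetical stand-in with made-up values, not the course's wholesale dataset, and complete linkage is just one of the linkage choices the chapter compares.

```r
# Hypothetical spending for six customers (values invented for illustration)
spend <- data.frame(
  milk    = c(10, 12, 11, 50, 52, 49),
  grocery = c(5, 6, 4, 40, 42, 41)
)

d  <- dist(spend)                      # pairwise Euclidean distances
hc <- hclust(d, method = "complete")   # also try "single" or "average"

clusters_k <- cutree(hc, k = 2)        # ask for a fixed number of clusters
clusters_h <- cutree(hc, h = 10)       # or cut the tree at a given height

# plot(hc)  # uncomment to draw the dendrogram

# Explore the clusters: mean spending per cluster
cluster_means <- aggregate(spend, by = list(cluster = clusters_k), FUN = mean)
print(cluster_means)
```

Cutting by `k` and cutting by `h` are two views of the same tree: here both yield the same two groups because the two sets of customers are far apart relative to the cut height.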

3

K-means Clustering

In this chapter, you will build an understanding of the principles behind the k-means algorithm, learn how to select the right k when it isn't previously known, and revisit the wholesale data from a different perspective.

Introduction to K-means

K-means on a soccer field

K-means on a soccer field (part 2)

Evaluating different values of K by eye

Many K's many models

Elbow (Scree) plot

Interpreting the elbow plot

Silhouette analysis: observation level performance

Silhouette analysis

Making sense of the K-means clusters

Revisiting wholesale data: "Best" k

Revisiting wholesale data: Exploration
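
A minimal sketch of fitting k-means and building the data for an elbow plot, using only base R's `kmeans()`. The points are simulated (two well-separated groups) and the names `pts`, `model`, and `tot_withinss` are illustrative assumptions. Average silhouette width needs a package outside base R (for instance `cluster::silhouette`), so it is not shown here.

```r
set.seed(1)  # k-means starts from random centers, so fix the seed

# Simulated 2-D positions: 20 points near (0, 0) and 20 near (8, 8)
pts <- rbind(
  matrix(rnorm(40, mean = 0, sd = 1), ncol = 2),
  matrix(rnorm(40, mean = 8, sd = 1), ncol = 2)
)

# Fit k-means with k = 2; nstart reruns the algorithm from several starts
model <- kmeans(pts, centers = 2, nstart = 20)
print(table(model$cluster))

# Elbow analysis: total within-cluster sum of squares for k = 1..6
tot_withinss <- sapply(1:6, function(k) {
  kmeans(pts, centers = k, nstart = 20)$tot.withinss
})
print(round(tot_withinss, 1))

# plot(1:6, tot_withinss, type = "b", xlab = "k", ylab = "Total within SS")
```

The "elbow" is the value of k after which adding more clusters reduces the total within-cluster sum of squares only marginally; with data like these, the sharp drop would be expected between k = 1 and k = 2.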

4

Case Study: National Occupational Mean Wage

In this chapter, you will apply the skills you have learned to explore how average salaries across professions have changed over time.

Occupational wage data

Initial exploration of the data

Hierarchical clustering: Occupation trees

Hierarchical clustering: Preparing for exploration

Hierarchical clustering: Plotting occupational clusters

Reviewing the HC results

K-means: Elbow analysis

K-means: Average Silhouette Widths

The "best" number of clusters

Review K-means results
