Course

Cluster Analysis in R

Intermediate skill level

Updated November 2024

Develop a strong intuition for how hierarchical and k-means clustering work, and learn how to apply them to extract insights from your data.

R, Machine Learning

4 hours

16 videos

52 exercises

3,800 XP

44,073

Statement of Accomplishment

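Course Description

Cluster analysis is a powerful set of tools for data science work. It is used to find groups of observations (clusters) that share similar characteristics. These similarities can inform many kinds of business decisions; in marketing, for example, you can identify distinct customer groups and run different advertisements for each. In this course, you will learn two widely used clustering methods: hierarchical clustering and k-means clustering. Beyond simply using them, you will build an intuition for how the algorithms work and how to interpret their results. To do so, you will explore three datasets: soccer player positions, wholesale customer spending data, and time series data of average wages by occupation.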

Prerequisites

1

Calculating Distance Between Observations

Cluster analysis seeks to find groups of observations that are similar to one another, while the groups themselves differ from each other. This similarity or difference is captured by a metric called distance. In this chapter, you will learn how to calculate the distance between observations for both continuous and categorical features. You will also develop an intuition for how the scales of your features can affect distance.

What is cluster analysis?

When to cluster?

Distance between two observations

Calculate & plot the distance between two players

Using the dist() function

Who are the closest players?

The importance of scale

Effects of scale

When to scale data?

Measuring distance for categorical data

Calculating distance between categorical variables

The closest observation to a pair
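
The distance ideas listed in this chapter can be sketched in a few lines of base R. This is a minimal illustration rather than the course's own exercise code: the data frames `players`, `people_m`, `people_cm`, and `survey` are made-up toy data, and all variable names are assumptions introduced here. It shows a Euclidean distance computed by hand and with `dist()`, how a change of units can change which observation is closest, and a binary (Jaccard) distance for dummy-coded categorical features.

```r
# Toy data (made up for illustration): positions of two players on a field.
players <- data.frame(x = c(0, 9), y = c(0, 12))

# Euclidean distance by hand and with dist(); both describe the same
# 9-12-15 right triangle.
manual_dist <- sqrt((players$x[1] - players$x[2])^2 +
                    (players$y[1] - players$y[2])^2)
player_dist <- dist(players, method = "euclidean")

# The same three (made-up) people, with height in metres vs. centimetres.
people_m  <- data.frame(height = c(1.60, 1.90, 1.62), weight_kg = c(60, 61, 64))
people_cm <- data.frame(height = people_m$height * 100,
                        weight_kg = people_m$weight_kg)

# Nearest neighbour of person 1 on the raw features, under each unit choice.
raw_m  <- as.matrix(dist(people_m))
raw_cm <- as.matrix(dist(people_cm))
nearest_m  <- unname(which.min(raw_m[1, -1])) + 1
nearest_cm <- unname(which.min(raw_cm[1, -1])) + 1

# After scale() (mean 0, sd 1 per column) the unit choice no longer matters.
scaled_m  <- as.numeric(dist(scale(people_m)))
scaled_cm <- as.numeric(dist(scale(people_cm)))

# Categorical features, dummy-coded as 0/1 (made-up survey answers).
# method = "binary" gives the Jaccard distance between rows.
survey <- data.frame(likes_tea    = c(1, 1, 0),
                     likes_coffee = c(1, 0, 0),
                     likes_juice  = c(0, 1, 1))
cat_dist <- dist(survey, method = "binary")
```

In this toy example the nearest neighbour of person 1 differs between `nearest_m` and `nearest_cm`, which is one way to see why features are often standardized before distances are computed.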

2

Hierarchical Clustering

This chapter will help you answer the last question from chapter 1: how do you find groups of similar observations (clusters) in your data using the distances that you have calculated? You will learn about the fundamental principles of hierarchical clustering (the linkage criteria and the dendrogram plot) and how both are used to build clusters. You will also explore data from a wholesale distributor in order to perform market segmentation of clients using their spending habits.

Comparing more than two observations

Calculating linkage

Revisited: The closest observation to a pair

Capturing K clusters

Assign cluster membership

Exploring the clusters

Validating the clusters

Visualizing the dendrogram

Comparing average, single & complete linkage

Height of the tree

Cutting the tree

Clusters based on height

Exploring the branches cut from the tree

What do we know about our clusters?

Making sense of the clusters

Segment wholesale customers

Explore wholesale customer clusters

Interpreting the wholesale customer clusters
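
The workflow outlined in this chapter (distances, linkage, dendrogram, cutting the tree, exploring the clusters) might look roughly like the sketch below. It uses only base R functions (`dist()`, `hclust()`, `cutree()`, `aggregate()`); the `spend` data frame is made-up toy data standing in for the wholesale dataset, and its column names are assumptions.

```r
# Toy customer spending (made-up values), not the course's wholesale data.
spend <- data.frame(
  fresh   = c(10, 12, 11, 50, 52, 49),
  grocery = c(40, 42, 39,  5,  7,  6)
)

# Pairwise Euclidean distances between customers.
spend_dist <- dist(spend)

# Hierarchical clustering under three linkage criteria.
hc_complete <- hclust(spend_dist, method = "complete")
hc_single   <- hclust(spend_dist, method = "single")
hc_average  <- hclust(spend_dist, method = "average")

# Dendrogram (uncomment to draw it in an interactive session).
# plot(hc_complete, main = "Complete linkage")

# Cut the tree either into a fixed number of clusters (k) ...
clusters_k <- cutree(hc_complete, k = 2)
# ... or at a chosen height (h) on the dendrogram.
clusters_h <- cutree(hc_complete, h = 10)

# Explore the clusters: mean spending per category within each cluster.
cluster_means <- aggregate(spend, by = list(cluster = clusters_k), FUN = mean)
```

On data this cleanly separated, all three linkage methods and both ways of cutting give the same two groups; on real data they can disagree, which is why the chapter compares them.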

3

K-means Clustering

In this chapter, you will build an understanding of the principles behind the k-means algorithm, learn how to select the right k when it isn't previously known, and revisit the wholesale data from a different perspective.

Introduction to K-means

K-means on a soccer field

K-means on a soccer field (part 2)

Evaluating different values of K by eye

Many K's many models

Elbow (Scree) plot

Interpreting the elbow plot

Silhouette analysis: observation level performance

Silhouette analysis

Making sense of the K-means clusters

Revisiting wholesale data: "Best" k

Revisiting wholesale data: Exploration
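
As a rough sketch of the ideas in this chapter, the code below fits k-means with base R's `kmeans()`, collects the total within-cluster sum of squares for several values of k (the quantity plotted in an elbow plot), and, if the `cluster` package is installed, computes an average silhouette width. The simulated points in `pts` are made up for illustration, and the choice of `nstart = 20` and the range `1:6` are assumptions, not values taken from the course.

```r
set.seed(42)

# Made-up data: three well-separated blobs of 20 points each.
pts <- rbind(
  cbind(rnorm(20, mean = 0, sd = 0.5), rnorm(20, mean = 0, sd = 0.5)),
  cbind(rnorm(20, mean = 5, sd = 0.5), rnorm(20, mean = 5, sd = 0.5)),
  cbind(rnorm(20, mean = 0, sd = 0.5), rnorm(20, mean = 5, sd = 0.5))
)

# k-means with k = 3; nstart reruns the algorithm from several random starts
# and keeps the best solution.
km <- kmeans(pts, centers = 3, nstart = 20)

# Elbow analysis: total within-cluster sum of squares for k = 1..6.
ks <- 1:6
tot_withinss <- sapply(ks, function(k) {
  kmeans(pts, centers = k, nstart = 20)$tot.withinss
})
# plot(ks, tot_withinss, type = "b")  # look for the bend ("elbow")

# Silhouette analysis, only if the 'cluster' package is available.
avg_sil <- NA
if (requireNamespace("cluster", quietly = TRUE)) {
  sil <- cluster::silhouette(km$cluster, dist(pts))
  avg_sil <- mean(sil[, "sil_width"])
}
```

Here `km$cluster` holds the cluster assignment of each point and `km$centers` the fitted centroids; a sharp drop in `tot_withinss` that flattens after some k is what the elbow plot is meant to reveal.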

4

Case Study: National Occupational Mean Wage

In this chapter, you will apply the skills you have learned to explore how the average salary across professions has changed over time.

Occupational wage data

Initial exploration of the data

Hierarchical clustering: Occupation trees

Hierarchical clustering: Preparing for exploration

Hierarchical clustering: Plotting occupational clusters

Reviewing the HC results

K-means: Elbow analysis

K-means: Average Silhouette Widths

The "best" number of clusters

Review K-means results
