
Course

Cluster Analysis in Python

Skill Level: Intermediate

4.8+

Updated 04/2026

In this course, you will be introduced to unsupervised learning through techniques such as hierarchical and k-means clustering using the SciPy library.

Python, Machine Learning

4 hr

14 videos

46 Exercises

3,650 XP

65,111

Statement of Accomplishment

Course Description

You have probably come across Google News, which automatically groups similar news articles under a topic. Have you ever wondered what process runs in the background to arrive at these groups? In this course, you will be introduced to unsupervised learning through clustering using the SciPy library in Python. The course covers data pre-processing and the application of hierarchical and k-means clustering. Throughout the course, you will explore player statistics from a popular football video game, FIFA 18. After completing the course, you will be able to quickly apply various clustering algorithms to data, visualize the resulting clusters, and analyze the results.

Prerequisites

Intermediate Python

1. Introduction to Clustering

Before you are ready to classify news articles, you need to learn the basics of clustering. This chapter familiarizes you with a class of machine learning algorithms called unsupervised learning and then introduces clustering, one of the most popular unsupervised learning techniques. You will learn about two popular clustering techniques, hierarchical clustering and k-means clustering. The chapter concludes with the basic pre-processing steps to perform before you start clustering data.

Unsupervised learning: basics

Unsupervised learning in real world

Pokémon sightings

Basics of cluster analysis

Pokémon sightings: hierarchical clustering

Pokémon sightings: k-means clustering

Data preparation for cluster analysis

Normalize basic list data

Visualize normalized data

Normalization of small numbers

FIFA 18: Normalize data
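
As a rough illustration of the normalization step that closes this chapter, the sketch below uses SciPy's `whiten` function, which divides each feature by its standard deviation. The toy array and variable names are made up for illustration and are not taken from the course materials.

```python
import numpy as np
from scipy.cluster.vq import whiten

# Hypothetical toy data: two features on very different scales
# (for example, a rating out of 10 and a wage in currency units).
data = np.array([
    [5.0, 2000.0],
    [3.0, 1500.0],
    [8.0, 3000.0],
    [4.0, 1000.0],
])

# whiten() rescales each column by its standard deviation so that
# no single feature dominates distance calculations.
scaled = whiten(data)

# After whitening, every column has unit standard deviation.
print(scaled.std(axis=0))
```

Without this step, the second column would dominate any Euclidean distance simply because its values are numerically larger.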

2. Hierarchical Clustering

This chapter focuses on a popular clustering algorithm, hierarchical clustering, and its implementation in SciPy. In addition to the procedure for performing hierarchical clustering, it helps you answer an important question: how many clusters are present in your data? The chapter concludes with a discussion of the limitations of hierarchical clustering and considerations for using it.

Basics of hierarchical clustering

Hierarchical clustering: ward method

Hierarchical clustering: single method

Hierarchical clustering: complete method

Visualize clusters

Visualize clusters with matplotlib

Visualize clusters with seaborn

How many clusters?

Create a dendrogram

How many clusters in comic con data?

Limitations of hierarchical clustering

Timing run of hierarchical clustering

FIFA 18: exploring defenders
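
A minimal sketch of the workflow this chapter describes, assuming SciPy's `linkage`, `fcluster`, and `dendrogram` functions. The synthetic two-blob data and all variable names are hypothetical stand-ins, not the course's datasets.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.cluster.vq import whiten

# Hypothetical data: two well-separated groups of 2-D points.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(20, 2))
blob_b = rng.normal(loc=[10.0, 10.0], scale=0.5, size=(20, 2))
points = whiten(np.vstack([blob_a, blob_b]))

# Build the merge hierarchy; "ward" could be swapped for
# "single" or "complete" to compare linkage methods.
merges = linkage(points, method="ward", metric="euclidean")

# Cut the hierarchy into at most 2 flat clusters (labels start at 1).
labels = fcluster(merges, t=2, criterion="maxclust")

# Compute the dendrogram layout without drawing it; drop no_plot=True
# (with matplotlib installed) to see the tree and judge cluster count.
tree = dendrogram(merges, no_plot=True)

print(labels)
```

In a dendrogram, long vertical gaps between merges suggest natural places to cut the tree, which is one way to decide how many clusters to keep.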

3. K-Means Clustering

This chapter introduces a different clustering algorithm, k-means clustering, and its implementation in SciPy. K-means clustering overcomes the biggest drawback of hierarchical clustering discussed in the previous chapter. Because dendrograms are specific to hierarchical clustering, this chapter discusses one method for finding the number of clusters before running k-means clustering. The chapter concludes with a discussion of the limitations of k-means clustering and considerations for using this algorithm.

Basics of k-means clustering

K-means clustering: first exercise

Runtime of k-means clustering

How many clusters?

Elbow method on distinct clusters

Elbow method on uniform data

Limitations of k-means clustering

Impact of seeds on distinct clusters

Uniform clustering patterns

FIFA 18: defenders revisited
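
The sketch below illustrates the two ideas this chapter centers on: running k-means with SciPy's `kmeans` and `vq`, and comparing distortions across several values of k (the elbow method). The synthetic data and variable names are assumptions for illustration only.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq, whiten

# Hypothetical data: two well-separated groups of 2-D points.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(20, 2))
blob_b = rng.normal(loc=[10.0, 10.0], scale=0.5, size=(20, 2))
points = whiten(np.vstack([blob_a, blob_b]))

# kmeans() returns the cluster centers and the mean distance of
# observations to their nearest center (the distortion).
centroids, distortion = kmeans(points, 2)

# vq() assigns each observation to its nearest center (labels start at 0).
labels, _ = vq(points, centroids)

# Elbow method: record the distortion for a range of k values and
# look for the k after which the improvement levels off.
distortions = []
for k in range(1, 6):
    _, d = kmeans(points, k)
    distortions.append(d)

print(labels)
print(distortions)
```

Because k-means starts from randomly chosen centers, results can vary between runs on less clearly separated data. Plotting `distortions` against k typically shows a sharp bend at the most suitable number of clusters when the groups are distinct, and no clear bend when the data is uniform.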

4. Clustering in Real World

Now that you are familiar with two of the most popular clustering techniques, this chapter helps you apply this knowledge to real-world problems. The chapter first discusses the process of finding dominant colors in an image, before moving on to the problem discussed in the introduction - clustering of news articles. The chapter concludes with a discussion on clustering with multiple variables, which makes it difficult to visualize all the data.

Dominant colors in images

Extract RGB values from image

How many dominant colors?

Display dominant colors

Document clustering

TF-IDF of movie plots

Top terms in movie clusters

Clustering with multiple features

Clustering with many features

Basic checks on clusters

FIFA 18: what makes a complete player?
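
To make the dominant-colors idea concrete, here is a hedged sketch that clusters RGB values with k-means. It assumes a synthetic array of pixel values in place of a real image, and every name in it is hypothetical rather than taken from the course.

```python
import numpy as np
from scipy.cluster.vq import kmeans, whiten

# Hypothetical stand-in for image pixels: 300 reddish and 300 bluish
# RGB values. A real image would first be flattened to this shape.
rng = np.random.default_rng(1)
reddish = rng.normal(loc=[220.0, 30.0, 30.0], scale=5.0, size=(300, 3))
bluish = rng.normal(loc=[30.0, 30.0, 220.0], scale=5.0, size=(300, 3))
pixels = np.clip(np.vstack([reddish, bluish]), 0, 255)

# Remember each channel's standard deviation so the scaling
# applied by whiten() can be undone afterwards.
channel_std = pixels.std(axis=0)
scaled_pixels = whiten(pixels)

# Cluster the scaled pixels; each center is one dominant color.
centers, _ = kmeans(scaled_pixels, 2)

# Convert the centers back to the original 0-255 RGB scale.
dominant_colors = centers * channel_std

print(np.round(dominant_colors))
```

The same pattern extends to more features than can be plotted at once: cluster on the scaled data, then inspect the cluster centers in original units to interpret each group.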

Don’t just take our word for it

4.8

from 959 reviews

5 stars: 82%
4 stars: 16%
3 stars: 1%
2 stars: 0%
1 star: 0%

Jhohan Alexis

4 days ago

Nilton Cesar

5 days ago

well explained :D

Anh

7 days ago

Thanh

last week

Hongyang

last week

Preeti

last week

FAQs

Which Python library is used for clustering in this course?

The course primarily uses the SciPy library to implement both hierarchical and k-means clustering algorithms, along with standard tools for data visualization.

What dataset will I use to practice clustering?

You will explore player statistics from the FIFA 18 video game, applying clustering techniques to group players based on their performance attributes.

How will I determine the right number of clusters for my data?

For hierarchical clustering you will use dendrograms, and for k-means you will learn the elbow method to estimate a suitable number of clusters before running the algorithm.

Does the course cover real-world clustering applications beyond sports data?

Yes. The final chapter applies clustering to find dominant colors in images and to group news articles by topic, demonstrating practical uses in different domains.

What preprocessing steps are taught before clustering?

You will learn essential preprocessing steps like feature scaling and data normalization that are necessary before applying distance-based clustering algorithms effectively.
