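Course

Unsupervised Learning in Python

Intermediate skill level

Updated December 2025

Learn how to cluster, transform, visualize, and extract insights from unlabeled datasets using scikit-learn and SciPy.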

Python, Machine Learning

4 hours

13 videos

52 exercises

4,150 XP

170K+

Statement of Accomplishment

Course Description

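Say you have a collection of customers with a variety of characteristics such as age, location, and financial history, and you wish to discover patterns and sort them into clusters. Or perhaps you have a set of texts, such as Wikipedia pages, and you wish to segment them into categories based on their content. This is the world of unsupervised learning. It is called "unsupervised" because you are not guiding, or supervising, the pattern discovery with a prediction task; instead, you uncover hidden structure in unlabeled data. Unsupervised learning encompasses a variety of machine learning techniques, from clustering to dimension reduction to matrix factorization. In this course, you'll learn the fundamentals of unsupervised learning and implement the essential algorithms using scikit-learn and SciPy. You will learn how to cluster, transform, visualize, and extract insights from unlabeled datasets, and you'll end the course by building a recommender system that recommends popular musical artists.

The videos include live transcripts, which you can reveal by clicking "Show transcript" at the bottom left of each video. The course glossary is in the resources section on the right. To obtain CPE credits, you need to complete the course and score at least 70% on the qualified assessment. You can reach the assessment by clicking the CPE credits callout on the right.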

Prerequisites

Supervised Learning with scikit-learn

1

Clustering for Dataset Exploration

Learn how to discover the underlying groups (or "clusters") in a dataset. By the end of this chapter, you'll be clustering companies using their stock market prices, and distinguishing different species by clustering their measurements.

Unsupervised Learning

How many clusters?

Clustering 2D points

Inspect your clustering

Evaluating a clustering

How many clusters of grain?

Evaluating the grain clustering

Transforming features for better clusterings

Scaling fish data for clustering

Clustering the fish data

Clustering stocks using KMeans

Which stocks move together?
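The workflow this chapter outlines (scale the features, cluster with KMeans, then judge the clustering by its inertia) might look roughly like the sketch below. The data is synthetic and stands in for the course's datasets, which are not reproduced here; the feature scales, the number of clusters, and all variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption, not a course dataset): three groups
# of 50 samples whose second feature is on a far larger scale than the first.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 500.0]])
samples = np.vstack(
    [c + rng.normal(scale=[0.5, 50.0], size=(50, 2)) for c in centers]
)

# Standardize each feature before clustering so that the large-scale feature
# does not dominate the distance computations inside KMeans.
pipeline = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=3, n_init=10, random_state=0),
)
labels = pipeline.fit_predict(samples)

# Inertia measures how tightly samples sit around their cluster centers.
# Comparing it across several values of k is one way to choose k.
inertias = []
for k in range(1, 6):
    model = make_pipeline(
        StandardScaler(),
        KMeans(n_clusters=k, n_init=10, random_state=0),
    )
    model.fit(samples)
    inertias.append(model.named_steps["kmeans"].inertia_)

for k, inertia in zip(range(1, 6), inertias):
    print(f"k={k}: inertia={inertia:.1f}")
```

Plotting `inertias` against k and looking for the point where the curve flattens is a common heuristic for picking the number of clusters; it is a guide rather than a rule.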

2

Visualization with Hierarchical Clustering and t-SNE

In this chapter, you'll learn about two unsupervised learning techniques for data visualization: hierarchical clustering and t-SNE. Hierarchical clustering merges the data samples into ever-coarser clusters, yielding a tree visualization of the resulting cluster hierarchy. t-SNE maps the data samples into 2D space so that the proximity of the samples to one another can be visualized.

Visualizing hierarchies

How many merges?

Hierarchical clustering of the grain data

Hierarchies of stocks

Cluster labels in hierarchical clustering

Which clusters are closest?

Different linkage, different hierarchical clustering!

Intermediate clusterings

Extracting the cluster labels

t-SNE for 2-dimensional maps

t-SNE visualization of grain dataset

A t-SNE map of the stock market
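As a rough illustration of the two techniques in this chapter, the sketch below builds a merge hierarchy with SciPy, cuts it into flat cluster labels, and computes a 2D t-SNE embedding with scikit-learn. The samples are synthetic, and the linkage method, cluster count, and t-SNE settings are assumptions chosen for the example rather than values taken from the course.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.manifold import TSNE

# Synthetic stand-in data (an assumption): two groups of 20 samples, 4 features.
rng = np.random.default_rng(0)
samples = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(20, 4)),
    rng.normal(loc=5.0, scale=0.3, size=(20, 4)),
])

# Agglomerative clustering: every sample starts as its own cluster and the two
# closest clusters are merged repeatedly. With n samples there are n - 1 merges,
# and each row of `mergings` records one of them.
mergings = linkage(samples, method="complete")

# Cut the hierarchy to obtain flat labels, here by asking for at most 2 clusters.
labels = fcluster(mergings, t=2, criterion="maxclust")

# t-SNE maps the samples to 2D while trying to keep nearby samples nearby.
# Perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=10, learning_rate=200, random_state=0)
embedding = tsne.fit_transform(samples)

print(mergings.shape)   # one row per merge
print(embedding.shape)  # one 2D point per sample
```

Passing `mergings` to `scipy.cluster.hierarchy.dendrogram` draws the tree when matplotlib is available. The axes of a t-SNE plot carry no interpretable meaning, so only the relative positions of the points should be read from it.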

3

Decorrelating Your Data and Dimension Reduction

Dimension reduction summarizes a dataset using its commonly occurring patterns. In this chapter, you'll learn about the most fundamental of dimension reduction techniques, "Principal Component Analysis" ("PCA"). PCA is often used before supervised learning to improve model performance and generalization. It can also be useful for unsupervised learning. For example, you'll employ a variant of PCA that will allow you to cluster Wikipedia articles by their content!

Visualizing the PCA transformation

Correlated data in nature

Decorrelating the grain measurements with PCA

Principal components

Intrinsic dimension

The first principal component

Variance of the PCA features

Intrinsic dimension of the fish data

Dimension reduction with PCA

Dimension reduction of the fish measurements

A tf-idf word-frequency array

Clustering Wikipedia part I

Clustering Wikipedia part II
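The two uses of PCA named in this chapter, decorrelating features and reducing dimension, can be sketched on a small example. The two correlated "measurements" below are synthetic and their names are hypothetical; they only loosely imitate the kind of data the lesson titles refer to.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in data (an assumption): two strongly correlated measurements.
rng = np.random.default_rng(0)
width = rng.normal(loc=3.0, scale=0.5, size=200)
length = 2.0 * width + rng.normal(scale=0.1, size=200)
samples = np.column_stack([width, length])

# PCA shifts the samples to have zero mean and rotates them onto the principal
# components, so the resulting PCA features are linearly uncorrelated.
pca = PCA()
pca_features = pca.fit_transform(samples)

corr_before = np.corrcoef(samples, rowvar=False)[0, 1]
corr_after = np.corrcoef(pca_features, rowvar=False)[0, 1]
print(f"correlation before PCA: {corr_before:.3f}")
print(f"correlation after PCA:  {corr_after:.3f}")

# The share of variance carried by each PCA feature hints at the intrinsic
# dimension: here nearly all of the variance lies along the first component.
ratios = pca.explained_variance_ratio_
print("explained variance ratios:", ratios)

# Dimension reduction keeps only the highest-variance PCA features.
reduced = PCA(n_components=1).fit_transform(samples)
print(reduced.shape)
```

In this example almost all of the variance sits in the first component, so dropping the second loses very little information.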

4

Discovering Interpretable Features

In this chapter, you'll learn about a dimension reduction technique called "Non-negative matrix factorization" ("NMF") that expresses samples as combinations of interpretable parts. For example, it expresses documents as combinations of topics, and images in terms of commonly occurring visual patterns. You'll also learn to use NMF to build recommender systems that can find you similar articles to read, or musical artists that match your listening history!

Non-negative matrix factorization (NMF)

Non-negative data

NMF applied to Wikipedia articles

NMF features of the Wikipedia articles

NMF reconstructs samples

NMF learns interpretable parts

NMF learns topics of documents

Explore the LED digits dataset

NMF learns the parts of images

PCA doesn't learn parts

Building recommender systems using NMF

Which articles are similar to 'Cristiano Ronaldo'?

Recommend musical artists part I

Recommend musical artists part II

Final thoughts
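A recommender of the kind this chapter describes can be sketched by factorizing a non-negative matrix with NMF and ranking items by the cosine similarity of their NMF features. The word-count matrix, document titles, and vocabulary below are made up for illustration and are far smaller than any real corpus; the number of components is likewise an assumption.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.preprocessing import normalize

# Hypothetical toy data: rows are documents, columns are word counts.
words = ["goal", "match", "team", "guitar", "album", "song"]
titles = ["football A", "football B", "music A", "music B"]
counts = np.array([
    [4, 3, 5, 0, 0, 0],
    [3, 5, 4, 0, 0, 1],
    [0, 0, 0, 5, 4, 3],
    [0, 1, 0, 4, 5, 4],
], dtype=float)

# NMF approximates the matrix as (features @ components), with every entry
# non-negative. Each component can be read as a "topic" over the words.
model = NMF(n_components=2, init="nndsvd", max_iter=500, random_state=0)
nmf_features = model.fit_transform(counts)
components = model.components_

# Show the highest-weighted words of each learned component.
for i, component in enumerate(components):
    top_words = [words[j] for j in np.argsort(-component)[:3]]
    print(f"component {i}: {top_words}")

# After scaling each feature row to unit length, a dot product between two
# rows equals their cosine similarity.
norm_features = normalize(nmf_features)
query = titles.index("football A")
similarities = norm_features @ norm_features[query]

# Rank the other documents from most to least similar to the query.
ranked = [titles[i] for i in np.argsort(-similarities) if i != query]
print(ranked)
```

Cosine similarity compares the direction of the feature vectors rather than their length, so two documents with the same mix of topics score as similar even if one is much longer than the other.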
