- 13 Videos
- 52 Exercises
- 4 hours
- 1,808 Participants
- 4150 XP

**Instructor(s):**

Ben is a machine learning specialist and the director of research at lateral.io. He is passionate about learning and has worked as a data scientist in real-time bidding, e-commerce, and recommendation. Ben holds a PhD in mathematics and a degree in computer science.

Hugo Bowne-Anderson

Say you have a collection of customers with a variety of characteristics such as age, location, and financial history, and you wish to discover patterns and sort them into clusters. Or perhaps you have a set of texts, such as wikipedia pages, and you wish to segment them into categories based on their content. This is the world of unsupervised learning, called as such because you are not guiding, or supervising, the pattern discovery by some prediction task, but instead uncovering hidden structure from unlabeled data. Unsupervised learning encompasses a variety of techniques in machine learning, from clustering to dimension reduction to matrix factorization. In this course, you'll learn the fundamentals of unsupervised learning and implement the essential algorithms using scikit-learn and scipy. You will learn how to cluster, transform, visualize, and extract insights from unlabeled datasets, and end the course by building a recommender system to recommend popular musical artists.

- Intro to Python for Data Science
- Intermediate Python for Data Science
- Statistical Thinking in Python (Part 1)

Learn how to discover the underlying groups (or "clusters") in a dataset. By the end of this chapter, you'll be clustering companies using their stock market prices, and distinguishing different species by clustering their measurements.

- Unsupervised learning 50 xp
- How many clusters? 50 xp
- Clustering 2D points 100 xp
- Inspect your clustering 100 xp
- Evaluating a clustering 50 xp
- How many clusters of grain? 100 xp
- Evaluating the grain clustering 100 xp
- Transforming features for better clusterings 50 xp
- Scaling fish data for clustering 100 xp
- Clustering the fish data 100 xp
- Clustering stocks using KMeans 100 xp
- Which stocks move together? 100 xp

In this chapter, you'll learn about two unsupervised learning techniques for data visualization, hierarchical clustering and t-SNE. Hierarchical clustering merges the data samples into ever-coarser clusters, yielding a tree visualization of the resulting cluster hierarchy. t-SNE maps the data samples into 2d space so that the proximity of the samples to one another can be visualized.

Dimension reduction summarizes a dataset using its common occuring patterns. In this chapter, you'll learn about the most fundamental of dimension reduction techniques, "Principal Component Analysis" ("PCA"). PCA is often used before supervised learning to improve model performance and generalization. It can also be useful for unsupervised learning. For example, you'll employ a variant of PCA will allow you to cluster Wikipedia articles by their content!

In this chapter, you'll learn about a dimension reduction technique called "Non-negative matrix factorization" ("NMF") that expresses samples as combinations of interpretable parts. For example, it expresses documents as combinations of topics, and images in terms of commonly occurring visual patterns. You'll also learn to use NMF to build recommender systems that can find you similar articles to read, or musical artists that match your listening history!