Say you have a collection of customers with a variety of characteristics such as age, location, and financial history, and you wish to discover patterns and sort them into clusters. Or perhaps you have a set of texts, such as Wikipedia pages, and you wish to segment them into categories based on their content. This is the world of unsupervised learning, so called because you are not guiding, or supervising, the pattern discovery with a prediction task, but instead uncovering hidden structure in unlabeled data. Unsupervised learning encompasses a variety of techniques in machine learning, from clustering to dimension reduction to matrix factorization. In this course, you'll learn the fundamentals of unsupervised learning and implement the essential algorithms using scikit-learn and SciPy. You will learn how to cluster, transform, visualize, and extract insights from unlabeled datasets, and end the course by building a recommender system to recommend popular musical artists.
Clustering for dataset exploration (Free)
Learn how to discover the underlying groups (or "clusters") in a dataset. By the end of this chapter, you'll be clustering companies using their stock market prices, and distinguishing different species by clustering their measurements.
Unsupervised Learning
How many clusters?
Clustering 2D points
Inspect your clustering
Evaluating a clustering
How many clusters of grain?
Evaluating the grain clustering
Transforming features for better clusterings
Scaling fish data for clustering
Clustering the fish data
Clustering stocks using KMeans
Which stocks move together?
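As a rough illustration of the scale-then-cluster workflow this chapter covers, here is a minimal sketch using scikit-learn. The data is synthetic and stands in for the course datasets (an assumption), and the choice of two clusters is made only because the toy data was generated with two groups.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption, not a course dataset): two groups
# whose second feature lives on a much larger scale than the first.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[0.0, 0.0], scale=[1.0, 100.0], size=(50, 2))
group_b = rng.normal(loc=[10.0, 1000.0], scale=[1.0, 100.0], size=(50, 2))
samples = np.vstack([group_a, group_b])

# Standardize each feature, then run k-means on the scaled features.
pipeline = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=2, n_init=10, random_state=0),
)
labels = pipeline.fit_predict(samples)

# Inertia (within-cluster sum of squares) is one way to evaluate a clustering:
# lower values mean tighter clusters for a given number of clusters.
inertia = pipeline.named_steps["kmeans"].inertia_
print(labels[:5], labels[-5:], inertia)
```

Putting the scaler and the clusterer in one pipeline keeps features with large numeric ranges from dominating the distance computation.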
Visualization with hierarchical clustering and t-SNE
In this chapter, you'll learn about two unsupervised learning techniques for data visualization, hierarchical clustering and t-SNE. Hierarchical clustering merges the data samples into ever-coarser clusters, yielding a tree visualization of the resulting cluster hierarchy. t-SNE maps the data samples into 2D space so that the proximity of the samples to one another can be visualized.
Visualizing hierarchies
How many merges?
Hierarchical clustering of the grain data
Hierarchies of stocks
Cluster labels in hierarchical clustering
Which clusters are closest?
Different linkage, different hierarchical clustering!
Intermediate clusterings
Extracting the cluster labels
t-SNE for 2-dimensional maps
t-SNE visualization of grain dataset
A t-SNE map of the stock market
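The two techniques described in this chapter can be sketched with SciPy and scikit-learn as follows. The samples are synthetic (an assumption), and the linkage method, the cluster count of 3, and the t-SNE perplexity are illustrative choices rather than values taken from the course.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from sklearn.manifold import TSNE

# Synthetic stand-in data (an assumption): three well-separated groups of
# 20 samples each, in 5 dimensions.
rng = np.random.default_rng(0)
centers = np.array([[0.0] * 5, [10.0] * 5, [-10.0] * 5])
samples = np.vstack([rng.normal(loc=c, scale=1.0, size=(20, 5)) for c in centers])

# Hierarchical clustering: each row of `mergings` records one merge, so
# n samples produce n - 1 merges.
mergings = linkage(samples, method="complete")

# Compute the tree layout without drawing it; drop no_plot=True to draw the
# dendrogram with matplotlib.
tree = dendrogram(mergings, no_plot=True)

# Cut the hierarchy into at most 3 flat clusters to extract labels.
labels = fcluster(mergings, t=3, criterion="maxclust")

# t-SNE: map the samples to 2D so that nearby samples stay nearby.
embedding = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(samples)
print(mergings.shape, sorted(set(labels.tolist())), embedding.shape)
```

The axes of a t-SNE map have no interpretable meaning; only the relative proximity of points does.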
Decorrelating your data and dimension reduction
Dimension reduction summarizes a dataset using its commonly occurring patterns. In this chapter, you'll learn about the most fundamental of dimension reduction techniques, "Principal Component Analysis" ("PCA"). PCA is often used before supervised learning to improve model performance and generalization. It can also be useful for unsupervised learning. For example, you'll employ a variant of PCA that will allow you to cluster Wikipedia articles by their content!
Visualizing the PCA transformation
Correlated data in nature
Decorrelating the grain measurements with PCA
Principal components
Intrinsic dimension
The first principal component
Variance of the PCA features
Intrinsic dimension of the fish data
Dimension reduction with PCA
Dimension reduction of the fish measurements
A tf-idf word-frequency array
Clustering Wikipedia part I
Clustering Wikipedia part II
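To make the decorrelation and dimension reduction ideas concrete, here is a small sketch with scikit-learn's PCA. The two correlated measurements are synthetic and the names `width` and `length` are hypothetical stand-ins, not the course's grain or fish data.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in data (an assumption): two strongly correlated
# measurements, named width and length purely for illustration.
rng = np.random.default_rng(0)
width = rng.normal(loc=3.0, scale=0.5, size=200)
length = 2.0 * width + rng.normal(scale=0.1, size=200)
samples = np.column_stack([width, length])

# Fit PCA and rotate the samples onto the principal components.
pca = PCA()
pca_features = pca.fit_transform(samples)

# The original features are correlated; the PCA features are not.
corr_before = np.corrcoef(samples, rowvar=False)[0, 1]
corr_after = np.corrcoef(pca_features, rowvar=False)[0, 1]

# Nearly all the variance lies along the first component, which suggests
# an intrinsic dimension of 1, so one PCA feature is enough here.
variance_ratio = pca.explained_variance_ratio_
reduced = PCA(n_components=1).fit_transform(samples)
print(corr_before, corr_after, variance_ratio, reduced.shape)
```

Keeping only the high-variance components is what turns PCA from a rotation into a dimension reduction.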
Discovering interpretable features
In this chapter, you'll learn about a dimension reduction technique called "Non-negative matrix factorization" ("NMF") that expresses samples as combinations of interpretable parts. For example, it expresses documents as combinations of topics, and images in terms of commonly occurring visual patterns. You'll also learn to use NMF to build recommender systems that can find you similar articles to read, or musical artists that match your listening history!
Non-negative matrix factorization (NMF)
Non-negative data
NMF applied to Wikipedia articles
NMF features of the Wikipedia articles
NMF reconstructs samples
NMF learns interpretable parts
NMF learns topics of documents
Explore the LED digits dataset
NMF learns the parts of images
PCA doesn't learn parts
Building recommender systems using NMF
Which articles are similar to 'Cristiano Ronaldo'?
Recommend musical artists part I
Recommend musical artists part II
Final thoughts
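One way an NMF-based recommender of this kind might be sketched is shown below. The tiny word-count array is made up for illustration (an assumption), and ranking by cosine similarity of the NMF features is one common approach, not necessarily the exact method used in the course.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.preprocessing import normalize

# Hypothetical word-count array (an assumption): rows are documents, columns
# are words. Documents 0-2 share one vocabulary, documents 3-5 another.
articles = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [6, 5, 1, 0],
    [0, 0, 5, 4],
    [0, 1, 4, 5],
    [0, 0, 6, 5],
], dtype=float)

# Factorize into non-negative document features and topic components.
model = NMF(n_components=2, init="nndsvda", max_iter=500, random_state=0)
nmf_features = model.fit_transform(articles)
components = model.components_

# Normalize each document's features to unit length so that a dot product
# between two rows gives their cosine similarity.
norm_features = normalize(nmf_features)

# Rank all documents by similarity to document 0, most similar first.
similarities = norm_features @ norm_features[0]
ranking = np.argsort(-similarities)
print(ranking, similarities.round(3))
```

Cosine similarity compares the mix of topics rather than document length, so a long and a short document on the same topic still score as similar.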
Datasets: Company stock price movements, Eurovision 2016, Fish measurements, Grains, LCD digits, Musical artists, Wikipedia articles, Wine
Prerequisites: Statistical Thinking in Python (Part 1)
Director of Research at lateral.io
Ben is a machine learning specialist and the director of research at lateral.io. He is passionate about learning and has worked as a data scientist in real-time bidding, e-commerce, and recommendation. Ben holds a PhD in mathematics and a degree in computer science.