Dimensionality reduction techniques are based on unsupervised machine learning algorithms, and applying them offers several advantages, such as faster model training, data compression, and imputation of missing values. In this course you will learn how to apply dimensionality reduction techniques to exploit these advantages, using interesting datasets like the MNIST database of handwritten digits, the fashion version of MNIST released by Zalando, and a credit card fraud detection dataset. First, you will look at t-SNE, an algorithm that performs non-linear dimensionality reduction. Then you will explore some useful characteristics of dimensionality reduction to apply in predictive models. Finally, you will see how GLRM can be applied to compress big data (with numerical and categorical values) and impute missing values. Are you ready to start compressing high-dimensional data?
Introduction to Advanced Dimensionality Reduction (Free)
Are you ready to become a master of dimensionality reduction? In this chapter, you'll start by understanding how to represent handwritten digits using the MNIST dataset. You will learn what a distance metric is and which ones are the most common, along with the problems that arise from the curse of dimensionality. Finally, you will compare the application of PCA and t-SNE.
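To make the idea of a distance metric concrete, here is a minimal sketch of two common ones, Euclidean and Manhattan distance, plus a rough illustration of the curse of dimensionality. It is written in plain Python as an assumption (the course itself appears to be taught in R), and the pixel vectors, the `relative_contrast` helper, and its parameters are made up for illustration rather than taken from the course material.

```python
import math
import random

def euclidean(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from one random query point
    to n_points random points in the unit hypercube of dimension `dim`.
    Hypothetical helper: small values mean all points look equally far away."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        euclidean(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)

# Two tiny "images" flattened into pixel vectors (made-up values).
img_a = [0, 0, 255, 255]
img_b = [0, 255, 255, 0]

print(euclidean(img_a, img_b))  # about 360.62
print(manhattan(img_a, img_b))  # 510

# 784 is the number of pixels in a 28 x 28 MNIST image.
for dim in (2, 784):
    print(dim, relative_contrast(dim))
```

In this sketch the contrast between the nearest and farthest neighbour shrinks sharply as the dimension grows, which is one way the curse of dimensionality shows up: distances in the raw pixel space become less informative.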
Introduction to t-SNE
Now, you will learn how to apply the t-Distributed Stochastic Neighbour Embedding (t-SNE) algorithm. After finishing this chapter, you will understand the different hyperparameters that have an impact on your results and how to optimize them. Finally, you will do something really cool: compute centroid prototypes of each digit to classify other digits.
Building a t-SNE embedding (50 xp)
Computing t-SNE (100 xp)
Understanding t-SNE output (100 xp)
Optimal number of t-SNE iterations (50 xp)
Reproducing results (100 xp)
Optimal number of iterations (100 xp)
Effect of perplexity parameter (50 xp)
Perplexity of MNIST sample (100 xp)
Perplexity of bigger MNIST dataset (100 xp)
Classifying digits with t-SNE (50 xp)
Plotting spatial distribution of true classes (100 xp)
Computing the centroids of each class (100 xp)
Computing similarities of digits 1 and 0 (100 xp)
Plotting similarities of digits 1 and 0 (100 xp)
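The centroid idea from this chapter can be sketched in a few lines. The example below assumes you already have a two-dimensional embedding with one label per point; the coordinates are made up, the function names are hypothetical, and plain Python is used as an assumption (the course itself appears to be taught in R). It is an illustration of nearest-centroid classification, not the course's own solution.

```python
import math
from collections import defaultdict

def class_centroids(points, labels):
    """Average the 2-D coordinates of all points that share a label."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for (x, y), label in zip(points, labels):
        sums[label][0] += x
        sums[label][1] += y
        counts[label] += 1
    return {
        label: (s[0] / counts[label], s[1] / counts[label])
        for label, s in sums.items()
    }

def nearest_centroid(point, centroids):
    """Return the label whose centroid is closest (Euclidean) to `point`."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))

# Made-up embedding coordinates for three "0" digits and three "1" digits.
embedding = [(-10.0, -9.0), (-11.0, -10.5), (-9.5, -10.0),
             (12.0, 8.0), (11.0, 9.5), (13.0, 9.0)]
labels = [0, 0, 0, 1, 1, 1]

centroids = class_centroids(embedding, labels)
print(centroids)
print(nearest_centroid((10.0, 7.0), centroids))   # 1
print(nearest_centroid((-8.0, -8.0), centroids))  # 0
```

One caveat worth keeping in mind: standard t-SNE does not provide a mapping for unseen points, so a digit to be classified generally has to be embedded together with the labelled digits before its distance to each centroid is meaningful.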
Using t-SNE with Predictive Models
In this chapter, you'll apply t-SNE to train predictive models faster. This is one of the many advantages of dimensionality reduction. You will learn how to train a random forest with the original features and with the embedded features and compare them. You will also apply t-SNE to understand the patterns learned by a neural network. And all of this using a real credit card fraud dataset!
Credit card fraud detection (50 xp)
Exploring credit card fraud dataset (100 xp)
Generating training and test sets (100 xp)
Training random forests models (50 xp)
Training a random forest with original features (100 xp)
Computing and visualising the t-SNE embedding (100 xp)
Training a random forest with embedding features (100 xp)
Predicting data (50 xp)
Predicting data using original features (100 xp)
Predicting data using embedding random forest (100 xp)
Visualizing neural networks layers (50 xp)
Exploring neural network layer output (100 xp)
Using t-SNE to visualise a neural network layer (100 xp)
Generalized Low Rank Models (GLRM)
In the final chapter, you will practice another useful dimensionality reduction algorithm: GLRM. Here you will make use of the Fashion MNIST data to classify clothes, impute missing data and also train random forests using the low dimensional embedding.
Exploring fashion MNIST dataset (50 xp)
Exploring fashion MNIST (100 xp)
Visualizing fashion MNIST (100 xp)
Generalized Low Rank Models (GLRM) (50 xp)
Reducing data with GLRM (100 xp)
Improving model convergence (100 xp)
Visualizing a GLRM model (50 xp)
Visualizing the output of GLRM (100 xp)
Visualizing the prototypes (100 xp)
Dealing with missing data and speeding-up models (50 xp)
Imputing missing data (100 xp)
Training a random forest with original data (100 xp)
Training a random forest with compressed data (100 xp)
Summary of the course (50 xp)
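To give a feel for how a low-rank model can impute missing values, here is a deliberately simplified sketch: a rank-1 factorization fitted by alternating least squares on the observed entries only. It assumes numeric data, quadratic loss, and no regularization, so it covers only a small special case of what GLRM supports. The function names and the toy matrix are hypothetical, and plain Python is used as an assumption (the course itself appears to be taught in R); this is not the GLRM implementation used in the course.

```python
def fit_rank1(X, n_iter=50):
    """Fit X[i][j] ~ u[i] * v[j] using only observed (non-None) entries.

    Alternates between two least-squares updates: solve for each u[i]
    with v fixed, then for each v[j] with u fixed.
    """
    n_rows, n_cols = len(X), len(X[0])
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(n_iter):
        for i in range(n_rows):
            seen = [j for j in range(n_cols) if X[i][j] is not None]
            den = sum(v[j] ** 2 for j in seen)
            if den > 0:
                u[i] = sum(X[i][j] * v[j] for j in seen) / den
        for j in range(n_cols):
            seen = [i for i in range(n_rows) if X[i][j] is not None]
            den = sum(u[i] ** 2 for i in seen)
            if den > 0:
                v[j] = sum(X[i][j] * u[i] for i in seen) / den
    return u, v

def impute(X, u, v):
    """Keep observed values; fill each missing one with u[i] * v[j]."""
    return [
        [X[i][j] if X[i][j] is not None else u[i] * v[j]
         for j in range(len(X[0]))]
        for i in range(len(X))
    ]

# Toy matrix whose rows are multiples of [1, 2, 3]; one entry is missing.
X = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, None],  # the hidden value is 6.0
    [3.0, 6.0, 9.0],
]

u, v = fit_rank1(X)
filled = impute(X, u, v)
print(filled[1][2])  # close to 6.0
```

The vectors `u` and `v` together store 6 numbers instead of 9, which hints at the compression side of the technique; on real data the rank, loss functions, and regularization would all need to be chosen with care.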
Prerequisites: Unsupervised Learning in R
Data Scientist at DataRobot
Federico Castanedo is the Lead Telco Data Scientist at DataRobot. He is also an O'Reilly author on data science. Previously, he was the Lead Data Scientist at Vodafone Group and before that Chief Data Scientist/Co-founder at Wise Athena. He has published several scientific papers about data fusion techniques, visual sensor networks, and machine learning. He holds a Ph.D. in Artificial Intelligence from the University Carlos III of Madrid and has also been a visiting researcher at Stanford University.