Sara Bengoechea Rodríguez has completed

Advanced Dimensionality Reduction in R

Start Course for Free
4 hr
4,300 XP
Statement of Accomplishment Badge

Loved by learners at thousands of companies


Course Description

Dimensionality reduction techniques are based on unsupervised machine learning algorithms, and applying them offers several advantages. In this course you will learn how to exploit those advantages using interesting datasets such as the MNIST database of handwritten digits, the fashion version of MNIST released by Zalando, and a credit card fraud detection dataset. First, you will look at t-SNE, an algorithm that performs non-linear dimensionality reduction. Then you will explore some useful characteristics of dimensionality reduction that help when building predictive models. Finally, you will see how GLRM (generalized low rank models) can compress big data (with both numerical and categorical values) and impute missing values. Are you ready to start compressing high-dimensional data?
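As a taste of what the course covers, here is a minimal sketch of running t-SNE on an MNIST-style sample with the Rtsne package. The `mnist_sample` object and its column layout (label in the first column, pixel intensities in the rest) are assumptions made for illustration, not the course's exact code.

```r
# Minimal sketch: embed an MNIST-style sample in 2-D with t-SNE and plot it.
# Assumes a hypothetical data frame `mnist_sample` whose first column is the
# digit label and whose remaining columns are pixel intensities.
library(Rtsne)

pixels <- as.matrix(mnist_sample[, -1])
tsne_out <- Rtsne(pixels, dims = 2, perplexity = 30,
                  pca = TRUE, check_duplicates = FALSE)

# Plot the 2-D embedding, colored by digit label
plot(tsne_out$Y, col = as.factor(mnist_sample[[1]]), pch = 19,
     xlab = "t-SNE 1", ylab = "t-SNE 2")
```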
For Business

Training 2 or more people?

Get your team access to the full DataCamp platform, including all the features.
DataCamp for Business. For a bespoke solution, book a demo.
  1.

    Introduction to Advanced Dimensionality Reduction

    Free

    Are you ready to become a master of dimensionality reduction? In this chapter, you'll start by learning how to represent handwritten digits using the MNIST dataset. You will learn what a distance metric is and which ones are the most common, along with the problems that arise from the curse of dimensionality. Finally, you will compare the application of PCA and t-SNE (see the first sketch after the chapter list).

    Play Chapter Now
    Exploring the MNIST dataset
    50 xp
    Exploring MNIST dataset
    100 xp
    Digits features
    100 xp
    Distance metrics
    50 xp
    Euclidean distance
    100 xp
    Minkowski distance
    100 xp
    KL divergence
    100 xp
    PCA and t-SNE
    50 xp
    Generating PCA from MNIST sample
    100 xp
    t-SNE output from MNIST sample
    100 xp
  2.

    Introduction to t-SNE

    Now you will learn how to apply the t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm. After finishing this chapter, you will understand the hyperparameters that affect your results and how to optimize them. Finally, you will do something really cool: compute centroid prototypes of each digit and use them to classify other digits (see the second sketch after the chapter list).

    Play Chapter Now
  3.

    Using t-SNE with Predictive Models

    In this chapter, you'll apply t-SNE to train predictive models faster, one of the many advantages of dimensionality reduction. You will learn how to train a random forest with the original features and with the embedded features and compare the two (see the third sketch after the chapter list). You will also apply t-SNE to understand the patterns learned by a neural network. And all of this using a real credit card fraud dataset!

    Play Chapter Now
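Chapter 1 sketch (referenced above): common distance metrics and a PCA projection of the digits. It reuses the same hypothetical `mnist_sample` layout as the earlier sketch and needs only base R functions (dist, prcomp).

```r
# Distances between two digit images and a PCA projection of the sample.
pixels <- as.matrix(mnist_sample[, -1])
x <- pixels[1, ]
y <- pixels[2, ]

# Euclidean and Minkowski (order 3) distances
dist(rbind(x, y), method = "euclidean")
dist(rbind(x, y), method = "minkowski", p = 3)

# Kullback-Leibler divergence, treating each image as a discrete distribution.
# A small epsilon avoids log(0) on sparse pixel vectors; note KL is asymmetric.
eps <- 1e-12
p <- (x + eps) / sum(x + eps)
q <- (y + eps) / sum(y + eps)
sum(p * log(p / q))

# PCA: project the sample onto its first two principal components
pca_out <- prcomp(pixels, center = TRUE)
plot(pca_out$x[, 1:2], col = as.factor(mnist_sample[[1]]), pch = 19,
     xlab = "PC1", ylab = "PC2")
```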
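Chapter 2 sketch (referenced above): fit t-SNE with a chosen perplexity, build a centroid prototype per digit in the embedded space, and classify points by their nearest centroid. Variable names again follow the hypothetical `mnist_sample` layout.

```r
library(Rtsne)

set.seed(1234)                          # t-SNE is stochastic; fix the seed
pixels <- as.matrix(mnist_sample[, -1])
digit_labels <- mnist_sample[[1]]

# Perplexity is the key hyperparameter to tune
# (roughly the effective number of neighbours per point)
tsne_out <- Rtsne(pixels, dims = 2, perplexity = 30,
                  max_iter = 1500, check_duplicates = FALSE)

# Centroid prototype of each digit in the embedded space
embedding <- data.frame(tsne_out$Y, label = digit_labels)
centroids <- aggregate(cbind(X1, X2) ~ label, data = embedding, FUN = mean)

# Classify an embedded point by its nearest centroid (Euclidean distance)
classify <- function(point) {
  d <- sqrt((centroids$X1 - point[1])^2 + (centroids$X2 - point[2])^2)
  centroids$label[which.min(d)]
}
classify(as.numeric(embedding[1, c("X1", "X2")]))
```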
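Chapter 3 sketch (referenced above): compare a random forest trained on the original features with one trained on a 2-D t-SNE embedding. The `creditcard` data frame, its `Class` column, and the randomForest package are assumptions chosen for illustration, not the course's exact objects.

```r
library(Rtsne)
library(randomForest)

# Assumed layout: numeric feature columns plus a factor column `Class`
# marking fraudulent transactions.
features <- as.matrix(creditcard[, setdiff(names(creditcard), "Class")])

# Random forest on the original, high-dimensional features
rf_original <- randomForest(x = features, y = creditcard$Class, ntree = 100)

# Random forest on a 2-D t-SNE embedding of the same features
embedded <- Rtsne(features, dims = 2, check_duplicates = FALSE)$Y
rf_embedded <- randomForest(x = embedded, y = creditcard$Class, ntree = 100)

rf_original   # compare out-of-bag error rates (and training time) of the two
rf_embedded
```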

Datasets

MNIST sample, Credit card fraud, Fashion MNIST sample

Collaborators

Chester Ismay
Sara Billen
Federico Castanedo

Data Scientist at DataRobot


Join over 17 million learners and start Advanced Dimensionality Reduction in R today!
