Dimensionality Reduction in R

Learn dimensionality reduction techniques in R and master feature selection and extraction for your own data and models.

4 Hours, 16 Videos, 56 Exercises

Course Description

Do you ever work with datasets that have an overwhelming number of features? Do you need all those features? Which ones are the most important? In this course, you will learn dimensionality reduction techniques that simplify your data and the models you build from it, while preserving both the information in the original data and good predictive performance.

Why learn dimensionality reduction?

We live in the information age—an era of information overload. The art of extracting essential information from data is a marketable skill. Models train faster on reduced data. In production, smaller models mean faster response time. Perhaps most important, smaller data and models are often easier to understand. Dimensionality reduction is your Occam’s razor in data science.

What will you learn in this course?

The difference between feature selection and feature extraction! Using R, you will learn how to identify and remove features with low or redundant information, keeping the features with the most information. That’s feature selection. You will also learn how to extract combinations of features as condensed components that contain maximal information. That’s feature extraction!

But most importantly, using R’s tidymodels framework, you will use real-world data to build models with fewer features without sacrificing significant performance.

1
Foundations of Dimensionality Reduction
Free
Prepare to simplify large data sets! You will learn about information, how to assess feature importance, and practice identifying low-information features. By the end of the chapter, you will understand the difference between feature selection and feature extraction—the two approaches to dimensionality reduction.
Introduction to dimensionality reduction
50 xp
Dimensionality and feature information
100 xp
Mutual information features
100 xp
Information and feature importance
50 xp
Calculating root entropy
100 xp
Calculating child entropies
100 xp
Calculating information gain of color
100 xp
The Importance of Dimensionality Reduction in Data and Model Building
50 xp
Calculate possible combinations
100 xp
Curse of dimensionality, overfitting, and bias
100 xp
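
The entropy and information-gain lessons listed above can be illustrated with a minimal base R sketch. This is not the course's own exercise code: the `entropy()` and `information_gain()` helpers and the `toy` data frame are hypothetical names made up for illustration, and entropy is assumed to be Shannon entropy measured in bits.

```r
# Shannon entropy (in bits) of a categorical vector.
entropy <- function(x) {
  p <- table(x) / length(x)
  p <- p[p > 0]                      # drop empty classes to avoid log2(0)
  -sum(p * log2(p))
}

# Information gain: root entropy minus the weighted child entropies
# obtained by splitting `target` on the values of `feature`.
information_gain <- function(target, feature) {
  weights <- table(feature) / length(feature)
  child_entropies <- tapply(target, feature, entropy)
  weighted <- as.numeric(weights) * as.numeric(child_entropies[names(weights)])
  entropy(target) - sum(weighted)
}

# Hypothetical toy data: `color` separates the label perfectly, `size` not at all.
toy <- data.frame(
  color = c("red", "red", "red", "red", "blue", "blue", "blue", "blue"),
  size  = c("small", "large", "small", "large", "small", "large", "small", "large"),
  label = c("yes", "yes", "yes", "yes", "no", "no", "no", "no")
)

root_entropy <- entropy(toy$label)                   # balanced labels: 1 bit
ig_color <- information_gain(toy$label, toy$color)   # 1 bit: split is pure
ig_size  <- information_gain(toy$label, toy$size)    # 0 bits: split tells nothing
```

In this toy case `color` would be kept as a high-information feature, while `size` would be a candidate for removal.
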
2
Feature Selection for Feature Importance
Learn how to identify information-rich and information-poor features using missing value ratios, variance, and correlation. Then you'll discover how to build tidymodels recipes to select features using these information indicators.
Feature selection vs. feature extraction
50 xp
Create a zero-variance filter
100 xp
Create a missing values filter
100 xp
Feature selection with the combined filter
100 xp
Selecting based on missing values
50 xp
Create a missing value ratio filter
100 xp
Apply a missing value ratio filter
100 xp
Create a missing values recipe
100 xp
Selecting based on variance
50 xp
Create a low-variance filter
100 xp
Create a low-variance recipe
100 xp
Selecting based on correlation with other features
50 xp
Identify highly correlated features
100 xp
Select correlated feature to remove
50 xp
Create a high-correlation recipe
100 xp
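
As a rough sketch of how the three filters in this chapter might be combined in one recipe, the example below uses the `recipes` package (part of tidymodels). The data frame `df`, its column names, and the threshold values are assumptions chosen for illustration, not the course's dataset or settings.

```r
library(recipes)

set.seed(42)

# Hypothetical data: one constant column, one mostly missing column,
# and one near-duplicate of an informative column.
df <- data.frame(
  y = rnorm(100),
  a = rnorm(100),
  constant = rep(1, 100),
  mostly_missing = c(rep(NA, 80), rnorm(20))
)
df$a_copy <- df$a * 2 + rnorm(100, sd = 0.01)

rec <- recipe(y ~ ., data = df) |>
  # Drop predictors whose share of missing values exceeds 50%.
  step_filter_missing(all_predictors(), threshold = 0.5) |>
  # Drop predictors with zero variance.
  step_zv(all_predictors()) |>
  # Drop one predictor from each pair with absolute correlation above 0.9.
  step_corr(all_numeric_predictors(), threshold = 0.9)

# Estimate the filters on the training data and apply them to it.
reduced <- bake(prep(rec), new_data = NULL)
names(reduced)
```

With these assumed thresholds, the outcome `y` should remain alongside a single predictor: either `a` or `a_copy`, whichever the correlation filter retains.
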
3
Feature Selection for Model Performance
Chapter three introduces the difference between unsupervised and supervised feature selection approaches. You'll review how to use tidymodels workflows to build models. Then, you'll perform supervised feature selection using lasso regression and random forest models.
Supervised feature selection
50 xp
Supervised vs. unsupervised feature selection
100 xp
Decision tree feature selection type
50 xp
Model Building and Evaluation with tidymodels
50 xp
Split out the train and test sets
100 xp
Create a recipe-model workflow
100 xp
Fit, explore, and evaluate the model
100 xp
Lasso Regression
50 xp
Scale the data for lasso regression
100 xp
Explore lasso regression penalty values
100 xp
Tune the penalty hyperparameter
100 xp
Fit the best model
100 xp
Random forest models
50 xp
Create full random forest model
100 xp
Reduce data using feature importances
100 xp
Create reduced random forest
100 xp
4
Feature Extraction and Model Performance
In this final chapter, you'll gain a strong intuition of feature extraction by understanding how principal components extract and combine the most important information from different features. Then learn about and apply three types of feature extraction — principal component analysis (PCA), t-SNE, and UMAP. Discover how you can use these feature extraction methods as a preprocessing step in the tidymodels model-building process.
Foundations of feature extraction - principal components
50 xp
Understanding principal components
100 xp
Naming principal components
50 xp
Principal Component Analysis (PCA)
50 xp
PCA: variance explained
50 xp
Mapping features to principal components
100 xp
PCA in tidymodels
100 xp
t-Distributed Stochastic Neighbor Embedding (t-SNE)
50 xp
Separating house prices with PCA
100 xp
Separating house prices with t-SNE
100 xp
Uniform Manifold Approximation and Projection (UMAP)
50 xp
Separating house prices with UMAP
100 xp
UMAP reduction in a decision tree model
100 xp
Evaluate the UMAP decision tree model
100 xp
Wrap up
50 xp
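
To make the idea of principal components and "variance explained" concrete, here is a small base R sketch using `prcomp()`. It uses R's built-in `iris` data as a stand-in, which is an assumption for illustration and not the course's data; inside a tidymodels workflow, the same idea would typically be expressed as a recipe step rather than a direct `prcomp()` call.

```r
# PCA on the four numeric iris columns, centered and scaled so that
# each feature contributes on the same footing.
features <- iris[, 1:4]
pca <- prcomp(features, center = TRUE, scale. = TRUE)

# Proportion of total variance captured by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var_explained <- cumsum(var_explained)

# Loadings: how each original feature maps onto the first two components.
loadings <- pca$rotation[, 1:2]

# Reduced data: keep only the first two component scores.
reduced <- pca$x[, 1:2]

print(round(cum_var_explained, 3))
print(loadings)
```

Reading the cumulative proportions shows how many components are needed to retain a chosen share of the variance, and the loadings suggest how each component might be named.
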

In the following tracks

Machine Learning Scientist with R

Collaborators

George Boorman

Jasmin Ludolf

Izzy Weber

Prerequisites

Modeling with tidymodels in R

Matt Pickard

Owner, Pickard Predictives, LLC

Matt is an Associate Professor of Data and Analytics at Northern Illinois University. On the side, he does data analytics consulting and training as the owner of Pickard Predictives, LLC. He's happily married with four girls and a boy poodle.

Join over 13 million learners and start Dimensionality Reduction in R today!