Course
Dimensionality Reduction in R
Beginner skill level
Updated December 2024
R · Machine Learning · 4 hours · 16 videos · 56 exercises · 4,600 XP · 2,627 statements of accomplishment
Course Description
Why learn dimensionality reduction?
We live in the information age—an era of information overload. The art of extracting essential information from data is a marketable skill. Models train faster on reduced data. In production, smaller models mean faster response time. Perhaps most important, smaller data and models are often easier to understand. Dimensionality reduction is your Occam’s razor in data science.
What will you learn in this course?
The difference between feature selection and feature extraction! Using R, you will learn how to identify and remove features with low or redundant information, keeping the features with the most information. That’s feature selection. You will also learn how to extract combinations of features as condensed components that contain maximal information. That’s feature extraction!
But most importantly, using R's tidymodels framework, you will use real-world data to build models with fewer features without sacrificing significant performance.
Prerequisites
Modeling with tidymodels in R
1
Foundations of Dimensionality Reduction
Prepare to simplify large data sets! You will learn about information, how to assess feature importance, and practice identifying low-information features. By the end of the chapter, you will understand the difference between feature selection and feature extraction—the two approaches to dimensionality reduction.
2
Feature Selection for Feature Importance
Learn how to identify information-rich and information-poor features using missing value ratios, variance, and correlation. Then you'll discover how to build tidymodels recipes to select features using these information indicators.
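As a taste of what this looks like in practice, here is a minimal sketch (not course code) of filtering low-information features with the recipes package, assuming recipes ≥ 1.0 and using the built-in mtcars data in place of the course's datasets:

```r
library(recipes)

# Unsupervised feature selection: drop features that carry little information.
rec <- recipe(mpg ~ ., data = mtcars) |>
  step_filter_missing(all_predictors(), threshold = 0.2) |>  # drop features with > 20% missing values
  step_nzv(all_predictors()) |>                              # drop near-zero-variance features
  step_corr(all_numeric_predictors(), threshold = 0.9)       # drop one of each highly correlated pair

# prep() estimates which columns to drop; bake() returns the reduced data
baked <- bake(prep(rec, training = mtcars), new_data = NULL)
```

Each `step_*()` only records an instruction; nothing is dropped until `prep()` learns the thresholds from the training data, which is what lets the same recipe be applied safely to new data.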
3
Feature Selection for Model Performance
Chapter three introduces the difference between unsupervised and supervised feature selection approaches. You'll review how to use tidymodels workflows to build models. Then, you'll perform supervised feature selection using lasso regression and random forest models.
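The lasso idea can be sketched in a few lines: coefficients shrunk exactly to zero mark the features the model discards. This is an illustrative sketch, assuming the glmnet package is installed and using mtcars with an arbitrary penalty of 0.05 rather than a tuned value:

```r
library(tidymodels)

# Pure lasso: mixture = 1 means only the L1 penalty is used
lasso_spec <- linear_reg(penalty = 0.05, mixture = 1) |>
  set_engine("glmnet")

lasso_fit <- workflow() |>
  add_formula(mpg ~ .) |>
  add_model(lasso_spec) |>
  fit(data = mtcars)

# Features with non-zero coefficients are the ones the lasso "selected"
selected <- extract_fit_parsnip(lasso_fit) |>
  tidy() |>
  dplyr::filter(estimate != 0)
```

In the course, the penalty would typically be tuned with cross-validation rather than fixed by hand.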
4
Feature Extraction and Model Performance
In this final chapter, you'll gain a strong intuition of feature extraction by understanding how principal components extract and combine the most important information from different features. Then learn about and apply three types of feature extraction — principal component analysis (PCA), t-SNE, and UMAP. Discover how you can use these feature extraction methods as a preprocessing step in the tidymodels model-building process.
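The PCA-as-preprocessing idea described above can be sketched as a recipe step. This is a minimal illustration on mtcars, assuming three components is enough; predictors are normalized first because PCA is sensitive to feature scale:

```r
library(recipes)

# Feature extraction: replace correlated predictors with a few principal components
pca_rec <- recipe(mpg ~ ., data = mtcars) |>
  step_normalize(all_numeric_predictors()) |>          # PCA needs comparable scales
  step_pca(all_numeric_predictors(), num_comp = 3)     # keep the top 3 components

# The baked data contains the outcome plus components named PC1, PC2, PC3
extracted <- bake(prep(pca_rec, training = mtcars), new_data = NULL)
```

In a tidymodels workflow, this recipe would be attached with `add_recipe()`, so the components are recomputed from the training set inside each resample.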