Dimensionality Reduction in Python
Understand the concept of reducing dimensionality in your data, and master the techniques to do so in Python.
Start Course for Free · 4 hours · 16 videos · 58 exercises
Course Description
High-dimensional datasets can be overwhelming and leave you not knowing where to start. Typically, you'd visually explore a new dataset first, but when you have too many dimensions the classical approaches will seem insufficient. Fortunately, there are visualization techniques designed specifically for high-dimensional data, and you'll be introduced to these in this course. After exploring the data, you'll often find that many features hold little information, because they don't show any variance or because they are duplicates of other features. You'll learn how to detect these features and drop them from the dataset so that you can focus on the informative ones. As a next step, you might want to build a model on these features, and it may turn out that some don't have any effect on the thing you're trying to predict. You'll learn how to detect and drop these irrelevant features too, in order to reduce dimensionality and thus complexity. Finally, you'll learn how feature extraction techniques can reduce dimensionality for you through the calculation of uncorrelated principal components.
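As a taste of the variance-based cleanup described above, here is a minimal sketch using scikit-learn's VarianceThreshold; the DataFrame, its column names, and the threshold are illustrative assumptions, not course data:

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical dataset: one column never varies, so it carries no information.
df = pd.DataFrame({
    "height_cm": [180, 165, 172, 190],
    "constant_flag": [1, 1, 1, 1],
    "weight_kg": [80, 62, 70, 95],
})

selector = VarianceThreshold(threshold=0.0)  # drop zero-variance features
selector.fit(df)
reduced = df.loc[:, selector.get_support()]  # get_support() masks kept columns
print(reduced.columns.tolist())              # ['height_cm', 'weight_kg']
```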
In the following Tracks: Machine Learning Scientist in Python

1. Exploring High Dimensional Data
Free

You'll be introduced to the concept of dimensionality reduction and will learn when and why this is important. You'll learn the difference between feature selection and feature extraction and will apply both techniques for data exploration. The chapter ends with a lesson on t-SNE, a powerful feature extraction technique that will allow you to visualize a high-dimensional dataset (a minimal sketch follows the exercise list below).
- Introduction (50 xp)
- Finding the number of dimensions in a dataset (50 xp)
- Removing features without variance (100 xp)
- Feature selection vs. feature extraction (50 xp)
- Visually detecting redundant features (100 xp)
- Advantage of feature selection (50 xp)
- t-SNE visualization of high-dimensional data (50 xp)
- t-SNE intuition (50 xp)
- Fitting t-SNE to the ANSUR data (100 xp)
- t-SNE visualisation of dimensionality (100 xp)
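A hedged sketch of the kind of t-SNE fit this chapter covers, using scikit-learn's digits dataset as a stand-in for the ANSUR body-measurement data used in the course:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)  # 64 features per sample

# Embed the high-dimensional points into 2 dimensions for plotting.
tsne = TSNE(n_components=2, random_state=0)
X_embedded = tsne.fit_transform(X)

plt.scatter(X_embedded[:, 0], X_embedded[:, 1], c=y, s=5, cmap="tab10")
plt.title("t-SNE embedding of 64-dimensional digit images")
plt.show()
```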
2. Feature Selection I - Selecting for Feature Information
In this first of two chapters on feature selection, you'll learn about the curse of dimensionality and how dimensionality reduction can help you overcome it. You'll be introduced to a number of techniques to detect and remove features that bring little added value to the dataset, either because they have little variance, because they have too many missing values, or because they are strongly correlated with other features (see the sketch after the exercise list below).
- The curse of dimensionality (50 xp)
- Train-test split (100 xp)
- Fitting and testing the model (100 xp)
- Accuracy after dimensionality reduction (100 xp)
- Features with missing values or little variance (50 xp)
- Finding a good variance threshold (100 xp)
- Features with low variance (100 xp)
- Removing features with many missing values (100 xp)
- Pairwise correlation (50 xp)
- Correlation intuition (50 xp)
- Inspecting the correlation matrix (50 xp)
- Visualizing the correlation matrix (100 xp)
- Removing highly correlated features (50 xp)
- Filtering out highly correlated features (100 xp)
- Nuclear energy and pool drownings (100 xp)
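One common way to implement the correlation-based filtering this chapter teaches is to scan the upper triangle of the absolute correlation matrix; the synthetic data and the 0.95 cutoff below are illustrative assumptions:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
base = rng.normal(size=200)
df = pd.DataFrame({
    "feature_a": base,
    "feature_b": 2 * base + rng.normal(scale=0.01, size=200),  # near-duplicate
    "feature_c": rng.normal(size=200),
})

# Absolute correlations; mask the lower triangle to avoid counting pairs twice.
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
reduced = df.drop(columns=to_drop)
print(to_drop)  # ['feature_b']
```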
3. Feature Selection II - Selecting for Model Accuracy
In this second chapter on feature selection, you'll learn how to let models help you find the most important features in a dataset for predicting a particular target feature. In the final lesson of this chapter, you'll combine the advice of multiple different models to decide which features are worth keeping (a recursive feature elimination sketch follows the exercise list below).
- Selecting features for model performance (50 xp)
- Building a diabetes classifier (100 xp)
- Manual Recursive Feature Elimination (100 xp)
- Automatic Recursive Feature Elimination (100 xp)
- Tree-based feature selection (50 xp)
- Building a random forest model (100 xp)
- Random forest for feature selection (100 xp)
- Recursive Feature Elimination with random forests (100 xp)
- Regularized linear regression (50 xp)
- Creating a LASSO regressor (100 xp)
- Lasso model results (100 xp)
- Adjusting the regularization strength (100 xp)
- Combining feature selectors (50 xp)
- Creating a LassoCV regressor (100 xp)
- Ensemble models for extra votes (100 xp)
- Combining 3 feature selectors (100 xp)
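A sketch of recursive feature elimination wrapped around a random forest, in the spirit of this chapter; the breast-cancer dataset and the parameter values are stand-in assumptions, not the course's diabetes data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Recursively drop the weakest features (2 per iteration) until 10 remain,
# ranking features by the forest's feature_importances_ at each step.
rfe = RFE(estimator=RandomForestClassifier(random_state=0),
          n_features_to_select=10, step=2)
rfe.fit(X_train, y_train)

print(X.columns[rfe.support_].tolist())  # the 10 features RFE kept
print(rfe.score(X_test, y_test))         # accuracy using only those features
```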
4. Feature Extraction
This chapter is a deep dive into the most frequently used dimensionality reduction algorithm, Principal Component Analysis (PCA). You'll build intuition on how and why this algorithm is so powerful and will apply it both for data exploration and for data pre-processing in a modeling pipeline (a pipeline sketch follows the exercise list below). You'll end with a cool image compression use case.
- Feature extraction (50 xp)
- Manual feature extraction I (100 xp)
- Manual feature extraction II (100 xp)
- Principal component intuition (50 xp)
- Principal component analysis (50 xp)
- Calculating Principal Components (100 xp)
- PCA on a larger dataset (100 xp)
- PCA explained variance (100 xp)
- PCA applications (50 xp)
- Understanding the components (100 xp)
- PCA for feature exploration (100 xp)
- PCA in a model pipeline (100 xp)
- Principal Component selection (50 xp)
- Selecting the proportion of variance to keep (100 xp)
- Choosing the number of components (100 xp)
- PCA for image compression (100 xp)
- Congratulations! (50 xp)
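A minimal sketch of PCA inside a modeling pipeline, as this chapter describes; scaling comes first because PCA is sensitive to feature scale. The dataset, the 90% variance target, and the logistic-regression classifier are assumptions for illustration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=0.9)),  # keep components explaining 90% variance
    ("model", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_train, y_train)

print(pipe["pca"].n_components_)   # number of components actually retained
print(pipe.score(X_test, y_test))  # accuracy on the compressed features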
Collaborators
Jeroen Boeye, Machine Learning Engineer @ Faktion

Prerequisites
Supervised Learning with scikit-learn