Dimensionality Reduction in Python

Learn to reduce dimensionality in Python.
4 hours, 16 videos, 58 exercises, 9,713 learners, 4,700 XP


Course Description

High-dimensional datasets can be overwhelming and leave you not knowing where to start. Typically, you’d explore a new dataset visually first, but with too many dimensions the classical approaches fall short. Fortunately, there are visualization techniques designed specifically for high-dimensional data, and you’ll be introduced to them in this course. After exploring the data, you’ll often find that many features hold little information, either because they show no variance or because they duplicate other features. You’ll learn how to detect these features and drop them from the dataset so that you can focus on the informative ones. As a next step, you might want to build a model on the remaining features, and it may turn out that some have no effect on the thing you’re trying to predict. You’ll learn how to detect and drop these irrelevant features too, reducing dimensionality and thus complexity. Finally, you’ll learn how feature extraction techniques can reduce dimensionality for you by calculating uncorrelated principal components.
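The idea of dropping features that show no variance or that duplicate other features can be sketched as follows. This is a minimal illustration rather than the course's own code: it assumes pandas and scikit-learn are installed, the column names and synthetic data are hypothetical, and the 0.95 correlation cutoff is an arbitrary choice.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Synthetic data with hypothetical column names, for illustration only.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "height": rng.normal(170, 10, n),
    "weight": rng.normal(70, 8, n),
    "constant": np.ones(n),              # carries no variance
})
df["height_in"] = df["height"] / 2.54    # same information as "height"

# Step 1: keep only features whose variance exceeds the threshold.
selector = VarianceThreshold(threshold=0.0)
selector.fit(df)
df_var = df.loc[:, selector.get_support()]

# Step 2: for each strongly correlated pair, drop the later column.
corr = df_var.corr().abs()
# Keep only the upper triangle so each pair is inspected once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
df_reduced = df_var.drop(columns=to_drop)

print("Dropped for zero variance:", list(df.columns[~selector.get_support()]))
print("Dropped for high correlation:", to_drop)
print("Remaining features:", list(df_reduced.columns))
```

In practice the variance threshold and the correlation cutoff depend on the dataset, and features on very different scales may need normalizing before their variances are compared.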

  1. Exploring high dimensional data

    You'll be introduced to the concept of dimensionality reduction and learn when and why it is important. You'll learn the difference between feature selection and feature extraction and apply both techniques for data exploration. The chapter ends with a lesson on t-SNE, a powerful feature extraction technique that lets you visualize a high-dimensional dataset.
  2. Feature selection I, selecting for feature information

    In this first of two chapters on feature selection, you'll learn about the curse of dimensionality and how dimensionality reduction can help you overcome it. You'll be introduced to a number of techniques to detect and remove features that add little value to the dataset, either because they have little variance or too many missing values, or because they are strongly correlated with other features.
  3. Feature selection II, selecting for model accuracy

    In this second chapter on feature selection, you'll learn how to let models help you find the most important features in a dataset for predicting a particular target feature. In the final lesson of this chapter, you'll combine the advice of multiple different models to decide which features are worth keeping.
  4. Feature extraction

    This chapter is a deep dive into the most frequently used dimensionality reduction algorithm, Principal Component Analysis (PCA). You'll build intuition for how the algorithm works and why it is so powerful, and you'll apply it both for data exploration and for data pre-processing in a modeling pipeline. You'll end with a cool image compression use case.
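As a rough illustration of the PCA chapter's themes (pre-processing in a pipeline, uncorrelated components, and lossy image compression), here is a minimal sketch. It is not taken from the course: it assumes scikit-learn is installed, uses the bundled digits dataset as a stand-in for image data, and keeps 90% of the variance as an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# 8x8 grayscale digit images, flattened to 64 pixel features each.
X, _ = load_digits(return_X_y=True)

# Scale the features, then keep as many principal components as are
# needed to explain 90% of the variance (assumed cutoff).
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=0.9)),
])
X_reduced = pipe.fit_transform(X)
pca = pipe.named_steps["pca"]

print("Original shape:", X.shape)
print("Reduced shape:", X_reduced.shape)
print("Variance explained:", pca.explained_variance_ratio_.sum())

# Principal components are uncorrelated: the covariance matrix of the
# transformed data is diagonal up to floating-point error.
cov = np.cov(X_reduced, rowvar=False)
off_diag = cov - np.diag(np.diag(cov))
print("Largest off-diagonal covariance:", np.abs(off_diag).max())

# Lossy "compression": map the reduced data back to pixel space.
X_restored = pipe.inverse_transform(X_reduced)
print("Mean absolute reconstruction error:", np.abs(X - X_restored).mean())
```

Each image is stored with fewer numbers than the original 64 pixels, at the cost of some reconstruction error; raising `n_components` trades compression for fidelity.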
In the following tracks
Machine Learning for Everyone, Machine Learning Scientist
Collaborators
Chester Ismay, Hadrien Lacroix, Hillary Green-Lerman

Jeroen Boeye

Machine Learning Engineer @ Faktion
Jeroen is a machine learning engineer working at Faktion, an AI company from Belgium. He uses both R and Python for his analyses and has a PhD background in computational biology. His experience mostly lies in working with structured data, produced by sensors or digital processes.

Aleksandra Vercauteren

Senior Data Scientist - Head of NLP @ Faktion
After 6 years as a researcher in Theoretical Linguistics, I decided to radically change careers and became a data scientist. I did not let go of my linguistic background and specialised in Natural Language Processing. Currently I am the head of NLP at Faktion, a leading Belgian NLP company. I love making sense of unstructured data and building solid solutions for automatic language processing.

