Interactive Course

Feature Engineering for Machine Learning in Python

Create new features to improve the performance of your machine learning models.

  • 4 hours
  • 16 Videos
  • 53 Exercises
  • 2,545 Participants
  • 4,350 XP

Course Description

Every day you read about amazing breakthroughs in how the newest applications of machine learning are changing the world. Often this reporting glosses over the fact that a huge amount of data munging and feature engineering must be done before any of these fancy models can be used. In this course, you will learn how to do just that. You will work with the Stack Overflow Developer Survey and historic US presidential inauguration addresses to understand how best to preprocess and engineer features from categorical, continuous, and unstructured data. This course will give you hands-on experience preparing data for your own machine learning models.

  1. Creating Features

    Free

    In this chapter, you will explore what feature engineering is and how to get started with applying it to real-world data. You will load, explore, and visualize a survey response dataset, and in doing so you will learn about its underlying data types and why they influence how you should engineer your features. Using the pandas package, you will create new features from both categorical and continuous columns.
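As a rough illustration of the kind of work this chapter describes, the sketch below one-hot encodes a categorical column and bins a continuous one with pandas. The data, column names, and bin edges are invented for the example and are not taken from the course dataset.

```python
import pandas as pd

# Hypothetical survey responses; the column names are illustrative only.
survey = pd.DataFrame({
    "Country": ["USA", "India", "USA", "France"],
    "Salary": [120000.0, 30000.0, 85000.0, 60000.0],
})

# Categorical column -> one indicator column per category (one-hot encoding).
country_dummies = pd.get_dummies(survey["Country"], prefix="Country")

# Continuous column -> labeled bins. The bin edges are arbitrary assumptions.
survey["Salary_band"] = pd.cut(
    survey["Salary"],
    bins=[0, 50000, 100000, float("inf")],
    labels=["low", "medium", "high"],
)

# Combine the original columns with the new indicator columns.
features = pd.concat([survey, country_dummies], axis=1)
print(features)
```

One-hot encoding adds one column per category, so a column with many distinct values can widen the feature table considerably.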

  2. Conforming to Statistical Assumptions

    In this chapter, you will focus on analyzing the underlying distribution of your data and whether it will impact your machine learning pipeline. You will learn how to deal with skewed data and situations where outliers may be negatively impacting your analysis.
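To make the ideas of skew and outliers concrete, here is a minimal sketch using invented values: a log transform to compress a long right tail, and a percentile cutoff to drop an extreme entry. Both the data and the 95th-percentile threshold are assumptions chosen for illustration, and other transforms or cutoffs may suit your data better.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed values with one extreme entry.
salaries = pd.Series(
    [30000, 35000, 40000, 42000, 45000, 50000, 55000, 1000000],
    dtype=float,
)

# A log transform compresses the long right tail (values must be positive).
log_salaries = np.log(salaries)
print("skew before:", salaries.skew())
print("skew after: ", log_salaries.skew())

# Quantile-based trimming: keep values at or below the 95th percentile.
cutoff = salaries.quantile(0.95)
trimmed = salaries[salaries <= cutoff]
print(trimmed)
```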

  3. Dealing with Messy Data

    This chapter introduces you to the reality of messy and incomplete data. You will learn how to find where your data has missing values and explore multiple approaches to dealing with them. You will also use string manipulation techniques to remove unwanted characters from your dataset.
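The sketch below shows one possible way to locate missing values, handle them by dropping or filling, and strip unwanted characters from a text column so it can be treated as numeric. The column names, the "Not given" placeholder, and the currency format are hypothetical.

```python
import pandas as pd

# Hypothetical messy responses; None marks a missing answer.
responses = pd.DataFrame({
    "Gender": ["Male", None, "Female", "Male"],
    "RawSalary": ["$70,000.00", "$55,500.00", None, "$102,000.00"],
})

# Find where values are missing: count of missing entries per column.
missing_counts = responses.isna().sum()
print(missing_counts)

# Approach 1: drop every row that has at least one missing value.
complete_rows = responses.dropna()

# Approach 2: fill missing categories with an explicit placeholder.
responses["Gender"] = responses["Gender"].fillna("Not given")

# Remove the currency symbol and thousands separators, then convert to numbers.
cleaned = (
    responses["RawSalary"]
    .str.replace("$", "", regex=False)
    .str.replace(",", "", regex=False)
)
responses["Salary"] = pd.to_numeric(cleaned, errors="coerce")
print(responses)
```

Dropping rows is the simplest option but discards data, while filling keeps every row at the cost of introducing a made-up value.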

  4. Dealing with Text Data

    Finally, in this chapter, you will work with unstructured text data and learn ways to engineer columnar features from a text corpus. You will compare how different approaches affect how much context is extracted from a text, and how to balance the need for context against creating too many features.
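The trade-off between context and feature count can be sketched with a small hand-rolled n-gram counter. This is a simplified stand-in written for illustration, not necessarily the approach the course uses; the two short texts and the `ngram_counts` helper are hypothetical.

```python
from collections import Counter

import pandas as pd

# Two hypothetical, already-cleaned snippets of text.
speeches = [
    "we the people of the united states",
    "the people of the states",
]


def ngram_counts(text, n):
    """Count the n-grams (runs of n consecutive words) in one text."""
    words = text.lower().split()
    ngrams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return dict(Counter(ngrams))


# One row per text, one column per distinct n-gram; absent n-grams count as 0.
unigram_features = (
    pd.DataFrame([ngram_counts(s, 1) for s in speeches]).fillna(0).astype(int)
)
bigram_features = (
    pd.DataFrame([ngram_counts(s, 2) for s in speeches]).fillna(0).astype(int)
)

print(unigram_features.shape)  # single words: less context, fewer columns
print(bigram_features.shape)   # word pairs: more context, more columns
```

Even on this tiny corpus the bigram table is wider than the unigram table, and the gap usually grows quickly with longer texts.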

What do other learners have to say?

“I've used other sites, but DataCamp's been the one that I've stuck with.”

Devon Edwards Joseph

Lloyds Banking Group

“DataCamp is the top resource I recommend for learning data science.”

Louis Maiden

Harvard Business School

“DataCamp is by far my favorite website to learn from.”

Ronald Bowers

Decision Science Analytics @ USAA

Robert O'Callaghan

Director of Data Science, Ordergroove

Rob enables retailers and brands to make themselves indispensable to their customers’ lives by anticipating purchasing needs. Throughout his career, Rob has focused on the analysis, visualization, and modeling of data to produce actionable business improvements for some of the world’s largest organizations. He has successfully designed and implemented multimillion-dollar machine learning solutions within several Fortune 500 companies, focusing in particular on bleeding-edge unsupervised and supervised learning techniques. He has presented his work, in the U.S. and abroad, to audiences of hundreds at financial services and AI-focused conferences.
