Interactive Course

Feature Engineering with PySpark

Learn the gritty details that data scientists spend 70-80% of their time on: data wrangling and feature engineering.

  • 4 hours
  • 16 Videos
  • 60 Exercises
  • 3,660 Participants
  • 5,000 XP

Course Description

The real world is messy and your job is to make sense of it. Toy datasets like mtcars and Iris are the result of careful curation and cleaning, yet even they need to be transformed before machine learning algorithms can use them to extract meaning, forecast, classify, or cluster. This course covers the gritty details that data scientists spend 70-80% of their time on: data wrangling and feature engineering. With datasets growing ever larger, let's use PySpark to cut this Big Data problem down to size!

  1. Exploratory Data Analysis

    Free

    Get to know a bit about your problem before you dive in! Then learn how to statistically and visually inspect your dataset!

  2. Wrangling with Spark Functions

    Real data is rarely clean and ready for analysis. In this chapter, learn to remove unneeded information, handle missing values, and bring additional data into your analysis.

  3. Feature Engineering

    In this chapter, learn how to create new features for your machine learning model to learn from. We'll look at generating them by combining fields, extracting values from messy columns, or encoding categorical values for better results.

  4. Building a Model

    In this chapter we'll learn how to choose which type of model we want. Then we will learn how to fit the model to our data and evaluate it. Lastly, we'll learn how to interpret the results and save the model for later!

John Hogue

Lead Data Scientist, General Mills

I have a strong drive for innovation and giving back. Through my work I enjoy building out a career path and center of excellence for those in data science at General Mills. I have a passion for taking action and challenging the status quo with fact-based analysis to drive results. Outside of work I enjoy running an organization that gives aspiring and practicing data scientists opportunities to showcase their skills in a meaningful way.
