Introduction to Feature Engineering in R
Learn a variety of feature engineering techniques to develop meaningful features that will uncover useful insights about your machine learning models.
4 Hours | 13 Videos | 44 Exercises | 4,144 Learners | 3500 XP
Course Description
Feature engineering helps you uncover useful insights from your machine learning models. Model building is an iterative process, and part of each iteration is deriving new features from existing variables so that the model performs better. In this course, you will explore different data sets and apply a variety of feature engineering techniques to both continuous and discrete variables.
- 1
Creating Features from Categorical Data
Free
In this chapter, you will learn how to change categorical features into numerical representations that models can interpret. You'll learn about one-hot encoding and using binning for categorical features.
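As a rough illustration of the one-hot encoding idea mentioned above, here is a minimal base R sketch. The data frame and its column names (`color`, `price`) are hypothetical, and `model.matrix()` is just one of several ways to build indicator columns; the course exercises may use a different approach.

```r
# Hypothetical toy data; the column names are made up for illustration.
df <- data.frame(
  color = factor(c("red", "green", "blue", "green")),
  price = c(10, 12, 9, 11)
)

# "~ color - 1" drops the intercept, so every factor level
# gets its own 0/1 indicator column.
one_hot <- model.matrix(~ color - 1, data = df)

# Put the indicator columns next to the original numeric feature.
encoded <- cbind(df["price"], as.data.frame(one_hot))
print(encoded)
```

Each row has exactly one indicator set to 1, which is what lets a model treat the category as a set of numeric inputs.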
Introduction to feature engineering in R (50 xp)
Examples of feature engineering (50 xp)
One-hot encoding (100 xp)
Binning encoding: content driven (50 xp)
Leveraging content knowledge (100 xp)
Converting new categories to numeric (100 xp)
Binning encoding: data driven (50 xp)
Categorical proportions by outcome (100 xp)
Reducing categories using outcome (100 xp)
- 2
Creating Features from Numeric Data
In this chapter, you will learn how to manipulate numerical features to create meaningful features that can give better insights into your model. You will also learn how to work with dates in the context of feature engineering.
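To make the difference between uniform and quantile-based buckets concrete, the sketch below bins a simulated, right-skewed variable both ways and then pulls simple parts out of a date. The data are generated on the spot and the variable names are hypothetical; this is an illustration in base R, not the course's own code.

```r
set.seed(42)
x <- rexp(200, rate = 0.1)  # hypothetical right-skewed numeric feature

# Uniform buckets: four equal-width ranges, usually with unequal counts.
uniform <- cut(x, breaks = 4)

# Quantile buckets: unequal widths, but (roughly) equal counts per bucket.
qs <- quantile(x, probs = seq(0, 1, by = 0.25))
balanced <- cut(x, breaks = qs, include.lowest = TRUE)

print(table(uniform))
print(table(balanced))

# Date features: convert strings to Date, then extract components.
d <- as.Date(c("2019-01-15", "2019-07-04"))
date_features <- data.frame(
  year    = as.integer(format(d, "%Y")),
  month   = as.integer(format(d, "%m")),
  weekday = weekdays(d)  # names follow the current locale
)
print(date_features)
```

With skewed data, the uniform buckets tend to pile most observations into the first range, while the quantile buckets stay balanced.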
Numerical bucketing or binning (50 xp)
Visualizing the distribution (100 xp)
Creating uniform buckets from a distribution (100 xp)
Binning numerical data using quantiles (50 xp)
Balanced bucketing (100 xp)
Full matrix encoding (100 xp)
Unique attributes of adaptive bucketing (50 xp)
Date and time feature extraction (50 xp)
Converting string types to date types (100 xp)
Converting dates (100 xp)
Visualize time features (100 xp)
- 3
Transforming Numerical Features
In this chapter, you will learn about using transformation techniques, like Box-Cox and Yeo-Johnson, to address issues with non-normally distributed features. You'll also learn about methods to scale features, including mean centering and z-score standardization.
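The scaling and transformation ideas named above can be sketched by hand in base R. The values below are hypothetical, and `box_cox()` is an illustrative helper written for this example with a fixed, user-supplied `lambda`; the chapter itself works with caret, which would typically estimate the parameter from the data.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # hypothetical positive values

# Mean centering: subtract the mean so the feature averages zero.
centered <- x - mean(x)

# Z-score standardization: center, then divide by the sample
# standard deviation.
z <- (x - mean(x)) / sd(x)

# Base R's scale() performs the same two steps and returns a
# one-column matrix.
z_scale <- as.numeric(scale(x))

# Illustrative Box-Cox helper (hypothetical name). Requires x > 0.
# lambda = 0 is defined as the natural log.
box_cox <- function(x, lambda) {
  stopifnot(all(x > 0))
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

print(round(z, 3))
print(round(box_cox(x, 0.5), 3))
```

Yeo-Johnson extends the same idea to zero and negative values, which is why the two are usually discussed together.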
Box and Yeo transformations (50 xp)
Box-Cox vs. Yeo-Johnson (50 xp)
Box-Cox transformations (100 xp)
Yeo-Johnson transformations (100 xp)
Normalization techniques (50 xp)
Scaling (100 xp)
Mean centering (100 xp)
Caret mean centering (100 xp)
Z-score standardization (50 xp)
Standardization one variable case (100 xp)
Caret standardization (100 xp)
- 4
Advanced Methods
In the final chapter, we will use feature crossing to create features from two or more variables. We will also discuss principal component analysis, and methods to explore and visualize those results.
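For a sense of what these two techniques look like in practice, here is a small base R sketch. The crossing example uses a hypothetical data frame (`gender`, `plan`), and the PCA example uses R's built-in `iris` data purely for convenience; neither is taken from the course materials.

```r
# Feature crossing: combine two categorical variables into one
# feature with a level per combination, then one-hot encode it.
df <- data.frame(
  gender = factor(c("f", "m", "f", "m")),
  plan   = factor(c("basic", "basic", "pro", "pro"))
)
df$gender_plan <- interaction(df$gender, df$plan, sep = "_")
crossed <- model.matrix(~ gender_plan - 1, data = df)
print(crossed)

# PCA on the four numeric iris columns, centered and scaled.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Scree plot: variance explained against component number.
plot(var_explained, type = "b",
     xlab = "Principal component",
     ylab = "Proportion of variance explained")
```

Crossing two factors with 2 levels each yields up to 2 x 2 = 4 new indicator columns, which is the kind of count the chapter's "how many features to expect" lesson refers to.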
Feature crossing (50 xp)
How many features to expect (50 xp)
Exploring features visually (100 xp)
Exploring potential crosses (100 xp)
Crossing two categorical features (100 xp)
Principal component analysis (50 xp)
Conduct PCA (100 xp)
PCA results (50 xp)
Interpreting PCA output (50 xp)
Proportion of variance by PCA (100 xp)
Visualizing results with a scree plot (100 xp)
Visualizing components (100 xp)
Wrap-up (50 xp)
Prerequisites
Exploratory Data Analysis in R
Jose Hernandez
Data Scientist, University of Washington
Jose is a Data Scientist at the University of Washington’s eScience Institute. Jose’s interests include applying data science methods to sociological and educational data and building open source data tools to facilitate that process. Jose’s research combines theory and practice with data science methods to inform education policymaking. Jose earned his doctorate at UW, with a focus on statistics and measurement, and a Master of Education in policy, also from UW.