Feature Engineering for Machine Learning in Python
Create new features to improve the performance of your Machine Learning models.
4 hours, 16 videos, 53 exercises, 31,173 learners, Statement of Accomplishment
Course Description
Every day you read about amazing breakthroughs in how the newest applications of machine learning are changing the world. This reporting often glosses over the fact that a huge amount of data munging and feature engineering must be done before any of these fancy models can be used. In this course, you will learn how to do just that. You will work with the Stack Overflow Developer Survey and historic US presidential inauguration addresses to understand how best to preprocess and engineer features from categorical, continuous, and unstructured data. This course will give you hands-on experience preparing any data for your own machine learning models.
Part of the following track: Machine Learning Scientist with Python
Chapter 1: Creating Features
In this chapter, you will explore what feature engineering is and how to get started applying it to real-world data. You will load, explore, and visualize a survey response dataset, and in doing so learn about its underlying data types and why they influence how you should engineer your features. Using the pandas package, you will create new features from both categorical and continuous columns.
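As a rough illustration of creating features from categorical and continuous columns with pandas, the sketch below one-hot encodes a categorical column with `pd.get_dummies` and bins a continuous column with `pd.cut`. The DataFrame, its column names (`Country`, `ConvertedSalary`), and the bin edges are hypothetical stand-ins, not the course's actual survey data.

```python
import pandas as pd

# Hypothetical toy rows standing in for survey responses
df = pd.DataFrame({
    "Country": ["France", "India", "France", "USA"],
    "ConvertedSalary": [45000, 12000, 60000, 110000],
})

# Categorical column -> one 0/1 column per category (one-hot encoding);
# the other columns are kept and the dummy columns are appended after them
one_hot = pd.get_dummies(df, columns=["Country"], prefix="C", dtype=int)

# Continuous column -> ordered categories by binning into labelled ranges;
# intervals are right-inclusive by default, e.g. (0, 50000]
bins = [0, 50000, 100000, float("inf")]
labels = ["low", "medium", "high"]
df["salary_band"] = pd.cut(df["ConvertedSalary"], bins=bins, labels=labels)

print(one_hot.columns.tolist())
print(df["salary_band"].tolist())
```

One-hot encoding creates one column per category, so for a column with many rare values it can help to group infrequent categories under a single label before encoding.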
Chapter 2: Dealing with Messy Data
This chapter introduces you to the reality of messy and incomplete data. You will learn how to find where your data has missing values and explore several approaches to dealing with them. You will also use string manipulation techniques to remove unwanted characters from your dataset.
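A minimal pandas sketch of the steps this chapter covers: counting missing values, listwise deletion, filling gaps with a constant or a mean, and stripping stray characters with a method chain. The column names and values below are hypothetical toy data, and mean imputation is just one of several possible fill strategies.

```python
import numpy as np
import pandas as pd

# Hypothetical toy survey extract with gaps and formatted salary strings
df = pd.DataFrame({
    "Gender": ["Male", None, "Female", "Male"],
    "StackOverflowJobsRecommend": [7.0, np.nan, 5.0, np.nan],
    "RawSalary": ["70,000", "£45,000", np.nan, "$120,000"],
})

# Count the missing values in each column
missing_counts = df.isnull().sum()

# Listwise deletion: keep only the rows that have no missing values at all
complete_rows = df.dropna(how="any")

# Replace missing categorical values with a constant placeholder label
df["Gender"] = df["Gender"].fillna("Not Given")

# Fill a continuous column with its mean (mean() skips NaN by default)
col = "StackOverflowJobsRecommend"
df[col] = df[col].fillna(df[col].mean())

# Remove stray characters and convert to numbers in one method chain;
# regex=False makes "$" a literal character rather than a regex anchor
df["RawSalary"] = (
    df["RawSalary"]
    .str.replace(",", "", regex=False)
    .str.replace("£", "", regex=False)
    .str.replace("$", "", regex=False)
    .astype(float)
)
```

The median is a common alternative to the mean when a column is skewed or contains outliers.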
- Why do missing values exist? (50 XP)
- How sparse is my data? (100 XP)
- Finding the missing values (100 XP)
- Dealing with missing values (I) (50 XP)
- Listwise deletion (100 XP)
- Replacing missing values with constants (100 XP)
- Dealing with missing values (II) (50 XP)
- Filling continuous missing values (100 XP)
- Imputing values in predictive models (50 XP)
- Dealing with other data issues (50 XP)
- Dealing with stray characters (I) (100 XP)
- Dealing with stray characters (II) (100 XP)
- Method chaining (100 XP)
Chapter 3: Conforming to Statistical Assumptions
In this chapter, you will focus on analyzing the underlying distribution of your data and whether it will affect your machine learning pipeline. You will learn how to deal with skewed data and with situations where outliers may be distorting your analysis.
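The sketch below shows, under stated assumptions, how skew, outliers, and scaling might be handled on a single column: a log transformation, percentile-based and standard-deviation-based outlier removal, and normalization and standardization with scikit-learn's `MinMaxScaler` and `StandardScaler`. The log-normal "salary" data is simulated, and the 95th-percentile and 3-standard-deviation cutoffs are illustrative choices. Every statistic is computed on the training split and then reused on the test split.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Simulated right-skewed column (log-normal), split into train and test parts
rng = np.random.default_rng(0)
salary = pd.Series(rng.lognormal(mean=10.5, sigma=0.8, size=1000), name="ConvertedSalary")
train, test = salary.iloc[:800], salary.iloc[800:]

# Log transformation compresses the long right tail (log1p also handles zeros)
train_log = np.log1p(train)

# Percentage-based outlier removal: drop values at or above the 95th percentile
q95 = train.quantile(0.95)
train_trimmed = train[train < q95]

# Statistical outlier removal: keep values within 3 standard deviations of the mean
mean, std = train.mean(), train.std()
within_3sd = train[(train - mean).abs() <= 3 * std]

# Apply the thresholds learned on the training data to the test data as well
test_within = test[(test - mean).abs() <= 3 * std]

# Normalization: rescale the training values to the range [0, 1]
mm = MinMaxScaler()
train_norm = mm.fit_transform(train.to_frame())

# Standardization: fit on the training data only, then reuse on unseen data
scaler = StandardScaler()
train_scaled = scaler.fit_transform(train.to_frame())
test_scaled = scaler.transform(test.to_frame())  # uses the training mean/std
```

Calling `transform` rather than `fit_transform` on the test data keeps test-set information out of the fitted parameters, so both splits are scaled identically.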
- Data distributions (50 XP)
- What does your data look like? (I) (100 XP)
- What does your data look like? (II) (100 XP)
- When don't you have to transform your data? (50 XP)
- Scaling and transformations (50 XP)
- Normalization (100 XP)
- Standardization (100 XP)
- Log transformation (100 XP)
- When can you use normalization? (50 XP)
- Removing outliers (50 XP)
- Percentage based outlier removal (100 XP)
- Statistical outlier removal (100 XP)
- Scaling and transforming new data (50 XP)
- Train and testing transformations (I) (100 XP)
- Train and testing transformations (II) (100 XP)
Chapter 4: Dealing with Text Data
Finally, in this chapter, you will work with unstructured text data and learn ways to engineer columnar features from a text corpus. You will compare how different approaches affect how much context is extracted from a text, and how to balance the need for context against creating too many features.
- Encoding text (50 XP)
- Cleaning up your text (100 XP)
- High level text features (100 XP)
- Word counts (50 XP)
- Counting words (I) (100 XP)
- Counting words (II) (100 XP)
- Limiting your features (100 XP)
- Text to DataFrame (100 XP)
- Term frequency-inverse document frequency (50 XP)
- Tf-idf (100 XP)
- Inspecting Tf-idf values (100 XP)
- Transforming unseen data (100 XP)
- N-grams (50 XP)
- Using longer n-grams (100 XP)
- Finding the most common words (100 XP)
- Wrap-up (50 XP)
Prerequisites
Supervised Learning with scikit-learn
Instructor
Robert O'Callaghan
Director of Data Science, Ordergroove