
Course

Feature Engineering for NLP in Python

Skill Level: Advanced
Rating: 4.8 (134 reviews)
Updated 11/2024
Learn techniques to extract useful information from text and process it into a format suitable for machine learning.
Python | Machine Learning | 4 hr | 15 videos | 52 exercises | 4,200 XP | 28,952 | Statement of Accomplishment


Course Description

In this course, you will learn techniques for extracting useful information from text and processing it into a format suitable for machine learning models. Specifically, you will learn about POS tagging, named entity recognition, readability scores, and the n-gram and tf-idf models, and how to implement them using scikit-learn and spaCy. You will also learn to compute how similar two documents are to each other. Along the way, you will predict the sentiment of movie reviews and build movie and TED Talk recommenders. After the course, you will be able to engineer features from text and apply them to challenging data science problems.

Prerequisites

Introduction to Natural Language Processing in Python
Supervised Learning with scikit-learn
1

Basic features and readability scores

Learn to compute basic features such as number of words, number of characters, average word length and number of special characters (such as Twitter hashtags and mentions). You will also learn to compute readability scores and determine the amount of education required to comprehend a piece of text.
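As a rough illustration of the basic features named above, the following sketch computes them with plain Python. The function name `basic_features`, the sample tweet, and the rule that a hashtag or mention is any whitespace-separated token starting with `#` or `@` are assumptions made for this example, not the course's own code.

```python
def basic_features(text: str) -> dict:
    """Compute a few simple numeric features from a raw string."""
    # Whitespace tokenization keeps '#tag' and '@user' tokens intact.
    words = text.split()
    num_words = len(words)
    return {
        "num_chars": len(text),
        "num_words": num_words,
        # Guard against empty input to avoid dividing by zero.
        "avg_word_length": sum(len(w) for w in words) / num_words if num_words else 0.0,
        # Assumption: a hashtag/mention is a token starting with '#'/'@'
        # that has at least one more character after the symbol.
        "num_hashtags": sum(1 for w in words if w.startswith("#") and len(w) > 1),
        "num_mentions": sum(1 for w in words if w.startswith("@") and len(w) > 1),
    }


# Made-up example tweet.
features = basic_features("Loving the new #NLP course by @some_user! #python")
print(features)
```

Note that punctuation stays attached to tokens here (for example `@some_user!`), so average word length is slightly inflated. A real pipeline would likely tokenize more carefully.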
2

Text preprocessing, POS tagging and NER

In this chapter, you will learn about tokenization and lemmatization. You will then learn how to perform text cleaning, part-of-speech tagging, and named entity recognition using the spaCy library. Upon mastering these concepts, you will proceed to make the Gettysburg address machine-friendly, analyze noun usage in fake news, and identify people mentioned in a TechCrunch article.
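To make the idea of text cleaning concrete, here is a minimal standard-library sketch of tokenization and stopword removal. It is not the spaCy pipeline the chapter teaches: it does no lemmatization, POS tagging, or entity recognition, and the tiny `STOPWORDS` set and both function names are assumptions chosen for illustration.

```python
import re

# Tiny illustrative stopword list (an assumption); real pipelines use a
# much fuller list, such as the one shipped with an NLP library.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "is", "are", "that", "we"}


def tokenize(text: str) -> list:
    """Lowercase the text and keep runs of letters, allowing one inner apostrophe."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def clean(text: str) -> list:
    """Tokenize, then drop tokens that appear in the stopword list."""
    return [tok for tok in tokenize(text) if tok not in STOPWORDS]


tokens = clean(
    "Four score and seven years ago our fathers brought forth "
    "on this continent, a new nation."
)
print(tokens)
```

The output is a list of lowercase tokens with punctuation and the listed stopwords removed, which is roughly the "machine-friendly" form that later feature extraction steps expect.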
3

N-Gram models
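For orientation, an n-gram model represents a document by counts of its contiguous word sequences of length n. The sketch below builds such counts with the standard library only. The course itself uses scikit-learn for vectorization, so the helper names `ngrams` and `bag_of_ngrams` and the sample sentence are assumptions for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, n_min=1, n_max=2):
    """Count every n-gram with n_min <= n <= n_max in a whitespace-tokenized text."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        counts.update(ngrams(tokens, n))
    return counts


counts = bag_of_ngrams("the movie was not good the movie was long")
print(counts.most_common(4))
```

Including bigrams lets a model see "not good" as a single feature, which unigram counts alone cannot capture. This is one reason n-grams help in sentiment prediction.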

4

TF-IDF and similarity scores
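As a hedged sketch of the two ideas in this chapter title, the code below weights each term by its frequency in a document times the log of the inverse document frequency, then compares documents by cosine similarity. It uses the plain `tf * log(N / df)` variant, which differs from the smoothed and normalized weighting that scikit-learn's `TfidfVectorizer` applies by default. The function names and the three one-line plot summaries are invented for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up plot summaries.
docs = [
    "a heist thriller about a crew of thieves",
    "a crew of thieves plans one last heist",
    "a romantic comedy set in paris",
]
vecs = tfidf_vectors(docs)
sim_01 = cosine_similarity(vecs[0], vecs[1])
sim_02 = cosine_similarity(vecs[0], vecs[2])
print(sim_01, sim_02)
```

The word "a" occurs in every document, so its weight is zero and it contributes nothing to similarity. The two heist summaries score higher with each other than with the comedy, which is the basic mechanism behind a content-based recommender.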



FAQs

What NLP feature engineering techniques will I learn?

You learn POS tagging, named entity recognition, readability scores, n-gram models, tf-idf weighting, cosine similarity, and word embeddings. Each technique converts text into features suitable for ML models.

What projects or applications are built during the course?

You predict movie review sentiment, build a movie recommender, and create a TED Talk recommender. You also analyze noun usage in fake news and compare Pink Floyd songs using word vectors.

Which Python libraries are used for the feature engineering tasks?

You use scikit-learn for vectorization and modeling, and spaCy for POS tagging, named entity recognition, and text preprocessing. Both are industry-standard NLP tools.

What are readability scores and why do they matter?

Readability scores measure how much education is needed to understand a text. Chapter 1 teaches you to compute them as basic features that quantify text complexity for machine learning.
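One way to see how such a score works is the sketch below, which computes the Flesch reading ease and Flesch-Kincaid grade level from their standard formulas. The syllable counter is a crude vowel-group heuristic and the sentence splitter is a simple regular expression, both assumptions for illustration, so the numbers will differ from those produced by a dedicated readability library.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: one syllable per group of consecutive vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def readability(text: str) -> dict:
    """Return Flesch reading ease and Flesch-Kincaid grade level for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return {
        # Higher reading ease means easier text.
        "flesch_reading_ease": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        # Grade level approximates years of schooling needed to follow the text.
        "flesch_kincaid_grade": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
    }


simple = readability("The cat sat on the mat. It was warm.")
dense = readability(
    "Comprehensive readability evaluation necessitates considerable computational sophistication."
)
print(simple)
print(dense)
```

Short sentences with short words score as easier (higher reading ease, lower grade level) than a single sentence packed with long words. Either score can be used as a numeric feature.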

What prior NLP knowledge is needed before starting?

You need Introduction to Natural Language Processing in Python plus seven other prerequisites including pandas, scikit-learn, and statistics. This is an advanced course for experienced Python users.
