Course

Introduction to Natural Language Processing in Python

Intermediate skill level

Updated February 2026

Learn fundamental natural language processing techniques using Python and how to apply them to extract insights from real-world text data.

Python, Machine Learning

4 hours

15 videos

51 exercises

3,750 XP

140K+

Statement of Accomplishment

Course description

In this course, you'll learn natural language processing (NLP) basics, such as how to identify and separate words, how to extract topics from a text, and how to build your own fake news classifier. You'll also learn how to use basic libraries such as NLTK, alongside libraries that use deep learning to solve common NLP problems. This course will give you the foundation to process and parse text as you move forward in your Python learning.

Prerequisites

1

Regular expressions & word tokenization

This chapter will introduce some basic NLP concepts, such as word tokenization and regular expressions, to help you parse text. You'll also learn how to handle non-English text and the trickier tokenization cases you might encounter.
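
As a rough illustration of the regular-expression functions named in this chapter's lessons, the sketch below uses only Python's built-in `re` module to split a text into sentences, pull out word tokens, and search for a pattern. The sample sentences and the patterns are assumptions made for this example, not material taken from the course, and NLTK's tokenizers handle many more edge cases than these patterns do.

```python
import re

# Sample text invented for this sketch (not course data).
text = "Let's write RegEx! Won't that be fun? I sure think so."

# re.split(): break the text on sentence-ending punctuation and any
# whitespace that follows, then drop the empty string left at the end.
sentences = [s for s in re.split(r"[.?!]+\s*", text) if s]

# re.findall(): collect word tokens, keeping contractions such as
# "Let's" together by allowing one apostrophe inside a word.
tokens = re.findall(r"\w+(?:'\w+)?", text)

# re.search(): find the first occurrence of a pattern and inspect
# where it starts and ends.
match = re.search(r"fun", text)
if match:
    print(match.group(), match.start(), match.end())

# In Python 3, \w matches Unicode word characters for str patterns,
# so the same token pattern also works on non-ASCII text.
german_tokens = re.findall(r"\w+", "Für Straße")

print(sentences)
print(tokens)
print(german_tokens)
```

A plain `\w+` pattern is a reasonable first tokenizer, but it splits hyphenated words and drops punctuation entirely, which is one reason to compare it against a purpose-built tokenizer.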

Introduction to regular expressions

Which pattern?

Practicing regular expressions: re.split() and re.findall()

Introduction to tokenization

Word tokenization with NLTK

More regex with re.search()

Advanced tokenization with NLTK and regex

Choosing a tokenizer

Regex with NLTK tokenization

Non-ASCII tokenization

Charting word length with NLTK

Charting practice

2

Simple topic identification

This chapter will introduce you to topic identification, which you can apply to any text you encounter in the wild. Using basic NLP models, you will identify topics from texts based on term frequencies. You'll experiment with and compare two simple methods, bag-of-words and tf-idf, using NLTK and a new library, Gensim.
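
To make the two methods concrete, here is a minimal standard-library sketch of bag-of-words counts and a tf-idf weighting. It assumes one common tf-idf variant (term frequency times the natural log of the inverse document frequency) and a three-sentence toy corpus invented for this example. Libraries such as Gensim apply their own weighting and normalization, so their numbers will generally differ from these.

```python
import math
import re
from collections import Counter

# Toy corpus invented for this sketch (not course data).
docs = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
    "The bird sang in the tree.",
]


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


# Bag-of-words: one Counter of token frequencies per document.
bags = [Counter(tokenize(doc)) for doc in docs]

# Document frequency: in how many documents each term appears.
doc_freq = Counter()
for bag in bags:
    doc_freq.update(bag.keys())


def tf_idf(bag, n_docs):
    """Weight each term by (count / doc length) * ln(n_docs / doc_freq)."""
    total = sum(bag.values())
    return {
        term: (count / total) * math.log(n_docs / doc_freq[term])
        for term, count in bag.items()
    }


weights = [tf_idf(bag, len(docs)) for bag in bags]

# Raw counts rank "the" first in document 0, while tf-idf pushes it
# to zero because it appears in every document.
print(bags[0].most_common(3))
print(sorted(weights[0].items(), key=lambda kv: kv[1], reverse=True))
```

The contrast between the two printed rankings is the point: raw counts favor frequent filler words, while tf-idf favors words that are distinctive to one document.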

Word counts with bag-of-words

Bag-of-words picker

Building a Counter with bag-of-words

Simple text preprocessing

Text preprocessing steps

Text preprocessing practice

Introduction to gensim

What are word vectors?

Creating and querying a corpus with gensim

Gensim bag-of-words

Tf-idf with gensim

What is tf-idf?

Tf-idf with Wikipedia

3

Named-entity recognition

This chapter will introduce a slightly more advanced topic: named-entity recognition. You'll learn how to identify the who, what, and where of your texts using pre-trained models on English and non-English text. You'll also learn how to use some new libraries, polyglot and spaCy, to add to your NLP toolbox.

Named Entity Recognition

NER with NLTK

Charting practice

Stanford library with NLTK

Introduction to spaCy

Comparing NLTK with spaCy NER

spaCy NER Categories

Multilingual NER with polyglot

French NER with polyglot I

French NER with polyglot II

Spanish NER with polyglot

4

Building a "fake news" classifier

You'll apply the basics of what you've learned along with some supervised machine learning to build a "fake news" detector. You'll begin by learning the basics of supervised machine learning, and then move forward by choosing a few important features and testing ideas to identify and classify fake news articles.
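
The general idea can be sketched without any third-party packages: turn each text into word counts, then fit a classifier on labeled examples. The example below is a from-scratch multinomial Naive Bayes over word counts, standing in for the scikit-learn vectorizers and models named in the lessons. The choice of Naive Bayes, the six headlines, and the `REAL`/`FAKE` labels are all assumptions made for illustration, not the course's data or method.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical labeled headlines, invented for this sketch.
train = [
    ("scientists publish peer reviewed climate study", "REAL"),
    ("city council approves new transit budget", "REAL"),
    ("senate passes budget after long debate", "REAL"),
    ("shocking miracle cure doctors hate", "FAKE"),
    ("you will not believe this shocking secret", "FAKE"),
    ("miracle diet secret celebrities hide", "FAKE"),
]


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(examples):
    """Count words per label and documents per label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def predict(text, model):
    """Return the label with the highest log-probability score."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Start from the log prior of the label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # skip words never seen during training
            # Add-one (Laplace) smoothing avoids log(0) for words
            # that never occurred with this label.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train_naive_bayes(train)
print(predict("shocking secret cure", model))
print(predict("council debate on transit budget", model))
```

With only six training texts this is a toy. A realistic workflow would hold out a test set and measure accuracy on it before trusting any prediction.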

Classifying fake news using supervised learning with NLP

Which possible features?

Training and testing

Building word count vectors with scikit-learn

CountVectorizer for text classification

TfidfVectorizer for text classification

Inspecting the vectors

Training and testing a classification model with scikit-learn

Text classification models

Training and testing the "fake news" model with CountVectorizer

Training and testing the "fake news" model with TfidfVectorizer

Simple NLP, complex problems

Improving the model

Improving your model

Inspecting your model
