Introduction to Natural Language Processing in R

Gain an overview of all the skills and tools needed to excel in Natural Language Processing in R.

4 hours, 15 videos, 47 exercises

7,191 learners, Statement of Accomplishment

Course Description

As with any fundamentals course, Introduction to Natural Language Processing in R is designed to equip you with the tools you need to begin analyzing text. Natural language processing (NLP) is a fast-growing field of data science that has seen major advances over the last decade. This course covers the basics of the field and prepares you to expand your analysis capabilities. We dive into regular expressions, topic modeling, named entity recognition, and more, with thorough examples you can use to kick-start your own analyses.

1
True Fundamentals
Free
Chapter 1 of Introduction to Natural Language Processing in R prepares you to run your first analysis on text. You will explore regular expressions and tokenization, two components common to most text analysis tasks. Regular expressions let you search text for almost any pattern you can describe. Tokenization lets you split text into units such as words or sentences, so you can clean and prepare it for more sophisticated analysis. The techniques in the remaining chapters of this course build on this chapter.
Regular expression basics
50 xp
Practicing syntax with grep
100 xp
Exploring regular expression functions
100 xp
Tokenization
50 xp
tidytext functions
50 xp
Tokenization: sentences
100 xp
Text cleaning basics
50 xp
Text preprocessing: remove stop words
100 xp
Text preprocessing: Stemming
100 xp
2
Representations of Text
In this chapter, you will learn the most common and well-studied ways to represent text. You will create a text corpus, expand a bag-of-words representation into a term frequency-inverse document frequency (TFIDF) matrix, and use cosine similarity to measure how similar two pieces of text are. This builds your NLP foundations before you dive into applications of NLP in Chapters 3 and 4.
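For reference, here is one common textbook definition of the TFIDF weight and of cosine similarity. Here $n_{t,d}$ is the count of term $t$ in document $d$, $N$ is the number of documents, and $\mathbf{a}, \mathbf{b}$ are two documents' rows of the TFIDF matrix. This is a sketch under stated assumptions, not necessarily the exact variant used in the course. Packages differ in how they normalize term frequency and in the log base or smoothing they apply to IDF.

```latex
\mathrm{tf}(t,d) = \frac{n_{t,d}}{\sum_{t'} n_{t',d}}, \qquad
\mathrm{idf}(t) = \ln \frac{N}{\left|\{\, d : n_{t,d} > 0 \,\}\right|}, \qquad
\mathrm{tfidf}(t,d) = \mathrm{tf}(t,d)\,\mathrm{idf}(t)

\cos(\mathbf{a}, \mathbf{b})
  = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
  = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2}\,\sqrt{\sum_i b_i^2}}
```

As a worked case, take a corpus of $N = 2$ documents:

- A term that appears in both documents gets $\mathrm{idf} = \ln(2/2) = 0$, so it contributes nothing to either TFIDF vector.
- A term that appears in only one document gets $\mathrm{idf} = \ln 2 \approx 0.693$.

Because these TFIDF weights are never negative, the cosine similarity between two TFIDF vectors falls between 0 (no weighted terms in common) and 1 (same direction).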
Understanding an R corpus
50 xp
Explore an R corpus
100 xp
Creating a tibble from a corpus
100 xp
Creating a corpus
100 xp
The bag-of-words representation
50 xp
Practice BoW
50 xp
BoW Example
100 xp
Sparse matrices
100 xp
The TFIDF
50 xp
Manual calculations
50 xp
TFIDF Practice
100 xp
Cosine Similarity
50 xp
An example of failing at text analysis
100 xp
Cosine similarity example
100 xp
3
Applications: Classification and Topic Modeling
Chapter 3 focuses on two common text analysis approaches: classification modeling and topic modeling. If you work on text analysis projects, you are likely to use one or both of these methods. This chapter teaches you how to perform both techniques and offers insight into how to approach them from a practical point of view.
Preparing text for modeling
50 xp
Data preparation
100 xp
Removing sparse terms
100 xp
Classification modeling
50 xp
Classification modeling example
100 xp
Confusion matrices
100 xp
TFIDF tibble vs dtm
50 xp
Introduction to topic modeling
50 xp
LDA practice
100 xp
Assigning topics to documents
100 xp
LDA in practice
50 xp
Testing perplexity
100 xp
Reviewing LDA results
100 xp
4
Advanced Techniques
In Chapter 4, we cover two staples of natural language processing: sentiment analysis and word embeddings. Both are must-know techniques for anyone learning the fundamentals of text analysis. You will also briefly learn about BERT, part-of-speech tagging, and named entity recognition. Because this course covers almost 15 different analysis techniques, Chapter 4 ends by recapping all of them.
Sentiment analysis
50 xp
tidytext lexicons
100 xp
Sentiment scores
100 xp
Sentiment and emotion
100 xp
Word embeddings
50 xp
h2o practice
100 xp
word2vec
100 xp
Additional NLP analysis
50 xp
Reviewing methods #1
100 xp
Review methods #2
100 xp
Conclusion
50 xp

Datasets

Animal Farm

Russian Troll tweets

Collaborators

Mona Khalil

Chester Ismay

Adel Nehme

Prerequisites

Intermediate R

Introduction to the Tidyverse

Kasey Jones

Research Data Scientist
