Topic Modeling in R

Learn how to fit topic models using the Latent Dirichlet Allocation algorithm.
4 Hours | 14 Videos | 49 Exercises | 4,243 Learners
3950 XP

Course Description

This course introduces students to the main steps of topic modeling: preparing a corpus, fitting topic models with the Latent Dirichlet Allocation (LDA) algorithm (in the topicmodels package), and visualizing the results with ggplot2 and word clouds.
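
As a rough illustration of the workflow described above, the following minimal sketch prepares a tiny corpus, fits a two-topic LDA model, and builds a bar chart of the top terms. The toy documents, the choice of k = 2, the seed, and the use of tidytext and dplyr for corpus preparation are assumptions made for this example, not necessarily what the course itself uses.

```r
library(dplyr)
library(tidytext)
library(topicmodels)
library(ggplot2)

# Toy corpus invented for illustration: one row per document.
docs <- data.frame(
  doc_id = c("d1", "d2", "d3", "d4"),
  text = c(
    "the cat chased the mouse around the barn",
    "a dog and a cat slept in the barn",
    "the bank raised interest rates on loans",
    "loans and interest payments worry the bank"
  ),
  stringsAsFactors = FALSE
)

# 1. Prepare the corpus: tokenize, drop stop words, count words per
#    document, and cast the counts to a document-term matrix.
dtm <- docs %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  count(doc_id, word) %>%
  cast_dtm(document = doc_id, term = word, value = n)

# 2. Fit an LDA model with two topics (k = 2 is an arbitrary choice here);
#    the seed makes the Gibbs sampling run repeatable.
fit <- LDA(dtm, k = 2, method = "Gibbs", control = list(seed = 1234))

# 3. Extract per-topic word probabilities and keep the top terms per topic.
top_terms <- tidy(fit, matrix = "beta") %>%
  group_by(topic) %>%
  slice_max(beta, n = 5, with_ties = FALSE) %>%
  ungroup()

# 4. Build a bar chart of the top terms, one panel per topic.
#    Call print(p) to draw it.
p <- ggplot(top_terms, aes(x = beta, y = reorder(term, beta))) +
  geom_col() +
  facet_wrap(~ topic, scales = "free_y") +
  labs(x = "beta (word probability within topic)", y = NULL)
```

With a corpus this small the topics are not meaningful; the point is only the shape of the pipeline from raw text to a document-term matrix to a fitted model to a plot.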

  1. Quick introduction to the workflow

    Free
    This chapter introduces the workflow used in topic modeling: preparation of a document-term matrix, model fitting, and visualization of results with ggplot2.
  2. Wordclouds, stopwords, and control arguments

    This chapter explains how to use join functions to remove or keep words in the document-term matrix, how to make wordcloud charts, and how to use some of the many control arguments.
  3. Named entity recognition as unsupervised classification

    This chapter goes into detail on how LDA topic models can be used as classifiers. It covers the importance of the Dirichlet shape parameter alpha, construction of word contexts for named entities using regex, and technical issues like corpus alignment and held-out data.
  4. How many topics is enough?

    This chapter explains the basic methods used in the search for the optimal number of topics. It also covers how to use a single document as a source of data, and how topic numbering can be controlled using seed words.
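
One common way to approach the question raised in the last chapter is to fit models for several candidate topic counts and compare their perplexity on held-out documents. The sketch below assumes that approach; the course may rely on other criteria. It uses the AssociatedPress document-term matrix that ships with topicmodels, and the split sizes, candidate values of k, and seed are arbitrary choices for illustration.

```r
library(topicmodels)

# Document-term matrix bundled with the topicmodels package.
data("AssociatedPress", package = "topicmodels")

# Split rows of the same matrix so that training and held-out documents
# share one vocabulary (the same set of columns).
train   <- AssociatedPress[1:100, ]
heldout <- AssociatedPress[101:150, ]

# Candidate numbers of topics (illustrative values only).
candidate_k <- c(2, 4, 8)

# Fit one model per candidate k with the default VEM method and
# score each model on the held-out documents.
perp <- sapply(candidate_k, function(k) {
  fit <- LDA(train, k = k, control = list(seed = 1234))
  perplexity(fit, newdata = heldout)
})

results <- data.frame(k = candidate_k, perplexity = perp)
print(results)
```

Lower held-out perplexity indicates a better predictive fit, but the values from such a small sample are noisy, so they are best read as a rough guide alongside a manual inspection of the topics.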
In the following tracks
Machine Learning Scientist
Collaborators
Richie Cotton, Hadrien Lacroix

Pavel Oleinikov

Associate Director, Quantitative Analysis Center, Wesleyan University
Pavel Oleinikov uses his background in the social and natural sciences to advance the application of quantitative methods to data from the social world. He teaches courses on the basics of Big Data, network analysis, and text mining, as well as skills-focused courses. A large part of his work lies in assisting Wesleyan faculty with their diverse projects.