Topic Modeling in R

4 hours
3,950 XP

Course Description

This course introduces students to the main steps of topic modeling: preparing a corpus, fitting topic models with the Latent Dirichlet Allocation (LDA) algorithm (in the topicmodels package), and visualizing the results with ggplot2 and word clouds.
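
The workflow described above can be sketched roughly as follows. This is a minimal illustration, not course material: the four-sentence corpus is made up, the choice of `k = 2` topics and Gibbs sampling is arbitrary, and the use of dplyr and tidytext for corpus preparation is an assumption (the description only names topicmodels, ggplot2, and word clouds).

```r
library(dplyr)
library(tidytext)
library(topicmodels)
library(ggplot2)

# Toy corpus (hypothetical data, for illustration only)
docs <- tibble(
  id = 1:4,
  text = c("The cat chased the mouse.",
           "Dogs and cats are pets!",
           "Stocks fell as markets closed.",
           "Investors sold stocks and bonds.")
)

# 1. Prepare the corpus: unnest_tokens() lowercases and strips punctuation;
#    stop words are dropped, then words are counted per document
word_counts <- docs %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  count(id, word)

# 2. Cast the counts into a document-term matrix
dtm <- word_counts %>% cast_dtm(id, word, n)

# 3. Fit an LDA model (k and the seed are arbitrary choices)
lda_fit <- LDA(dtm, k = 2, method = "Gibbs", control = list(seed = 42))

# 4. Extract word-topic probabilities and keep the top 5 words per topic
top_terms <- tidy(lda_fit, matrix = "beta") %>%
  group_by(topic) %>%
  slice_max(beta, n = 5, with_ties = FALSE) %>%
  ungroup()

# 5. Plot the probabilities of the top words, one panel per topic
ggplot(top_terms, aes(x = reorder(term, beta), y = beta)) +
  geom_col() +
  facet_wrap(~ topic, scales = "free_y") +
  coord_flip() +
  labs(x = NULL, y = "P(word | topic)")
```

With a corpus this small the topics are not meaningful; the point is only the order of the steps, from raw text to a document-term matrix to a fitted model to a chart.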
  1. 1

    Quick introduction to the workflow


    This chapter introduces the workflow used in topic modeling: preparation of a document-term matrix, model fitting, and visualization of results with ggplot2.

    Why learn topic modeling
    50 xp
    Topics as word contexts
    50 xp
    Topic prevalence
    50 xp
    Probabilities of words belonging to topics
    100 xp
    Counting words
    50 xp
    Removal of punctuation marks
    50 xp
    Word frequencies
    100 xp
    Our first LDA model
    100 xp
    Displaying frequencies with ggplot
    50 xp
    Simple LDA model
    100 xp
  2. 3

    Named entity recognition as unsupervised classification

    This chapter goes into detail on how LDA topic models can be used as classifiers. It covers the importance of the Dirichlet shape parameter alpha, construction of word contexts for named entities using regex, and technical issues like corpus alignment and held-out data.
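
As a rough illustration of why the Dirichlet shape parameter alpha matters, the sketch below fits two models that differ only in alpha and compares how strongly each document concentrates on its single most probable topic. It is an assumption-laden example rather than the chapter's own code: it uses the `AssociatedPress` data set shipped with topicmodels, an arbitrary `k = 3`, and a helper named `mean_top_gamma` that is invented here.

```r
library(topicmodels)
library(tidytext)
library(dplyr)

# Example document-term matrix bundled with topicmodels; first 50 documents only
data("AssociatedPress", package = "topicmodels")
dtm <- AssociatedPress[1:50, ]

# Fit LDA with Gibbs sampling so that alpha stays fixed at the supplied value
fit_with_alpha <- function(alpha) {
  LDA(dtm, k = 3, method = "Gibbs",
      control = list(alpha = alpha, seed = 42, iter = 500))
}

low_alpha  <- fit_with_alpha(0.1)  # favors documents dominated by few topics
high_alpha <- fit_with_alpha(5)    # favors documents that mix topics more evenly

# Hypothetical helper: average, over documents, of the largest topic probability
mean_top_gamma <- function(fit) {
  tidy(fit, matrix = "gamma") %>%
    group_by(document) %>%
    summarise(top = max(gamma)) %>%
    pull(top) %>%
    mean()
}

mean_top_gamma(low_alpha)
mean_top_gamma(high_alpha)
```

One would typically expect the first value to be larger than the second, since a small alpha pushes each document toward a single topic, which is the behavior a classifier-style use of LDA relies on.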


History Data
Document Corpus


Hadrien Lacroix
Richie Cotton
Pavel Oleinikov

Associate Director, Quantitative Analysis Center, Wesleyan University

Pavel Oleinikov uses his background in social and natural sciences to advance the application of quantitative methods to data from the social world. He teaches courses on the basics of Big Data, network analysis, and text mining, as well as skills-focused courses. A large part of his work lies in assisting Wesleyan faculty with their diverse projects.