Premium Project

Text Mining America's Toughest Game Show

Use text mining to analyze Jeopardy! data.

  • 10 tasks
  • 549 participants
  • 1,500 XP

Project Description

Note: this project is soft launched, which means you may experience bugs. Please click "Report an Issue" in the top-right corner of the project interface to provide feedback.

Jeopardy! (hosted by Alex Trebek) has cemented itself in TV history as one of the most iconic American game shows of all time. In this project, you will examine ten years' worth of Jeopardy! episodes with text mining techniques to find the types of questions asked most frequently on the show.

You will apply skills from Text Mining Bag of Words and use basic tidyverse functions for data wrangling. In-depth knowledge of the tidyverse suite is not necessary.

The dataset used in this project is a cleaned subset of a dataset from the Datasets subreddit, uploaded by user trexmatt.

Project Tasks

  1. This... is... Jeopardy!
  2. A glimpse ahead
  3. Corpus of categories
  4. Cleaning the categories
  5. Favorite topics
  6. Removing unwanted words
  7. Creating better tools, part 1
  8. Creating better tools, part 2
  9. Think!
  10. A few insights

Alexis Lee

Intern at DataCamp

Alexis was an intern at DataCamp during her time as a student at Yale University.


  • R
  • Topics: Case Studies