Premium Project

Going Down to South Park: A Text Analysis

Analyze the dialog and IMDB ratings of 287 South Park episodes. Warning: contains explicit language.

  • 9 tasks
  • 210 participants
  • 1,500 XP

Project Description

Warning: the dataset in this project contains explicit language.

South Park is a satirical American TV show that is popular around the world. In this project, you will combine two datasets: the dialog from the first 21 seasons (287 episodes) and the IMDb ratings of those episodes. Using some text analysis principles, you will answer questions like: Are naughtier episodes more popular? Is Eric Cartman the naughtiest character in the show?

You will apply skills from Introduction to the Tidyverse or Sentiment Analysis in R: The Tidy Way. You will answer the questions with ggplot2 plots, so Data Visualization with ggplot2: Part 2 might come in handy too.

Note: this project is soft launched, which means you may experience bugs. Please click "Report an Issue" in the top-right corner of the screen to provide feedback.

Project Tasks

  1. Import and explore data
  2. Sentiments, swear words, and stemming
  3. Summarize data by episode
  4. South Park overall sentiment
  5. South Park episode popularity
  6. Are naughty episodes more popular?
  7. Comparing profanity of two characters
  8. Is Eric Cartman the naughtiest character?
  9. Let's answer some questions

Instructor

Patrik Drhlík

Freelance Data Scientist

Patrik is a freelance data scientist who helps small local companies with data-related problems. He is also pursuing a Ph.D., specializing in missing data. He never leaves home without his Rubik's cube and loves hitchhiking, athletics, mountains, kangaroos, and beer (he is Czech).


Technology

  • R
  • Topics: Data Manipulation, Data Visualization, Probability & Statistics, Importing & Cleaning Data