Course

Categorical Data in the Tidyverse

Skill level: Basic

4.7+

Updated 01/2026

Get ready to categorize! In this course, you will work with non-numerical data, such as job titles or survey responses, using the Tidyverse landscape.

R

Data Manipulation

4 hr

13 videos

44 Exercises

3,600 XP

16,552

Statement of Accomplishment

Course Description

As a data scientist, you will often find yourself working with non-numerical data, such as job titles, survey responses, or demographic information. R represents this kind of data with a special type called a factor, and this course will help you master working with factors using the tidyverse package forcats. We’ll also work with other tidyverse packages, including ggplot2, dplyr, stringr, and tidyr, and use real-world datasets, such as the FiveThirtyEight flight dataset and Kaggle’s State of Data Science and ML Survey. After this course, you’ll be able to identify and manipulate factor variables, quickly and efficiently visualize your data, and effectively communicate your results. Get ready to categorize!

Prerequisites

Reshaping Data with tidyr

1

Introduction to Factor Variables

In this chapter, you’ll learn all about factors. You’ll discover the difference between categorical and ordinal variables, how R represents them, and how to inspect them to find the number and names of their levels. Finally, you’ll see how forcats, a tidyverse package, can improve your plots by letting you quickly reorder a factor’s levels by frequency.
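As a rough illustration of the ideas in this chapter, the sketch below builds a small factor, inspects its levels, and reorders them by frequency. The `answers` vector is made up for this example and is not taken from the course datasets.

```r
library(forcats)

# Hypothetical survey answers; the values are invented for illustration.
answers <- c("Manager", "Engineer", "Manager", "Analyst", "Manager", "Engineer")
job_title <- factor(answers)

is.factor(job_title)  # check whether a variable is a factor
nlevels(job_title)    # number of levels
levels(job_title)     # level names, alphabetical by default

# Reorder the levels from most to least frequent, which is handy before plotting.
by_freq <- fct_infreq(job_title)
levels(by_freq)
# "Manager" "Engineer" "Analyst"
```

Passing `by_freq` to a bar chart would then draw the bars in order of frequency rather than alphabetically.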

Introduction to qualitative variables

Recognizing factor variables

Qualitative variables in theory

Understanding your qualitative variables

Getting number of levels

Examining number of levels

Examining levels

Making better plots

Reordering a variable by its frequency

Ordering one variable by another

2

Manipulating Factor Variables

You’ll continue to dive into the forcats package, learning how to change the order and names of levels and even collapse them into one another.
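The operations named in this chapter could look roughly like the sketch below. The `role` factor and its values, including the deliberate typo, are hypothetical, and `fct_lump_n()` assumes forcats 0.5.0 or later.

```r
library(forcats)

# Hypothetical responses, including a typo ("Mnager") to fix.
role <- factor(c("Analyst", "Engineer", "Mnager", "Engineer",
                 "Intern", "Engineer", "Analyst"))

# Move one level to the front; the others keep their relative order.
role_front <- fct_relevel(role, "Engineer")

# Rename a level using new = "old" pairs.
role_fixed <- fct_recode(role, Manager = "Mnager")

# Manually collapse several levels into a single new level.
role_grouped <- fct_collapse(role_fixed, Staff = c("Analyst", "Engineer"))

# Keep the 2 most common levels and lump the rest into "Other".
role_top <- fct_lump_n(role_fixed, n = 2)

levels(role_front)
levels(role_top)
```

Each function returns a new factor and leaves its input unchanged, so the steps can be chained or applied inside `dplyr::mutate()`.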

Reordering factors

Changing the order of factor levels

Tricks of fct_relevel()

Renaming factor levels

Distinguishing between forcats functions

Renaming a few levels

When you have a typo

Collapsing factor levels

Manually collapsing levels

Lumping variables by proportion

Preserving the most common levels

3

Creating Factor Variables

Having gotten a good grasp of forcats, you’ll expand out to the rest of the tidyverse, learning and reviewing functions from dplyr, tidyr, and stringr. You’ll refine graphs with ggplot2 by changing axes to percentage scales, editing the layout of the text, and more.
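A minimal sketch of two techniques mentioned here, `case_when()` and a percentage axis, is shown below. The `respondents` data frame, its column names, and its values are assumptions made for this example only.

```r
library(dplyr)
library(ggplot2)

# Hypothetical respondents; ages and usage flags are invented for illustration.
respondents <- tibble(
  age = c(22, 35, 41, 58, 63, 29),
  uses_r = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
)

# case_when() checks the conditions in order and uses the first one that is TRUE.
respondents <- respondents %>%
  mutate(age_group = case_when(
    age < 30 ~ "Under 30",
    age < 50 ~ "30 to 49",
    TRUE ~ "50 and over"
  ))

# Share of R users per age group, as a proportion between 0 and 1.
usage <- respondents %>%
  group_by(age_group) %>%
  summarize(share = mean(uses_r))

# Bar chart with the y axis labelled as percentages.
p <- ggplot(usage, aes(x = age_group, y = share)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(x = "Age group", y = "Share using R")
```

Printing `p` renders the chart. The final `TRUE ~ ...` branch acts as a catch-all for rows that match none of the earlier conditions.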

Examining common themed variables

Grouping and reshaping similar columns

Summarizing data

Creating an initial plot

Tricks of ggplot2

Editing plot text

Reordering graphs

Changing and creating variables with case_when()

case_when() with single variable

case_when() from multiple columns

4

Case Study on Flight Etiquette

In this final chapter, you’ll take all that you’ve learned and apply it in a case study. You’ll learn more about working with strings and summarizing data, then replicate a publication-quality FiveThirtyEight plot.
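The string cleaning and dichotomizing steps in a case study like this one might look something like the sketch below. The `survey` data frame, its column names, and its answer text are invented for illustration and are not taken from the actual flight etiquette dataset.

```r
library(dplyr)
library(stringr)

# Hypothetical survey rows; the questions and answers are made up.
survey <- tibble(
  question = c("Is it rude to recline your seat?",
               "Is it rude to wake a passenger?"),
  answer = c("Yes, very rude", "No, not rude at all")
)

cleaned <- survey %>%
  mutate(
    # Strip the shared prefix and the trailing question mark with regular expressions.
    topic = question %>%
      str_remove("^Is it rude to ") %>%
      str_remove("\\?$"),
    # Dichotomize: any answer starting with "Yes" counts as rude.
    rude = if_else(str_detect(answer, "^Yes"), "Rude", "Not rude"),
    # Convert the resulting character column to a factor.
    rude = factor(rude)
  )

cleaned$topic
```

Reducing many answer options to two categories makes it straightforward to summarize the share of "Rude" responses per topic before plotting.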

Case study introduction

Changing characters to factors

Tidying data

Data preparation and regex

Cleaning up strings

Dichotomizing variables

Summarizing data

Recreating the plot

Creating an initial plot

Fixing labels

Flipping things around

Finalizing the chart

End of course recap

Earn Statement of Accomplishment

Add this credential to your LinkedIn profile, resume, or CV
Share it on social media and in your performance review

Don’t just take our word for it

4.7 from 166 reviews

Rating distribution: 82%, 16%, 2%, 0%, 0%

Avinashkumar

2 weeks ago

A well structured course that provides practical experience working with categorical data in the Tidyverse. The exercises and case studies helped reinforce key concepts like factors, case_when(), and data visualization with ggplot2. Some topics moved quickly, but overall it was a valuable learning experience.

FAQs

What is the forcats package and why does this course focus on it?

Forcats is a tidyverse package designed for working with factor variables in R. This course teaches you to use it for reordering, lumping, and recoding categorical data efficiently.

What datasets are used in this course?

You will work with the FiveThirtyEight flight dataset and Kaggle's State of Data Science and ML Survey to practice manipulating and visualizing categorical data.

Which other tidyverse packages besides forcats will I use?

You will also use ggplot2 for visualization, dplyr for data manipulation, stringr for string operations, and tidyr for reshaping data alongside forcats.

What should I already know before starting this course?

You should have completed Introduction to the Tidyverse, Data Manipulation with dplyr, and Reshaping Data with tidyr to be prepared for this course.

Will I learn to visualize categorical data effectively?

Yes. The course covers how to create clear visualizations of factor variables, including reordering them by frequency or other metrics to make plots more readable.
