
Working with Categorical Data in Python

Learn how to manipulate and visualize categorical data using pandas and seaborn.

4 Hours, 15 Videos, 52 Exercises, 2,719 Learners
4200 XP


Course Description

Being able to understand, use, and summarize non-numerical data—such as a person’s blood type or marital status—is a vital component of being a data scientist. In this course, you’ll learn how to manipulate and visualize categorical data using pandas and seaborn. Through hands-on exercises, you’ll get to grips with pandas' categorical data type, including how to create, delete, and update categorical columns. You’ll also work with a wide range of datasets including the characteristics of adoptable dogs, Las Vegas trip reviews, and census data to develop your skills at working with categorical data.

  1. Introduction to Categorical Data


    Almost every dataset contains categorical information, and it is often an unexplored goldmine. In this chapter, you’ll learn how pandas handles categorical columns using the category data type. You’ll also discover how to group data by category to compute summary statistics.

    Course introduction
    50 xp
    Categorical vs. numerical
    100 xp
    Exploring a target variable
    100 xp
    Ordinal categorical variables
    100 xp
    Categorical data in pandas
    50 xp
    Setting dtypes and saving memory
    100 xp
    Creating a categorical pandas Series
    100 xp
    Setting dtype when reading data
    100 xp
    Grouping data by category in pandas
    50 xp
    Create lots of groups
    50 xp
    Setting up a .groupby() statement
    100 xp
    Using pandas functions effectively
    100 xp
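
The lessons above (setting dtypes, ordinal variables, and grouping by category) could look roughly like the following minimal sketch. The `dogs` data and its column names are hypothetical stand-ins, not the course's actual dataset.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
csv_text = """size,coat,age_years
small,short,2.0
large,long,7.5
medium,short,4.0
small,short,1.0
large,long,9.0
small,long,3.0
"""

# Set the category dtype while reading the data.
dogs = pd.read_csv(io.StringIO(csv_text), dtype={"size": "category"})

# Convert an existing string column after loading.
dogs["coat"] = dogs["coat"].astype("category")

# Declare an ordinal variable by giving the categories an explicit order.
size_type = pd.CategoricalDtype(["small", "medium", "large"], ordered=True)
dogs["size"] = dogs["size"].astype(size_type)

# Group by the categorical column and summarize a numeric column.
mean_age = dogs.groupby("size", observed=True)["age_years"].mean()
print(mean_age)
```

Because `size` is ordered, the grouped result follows the declared category order rather than alphabetical order.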
  2. Categorical pandas Series

    Now it’s time to learn how to set, add, and remove categories from a Series. You’ll also explore how to update, rename, collapse, and reorder categories, before applying your new skills to clean and access other data within your DataFrame.
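
As a rough illustration of these operations, the sketch below uses the pandas `.cat` accessor on a small, made-up Series. The coat values are hypothetical.

```python
import pandas as pd

# Hypothetical coat types, including one missing value.
coat = pd.Series(["short", "long", "wirehaired", "short", None], dtype="category")

# Add a category, then use it to fill the missing value.
coat = coat.cat.add_categories(["unknown"])
coat = coat.fillna("unknown")

# Rename a category by mapping the old name to a new one.
coat = coat.cat.rename_categories({"wirehaired": "wire"})

# Remove a category; rows that held it become missing (NaN).
coat = coat.cat.remove_categories(["wire"])

# Reorder the remaining categories and mark them as ordered.
coat = coat.cat.reorder_categories(["short", "long", "unknown"], ordered=True)

print(coat.cat.categories)
```

Each `.cat` method returns a new Series, so the result is reassigned at every step.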

  3. Visualizing Categorical Data

    In this chapter, you’ll use the seaborn Python library to create informative visualizations of categorical data, including categorical plots (catplot), box plots, bar plots, point plots, and count plots. You’ll then learn how to visualize categorical columns and split data across categorical columns to visualize summary statistics of numerical columns.
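
A minimal sketch of two of these plot types is shown below, assuming seaborn and matplotlib are installed. The `reviews` data and its column names are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical review data: a categorical column and a numeric score.
reviews = pd.DataFrame({
    "traveler_type": ["Couples", "Families", "Solo", "Couples", "Families", "Solo"],
    "score": [5, 4, 3, 4, 5, 2],
})
reviews["traveler_type"] = reviews["traveler_type"].astype("category")

# Figure-level categorical plot; `kind` selects the plot type
# (for example "box", "bar", "point", or "count").
grid = sns.catplot(data=reviews, x="traveler_type", y="score", kind="box")

# Axes-level count plot drawn on its own figure.
fig, ax = plt.subplots()
sns.countplot(data=reviews, x="traveler_type", ax=ax)
fig.savefig("traveler_counts.png")
```

`catplot` returns a `FacetGrid`, which is convenient when the data should be split into panels by another categorical column.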

  4. Pitfalls and Encoding

    Lastly, you’ll learn how to overcome the common pitfalls of using categorical data. You’ll also grow your data encoding skills as you are introduced to label encoding and one-hot encoding—perfect for helping you prepare your data for use in machine learning algorithms.
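
The two encodings can be sketched with plain pandas as follows. The `cars` data is hypothetical, and this is only one of several ways to produce these encodings.

```python
import pandas as pd

# Hypothetical data with a single categorical column.
cars = pd.DataFrame({"color": ["red", "blue", "green", "blue"]})
cars["color"] = cars["color"].astype("category")

# Label encoding: each category is replaced by its integer code.
# Categories are sorted alphabetically here: blue=0, green=1, red=2.
cars["color_code"] = cars["color"].cat.codes

# One-hot encoding: one indicator column per category.
one_hot = pd.get_dummies(cars["color"], prefix="color")

print(cars)
print(one_hot)
```

Label encoding keeps a single column but implies an ordering between codes, while one-hot encoding avoids that implication at the cost of extra columns.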


In the following tracks

Data Analyst


Amy Peterson, Justin Saddlemyer

Kasey Jones

Research Data Scientist

Kasey Jones is a research data scientist at RTI International. His work focuses primarily on agent-based model simulations and natural language processing analysis. He also enjoys creating unique visualizations using D3 and building R Shiny and Python Dash dashboards. Outside of RTI, he spends his time working through LeetCode problems, playing chess, and traveling all over the world.
