Being able to understand, use, and summarize non-numerical data, such as a person’s blood type or marital status, is a vital component of being a data scientist. In this course, you’ll learn how to manipulate and visualize categorical data using pandas and seaborn. Through hands-on exercises, you’ll get to grips with pandas' categorical data type, including how to create, delete, and update categorical columns. You’ll also work with a wide range of datasets, including the characteristics of adoptable dogs, Las Vegas trip reviews, and census data, to build your skills with categorical data.
Introduction to Categorical Data (Free)
Almost every dataset contains categorical information, and it is often an unexplored goldmine. In this chapter, you’ll learn how pandas handles categorical columns using the category data type. You’ll also discover how to group data by categories to compute useful summary statistics.
- Course introduction (50 XP)
- Categorical vs. numerical (100 XP)
- Exploring a target variable (100 XP)
- Ordinal categorical variables (100 XP)
- Categorical data in pandas (50 XP)
- Setting dtypes and saving memory (100 XP)
- Creating a categorical pandas Series (100 XP)
- Setting dtype when reading data (100 XP)
- Grouping data by category in pandas (50 XP)
- Create lots of groups (50 XP)
- Setting up a .groupby() statement (100 XP)
- Using pandas functions effectively (100 XP)
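As a rough illustration of the chapter's core ideas, the sketch below converts a string column to the category dtype, compares its memory footprint against plain Python strings, sets the dtype while reading a CSV, and groups by the categorical column. It is a minimal sketch, not the course's own code: the DataFrame, the column names (`marital_status`, `age`, `blood_type`) and the values are hypothetical stand-ins.

```python
import io

import pandas as pd

# Hypothetical survey data; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "marital_status": ["Married", "Single", "Single", "Divorced", "Married", "Single"] * 1000,
    "age": [34, 22, 27, 45, 51, 30] * 1000,
})

# Convert the string column to the category dtype.
df["marital_status"] = df["marital_status"].astype("category")

# Compare memory use: small integer codes plus one lookup table
# vs. one Python string object per row.
as_category = df["marital_status"].memory_usage(deep=True)
as_object = df["marital_status"].astype("object").memory_usage(deep=True)
print(f"category: {as_category} bytes, object: {as_object} bytes")

# Set the dtype while reading, instead of converting afterwards.
csv_text = "blood_type,score\nA,1\nO,2\nA,3\n"
people = pd.read_csv(io.StringIO(csv_text), dtype={"blood_type": "category"})

# Group by the categorical column and summarize a numeric column.
# observed=True keeps only categories that actually occur in the data.
mean_age = df.groupby("marital_status", observed=True)["age"].mean()
print(mean_age)
```

The memory saving comes from storing each distinct label once; it shrinks (or reverses) when a column has nearly as many unique values as rows.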
Categorical pandas Series
Now it’s time to learn how to set, add, and remove categories from a Series. You’ll also explore how to update, rename, collapse, and reorder categories, before applying your new skills to clean and access other data within your DataFrame.
- Setting category variables (50 XP)
- Setting categories (100 XP)
- Adding categories (100 XP)
- Removing categories (100 XP)
- Updating categories (50 XP)
- Collapsing categories knowledge check (50 XP)
- Renaming categories (100 XP)
- Collapsing categories (100 XP)
- Reordering categories (50 XP)
- Reordering categories in a Series (100 XP)
- Using .groupby() after reordering (100 XP)
- Cleaning and accessing data (50 XP)
- Cleaning variables (100 XP)
- Accessing and filtering data (100 XP)
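The operations this chapter lists map onto methods of pandas' `.cat` accessor. The following is a minimal sketch under assumed data: the dog-size Series and weights are hypothetical, and collapsing is done here by going through an object Series, which is one reasonable approach rather than necessarily the one the course teaches.

```python
import pandas as pd

# Hypothetical dog-size data; values are illustrative assumptions.
sizes = pd.Series(["small", "medium", "large", "medium", "small", "tiny"], dtype="category")

# Set categories explicitly and declare an order; values not listed ("tiny") become NaN.
sizes = sizes.cat.set_categories(["small", "medium", "large"], ordered=True)

# Add a category that does not yet appear in the data, then remove it again.
sizes = sizes.cat.add_categories(["extra large"])
sizes = sizes.cat.remove_categories(["extra large"])

# Rename categories with an old -> new mapping.
sizes = sizes.cat.rename_categories({"small": "Small", "medium": "Medium", "large": "Large"})

# Collapse categories: map several labels onto one, then convert back to category.
collapsed = (
    sizes.astype("object")
    .replace({"Medium": "Big", "Large": "Big"})
    .astype("category")
)

# Reorder categories; the new list must contain exactly the existing categories.
sizes = sizes.cat.reorder_categories(["Large", "Medium", "Small"], ordered=True)

# .groupby() on a categorical key returns groups in category order, not alphabetical order.
dogs = pd.DataFrame({"size": sizes, "weight": [5, 20, 40, 22, 6, 3]})
mean_weight = dogs.groupby("size", observed=False)["weight"].mean()
print(mean_weight)
```

Note that the row whose size became NaN after `set_categories` is dropped from the grouped result by default.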
Visualizing Categorical Data
In this chapter, you’ll use the seaborn Python library to create informative visualizations using categorical data, including categorical plots (catplot()), box plots, bar plots, point plots, and count plots. You’ll then learn how to visualize categorical columns and split data across categorical columns to visualize summary statistics of numerical columns.
- Introduction to categorical plots using seaborn (50 XP)
- Boxplot understanding (50 XP)
- Creating a box plot (100 XP)
- Seaborn bar plots (50 XP)
- Creating a bar plot (100 XP)
- Ordering categories (100 XP)
- Bar plot using hue (100 XP)
- Point and count plots (50 XP)
- Creating a point plot (100 XP)
- Creating a count plot (100 XP)
- Review catplot() types (100 XP)
- Additional catplot() options (50 XP)
- One visualization per group (100 XP)
- Updating categorical plots (100 XP)
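A rough sketch of the catplot() workflow described above: a bar plot of a numeric column summarized per category, with an explicit category order and one panel per group via `col=`, followed by a count plot. The review DataFrame and its columns (`traveler_type`, `score`, `weekday`) are hypothetical, and the non-interactive Agg backend is an assumption so the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import pandas as pd
import seaborn as sns

# Hypothetical trip-review data; column names and values are illustrative assumptions.
reviews = pd.DataFrame({
    "traveler_type": ["Solo", "Couples", "Families", "Business"] * 2,
    "score": [3, 5, 4, 2, 4, 4, 5, 3],
    "weekday": ["Mon"] * 4 + ["Sat"] * 4,
})

# Bar plot: mean score per traveler type, in a fixed category order,
# with one panel (facet) per weekday via col=.
g = sns.catplot(
    data=reviews,
    x="traveler_type",
    y="score",
    kind="bar",
    order=["Solo", "Couples", "Families", "Business"],
    col="weekday",
)
g.set_axis_labels("Traveler type", "Mean review score")

# Count plot: number of rows in each category (no y column needed).
c = sns.catplot(data=reviews, x="traveler_type", kind="count")
```

Switching `kind` to `"box"` or `"point"` produces the other plot types the chapter covers, and `hue=` splits each category by a second categorical column.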
Pitfalls and Encoding
Lastly, you’ll learn how to overcome the common pitfalls of using categorical data. You’ll also grow your data encoding skills as you are introduced to label encoding and one-hot encoding, both useful for preparing your data for machine learning algorithms.
- Categorical pitfalls (50 XP)
- Memory usage knowledge check (50 XP)
- Overcoming pitfalls: string issues (100 XP)
- Overcoming pitfalls: using numpy arrays (100 XP)
- Label encoding (50 XP)
- Create a label encoding and map (100 XP)
- Using saved mappings (100 XP)
- Creating a Boolean encoding (100 XP)
- One-hot encoding (50 XP)
- One-hot knowledge check (50 XP)
- One-hot encoding specific columns (100 XP)
- Wrap-up video (50 XP)
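The sketch below illustrates one string pitfall and the three encodings named above. It assumes a small, hypothetical census-style DataFrame (the `workclass`, `education` and `age` columns are made up). Label encoding uses the category codes, the saved mapping decodes them, and one-hot encoding is limited to specific columns through the `columns=` parameter of `pd.get_dummies`.

```python
import pandas as pd

# Pitfall: .str methods on a categorical Series return a non-categorical result,
# so the cleaned values must be converted back explicitly.
answers = pd.Series([" Yes", "No ", "yes"], dtype="category")
stripped = answers.str.strip().str.lower()
cleaned = stripped.astype("category")

# Hypothetical census-style data; column names are illustrative assumptions.
census = pd.DataFrame({
    "workclass": ["Private", "State-gov", "Private", "Self-emp"],
    "education": ["Bachelors", "HS-grad", "Masters", "HS-grad"],
    "age": [39, 50, 38, 53],
})

# Label encoding: convert to category, then use the integer codes.
census["workclass"] = census["workclass"].astype("category")
census["workclass_code"] = census["workclass"].cat.codes

# Save the code -> label mapping so codes can be decoded later.
codes_to_labels = dict(enumerate(census["workclass"].cat.categories))
decoded = census["workclass_code"].map(codes_to_labels)

# Boolean encoding: flag rows that match a condition.
census["is_private"] = census["workclass"] == "Private"

# One-hot encode only the listed columns; other columns pass through unchanged.
one_hot = pd.get_dummies(census, columns=["education"], prefix="edu")
print(one_hot.columns.tolist())
```

Categories are sorted alphabetically here, so the codes depend on the full set of labels; saving the mapping keeps encodings consistent when new data is processed later.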
Prerequisites: Data Manipulation with pandas
Kasey Jones, Research Data Scientist
Kasey Jones is a research data scientist at RTI International. His work focuses primarily on agent-based model simulations and natural language processing analysis. He also enjoys creating unique visualizations using D3, and building R Shiny and Python Dash dashboards. Outside of RTI, he spends his time working through LeetCode problems, playing chess, and traveling all over the world.