Course Notes

Use this workspace to take notes, store code snippets, or build your own interactive cheatsheet! For courses that use data, the datasets will be available in the datasets folder.

# Import any packages you want to use here
import pandas as pd

Take Notes

Definition of categorical data in Python:

  • Categorical data is a type of data that consists of categories or groups; it is typically non-numeric.
  • It is often represented by words or symbols.

Types of categorical data:

  • Categorical data can be divided into nominal and ordinal data.
  • Nominal data cannot be ranked or ordered, such as eye color or gender.
  • Ordinal data can be ranked or ordered, such as academic grades or levels of satisfaction.
  • Nominal data is often used in surveys or when collecting demographic information.
  • Ordinal data is used in rating scales or when measuring attitudes or opinions.
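The nominal/ordinal distinction above can be sketched in pandas. This is a minimal illustration with made-up values (the names eye_color, levels, and satisfaction are hypothetical, not from the course dataset): an unordered categorical behaves like nominal data, while an ordered one supports ranking operations.

```python
import pandas as pd

# Nominal: eye colors have no natural order (made-up values).
eye_color = pd.Series(["brown", "blue", "green", "brown"], dtype="category")

# Ordinal: satisfaction levels have a natural order (made-up values).
levels = ["low", "medium", "high"]
satisfaction = pd.Series(
    pd.Categorical(["medium", "low", "high", "medium"],
                   categories=levels, ordered=True)
)

print(eye_color.cat.ordered)     # False: no ranking is defined
print(satisfaction.cat.ordered)  # True: low < medium < high

# Ranking operations only make sense for the ordered (ordinal) Series.
highest = satisfaction.max()
above_low = (satisfaction > "low").tolist()
print(highest)
print(above_low)
```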

# Read the data from a CSV file
adult = pd.read_csv('datasets/adult.csv')
adult
# Get a summary of the "Above/Below 50k" column
print(adult["Above/Below 50k"].describe())

Observe that value_counts() returns different values when we set normalize=True.

With normalize=True, the output contains the relative frequency of each unique value instead of its count.

# Print the frequency table of "Above/Below 50k"
print(adult["Above/Below 50k"].value_counts())
# Print the relative frequencies by passing normalize=True
print(adult["Above/Below 50k"].value_counts(normalize=True))
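To try the same calls without the dataset, here is a small self-contained sketch. The income Series and its four values are made up for illustration and only borrow the column name from the notes; it compares describe(), value_counts(), and value_counts(normalize=True) on the same data.

```python
import pandas as pd

# Hypothetical stand-in for the "Above/Below 50k" column.
income = pd.Series(["<=50K", "<=50K", ">50K", "<=50K"], name="Above/Below 50k")

# describe() on a non-numeric column reports count, unique, top, and freq.
summary = income.describe()
print(summary)

# Absolute counts of each unique value.
counts = income.value_counts()
print(counts)

# Relative frequencies: each count divided by the total, so they sum to 1.
shares = income.value_counts(normalize=True)
print(shares)
```

Here three of the four rows are "<=50K", so the count is 3 and the relative frequency is 0.75.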

What is the difference between dtype and dtypes?

dtype is used with a Series.
dtypes is used with a DataFrame. This becomes clear when we apply both below.
adult = pd.read_csv('datasets/adult.csv')
adult.head(3)
# Use dtypes on the DataFrame
adult.dtypes
# Use dtype on a Series; dtype('O') means object
adult["Marital Status"].dtype
# Convert the "Marital Status" column to the category dtype
adult["Marital Status"] = adult["Marital Status"].astype("category")
adult["Marital Status"].dtype
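The same steps can be run on a tiny stand-in DataFrame. This is a sketch under the assumption that the real data looks roughly like this; the people frame and its values are made up. Note that dtypes returns one entry per column, while dtype returns a single type for one Series.

```python
import pandas as pd

# Hypothetical miniature of the adult dataset.
people = pd.DataFrame({
    "Age": [39, 50, 38],
    "Marital Status": ["Never-married", "Married-civ-spouse", "Divorced"],
})

# DataFrame.dtypes: a Series with one dtype per column.
column_types = people.dtypes
print(column_types)

# Series.dtype: the single dtype of one column (before conversion).
print(people["Marital Status"].dtype)

# Convert the column to the category dtype and check again.
people["Marital Status"] = people["Marital Status"].astype("category")
converted_type = people["Marital Status"].dtype
print(converted_type)
```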

How to create a categorical Series?

There are two ways:

  • Use pd.Series with dtype="category"
  • Or use pd.Categorical
  • With pd.Categorical, pass a list to the categories=[] parameter and set ordered=True to make the categories ordered
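Both ways can be sketched as follows. The grade values and the variable names grades_1 and grades_2 are made up for illustration; the example assumes that "C" is the lowest grade and "A" the highest.

```python
import pandas as pd

values = ["B", "A", "C", "B"]

# Way 1: pd.Series with dtype="category".
# The categories are inferred from the data and are unordered.
grades_1 = pd.Series(values, dtype="category")
print(grades_1.cat.categories.tolist())

# Way 2: pd.Categorical with an explicit categories list and ordered=True.
# The list order defines the ranking: C < B < A.
grades_2 = pd.Series(
    pd.Categorical(values, categories=["C", "B", "A"], ordered=True)
)
print(grades_2.cat.categories.tolist())
print(grades_2.cat.ordered)

# Because grades_2 is ordered, min() and max() follow the category order.
lowest = grades_2.min()
print(lowest)
```

The second way is the one to reach for with ordinal data, since it lets you state the ranking explicitly instead of relying on inferred categories.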