Course Notes: Working with Categorical Data in Python

    Use this workspace to take notes, store code snippets, or build your own interactive cheatsheet! For courses that use data, the datasets will be available in the datasets folder.

    # Import any packages you want to use here
    import pandas as pd

    Problem

    You have been given a dataset containing information about students in a school. The dataset includes the following columns:

    • 'Name': Name of the student (string)
    • 'Age': Age of the student (numerical)
    • 'Gender': Gender of the student (categorical: 'Male' or 'Female')
    • 'Grade': Grade level of the student (ordinal categorical: 'A', 'B', 'C', 'D', or 'F')
    • 'Subject': Subject of study (categorical: 'Math', 'Science', 'English', 'History', or 'Art')
    • 'Score': Score obtained by the student in the subject (numerical)

    You are required to perform the following tasks:

    1. Explore the target variable 'Grade' and analyze its distribution in the dataset.
    2. Convert the 'Gender' column to a categorical data type using pandas.
    3. Convert the 'Grade' column to an ordinal categorical data type using pandas.
    4. Group the data by 'Subject' and calculate the average score for each subject.
    5. Group the data by 'Grade' and 'Subject' and calculate the maximum score for each grade-subject combination.

    Feel free to use pandas and any other necessary libraries to solve this problem. Good luck!
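One possible way to approach tasks 1 to 3 is sketched below. The sample rows are invented purely for illustration (the real data is in the course's CSV file), and the grade ordering from 'F' (lowest) to 'A' (highest) is an assumption about how the grades should rank.

```python
import pandas as pd

# Hypothetical sample rows, invented for illustration only;
# they are NOT taken from the actual dataset.
students = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cara", "Dev", "Eli", "Fay"],
    "Age": [15, 16, 15, 17, 16, 15],
    "Gender": ["Female", "Male", "Female", "Male", "Male", "Female"],
    "Grade": ["A", "C", "B", "A", "F", "B"],
    "Subject": ["Math", "Math", "Science", "Art", "History", "Science"],
    "Score": [93, 71, 84, 95, 40, 88],
})

# Task 1: distribution of 'Grade' as raw counts and as proportions
grade_counts = students["Grade"].value_counts()
grade_share = students["Grade"].value_counts(normalize=True)

# Task 2: 'Gender' has no natural order, so a plain (nominal) category is enough
students["Gender"] = students["Gender"].astype("category")

# Task 3: 'Grade' is ordinal; the order below (worst to best) is an assumption
grade_order = ["F", "D", "C", "B", "A"]
students["Grade"] = pd.Categorical(
    students["Grade"], categories=grade_order, ordered=True
)

# An ordered categorical supports comparisons such as "B or better"
at_least_b = students[students["Grade"] >= "B"]
```

Declaring all five grades in `categories` keeps 'D' as a valid level even though no sample row uses it, and `ordered=True` is what makes comparisons, `min()`, `max()`, and sorting follow the grade order rather than alphabetical order.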

    # Load the student dataset from the CSV file into a DataFrame
    df = pd.read_csv('student.csv')
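Tasks 4 and 5 could then be handled with `groupby`. The sketch below uses a small made-up DataFrame in place of the loaded file so that it runs on its own; the rows and the 'F'-to-'A' grade ordering are assumptions, not values from the dataset.

```python
import pandas as pd

# Hypothetical sample rows, invented for illustration only;
# with the real data, use the DataFrame loaded from the CSV instead.
students = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cara", "Dev", "Eli", "Fay"],
    "Grade": ["A", "C", "B", "A", "F", "B"],
    "Subject": ["Math", "Math", "Science", "Art", "History", "Science"],
    "Score": [93, 71, 84, 95, 40, 88],
})

# Treat 'Grade' as an ordered categorical (assumed order: worst to best)
students["Grade"] = pd.Categorical(
    students["Grade"], categories=["F", "D", "C", "B", "A"], ordered=True
)

# Task 4: average score per subject
mean_by_subject = students.groupby("Subject")["Score"].mean()

# Task 5: maximum score per grade-subject combination.
# observed=True keeps only combinations that actually occur in the data,
# instead of adding rows for unused categories of 'Grade'.
max_by_grade_subject = (
    students.groupby(["Grade", "Subject"], observed=True)["Score"].max()
)

print(mean_by_subject)
print(max_by_grade_subject)
```

Passing `observed=True` matters once 'Grade' is categorical: without it, pandas may include grade levels that never appear in the data, which shows up as missing values in the result.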