    Data Manipulation with pandas

    Run the hidden code cell below to import the data used in this course.

    # Import the course packages
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    
    # Import the four datasets
    avocado = pd.read_csv("datasets/avocado.csv")
    homelessness = pd.read_csv("datasets/homelessness.csv")
    temperatures = pd.read_csv("datasets/temperatures.csv")
    walmart = pd.read_csv("datasets/walmart.csv")

    Take Notes

    Add notes about the concepts you've learned and code cells with code you want to keep.

    SORTING IN PANDAS DF

    SORTING IN ASCENDING ORDER

    SORTING BY MULTIPLE VARIABLES

    SORTING BY MULTIPLE VARIABLES, EACH WITH ITS OWN SORT ORDER
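
The three sorting cases can be sketched with `sort_values()`. The `dogs` table and its column names are hypothetical toy data, not one of the course datasets.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Labrador"],
    "weight_kg": [24, 24, 22, 29],
    "height_cm": [56, 43, 46, 59],
})

# Sort by one column; ascending order is the default
by_weight = dogs.sort_values("weight_kg")

# Sort by one column in descending order
by_weight_desc = dogs.sort_values("weight_kg", ascending=False)

# Sort by several columns: ties in weight_kg are broken by height_cm
by_weight_height = dogs.sort_values(["weight_kg", "height_cm"])

# Give each column its own sort order with a list of booleans
mixed_order = dogs.sort_values(
    ["weight_kg", "height_cm"], ascending=[True, False]
)

print(mixed_order)
```

The `ascending` list is matched to the column list by position, so both lists need the same length.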

    SUBSETTING

    SUBSETTING MULTIPLE COLUMNS
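
A minimal sketch of column subsetting, again on a hypothetical `dogs` table: single brackets with one name give a Series, and a list of names inside the brackets gives a DataFrame.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Labrador"],
    "weight_kg": [24, 24, 22, 29],
    "height_cm": [56, 43, 46, 59],
})

# One column: the result is a pandas Series
names = dogs["name"]

# Several columns: pass a list of names, the result is a DataFrame
breed_height = dogs[["breed", "height_cm"]]

# The list can be stored in a variable first, which reads more clearly
cols_to_keep = ["name", "weight_kg"]
name_weight = dogs[cols_to_keep]

print(breed_height)
```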

    SUBSETTING ROWS

    SUBSETTING ROWS EXAMPLES
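
Row subsetting is usually done with a boolean condition inside the brackets. The sketch below assumes the same hypothetical `dogs` table and shows a numeric filter, a text filter, a combined filter, and `isin()`.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Labrador"],
    "weight_kg": [24, 24, 22, 29],
    "height_cm": [56, 43, 46, 59],
})

# Rows where a numeric condition holds
heavy = dogs[dogs["weight_kg"] > 23]

# Rows where a text column matches a value
labradors = dogs[dogs["breed"] == "Labrador"]

# Combine conditions with & (and) or | (or); each condition needs parentheses
heavy_labradors = dogs[(dogs["breed"] == "Labrador") & (dogs["weight_kg"] > 25)]

# Rows whose value is in a list of allowed values
poodle_or_chow = dogs[dogs["breed"].isin(["Poodle", "Chow Chow"])]

print(heavy_labradors)
```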

    ADD NEW COLUMN
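
A new column is created by assigning to a column name that does not exist yet. This sketch uses hypothetical data and derived column names (`height_m`, `bmi`) chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper"],
    "weight_kg": [24, 24, 22, 29],
    "height_cm": [56, 43, 46, 59],
})

# New column computed from an existing one
dogs["height_m"] = dogs["height_cm"] / 100

# New column computed from two columns, element by element
dogs["bmi"] = dogs["weight_kg"] / dogs["height_m"] ** 2

print(dogs)
```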

    Grouped Summary Statistics
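
Grouped summaries can be sketched with `groupby()`. The `sales` table below is hypothetical and only loosely shaped like a store sales dataset.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
sales = pd.DataFrame({
    "type": ["A", "A", "B", "B"],
    "dept": [1, 2, 1, 2],
    "weekly_sales": [100.0, 200.0, 50.0, 150.0],
})

# One summary statistic per group
mean_by_type = sales.groupby("type")["weekly_sales"].mean()

# Several summary statistics per group
stats_by_type = sales.groupby("type")["weekly_sales"].agg(["min", "max", "sum"])

# Group by more than one variable
sum_by_type_dept = sales.groupby(["type", "dept"])["weekly_sales"].sum()

print(stats_by_type)
```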

    By default, pivot_table() takes the mean value for each group; pass aggfunc to use a different summary statistic.
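
A small sketch of the default behaviour, using a hypothetical `sales` table: without `aggfunc` the pivot table holds group means, and with `aggfunc="sum"` it holds group totals.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
sales = pd.DataFrame({
    "type": ["A", "A", "B", "B"],
    "weekly_sales": [100.0, 200.0, 50.0, 150.0],
})

# No aggfunc given: each cell is the mean of its group
pivot_mean = sales.pivot_table(values="weekly_sales", index="type")

# Explicit aggfunc: each cell is the sum of its group
pivot_sum = sales.pivot_table(values="weekly_sales", index="type", aggfunc="sum")

print(pivot_mean)
print(pivot_sum)
```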

    fill_value replaces missing values in the pivot table with a real value (known as imputation). What to replace missing values with is a topic big enough to have its own course (Dealing with Missing Data in Python), but the simplest thing to do is to substitute a dummy value.

    margins is a shortcut for when you pivot by two variables but also want to pivot by each of those variables separately: it adds an "All" row and column that summarize the pivot table contents across each variable, using the same aggregation function as the rest of the table (the mean by default).
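
The effect of `fill_value` and `margins` can be sketched on a hypothetical table in which one combination (type "B" in quarter "Q2") has no rows. The `quarter` column is invented for this illustration.

```python
import pandas as pd

# Hypothetical toy data: there is no row for type "B" in quarter "Q2"
sales = pd.DataFrame({
    "type": ["A", "A", "B"],
    "quarter": ["Q1", "Q2", "Q1"],
    "weekly_sales": [100.0, 300.0, 50.0],
})

# Pivot by two variables; the missing combination shows up as NaN
with_gaps = sales.pivot_table(
    values="weekly_sales", index="type", columns="quarter"
)

# fill_value substitutes a dummy value (here 0) for the missing cell
filled = sales.pivot_table(
    values="weekly_sales", index="type", columns="quarter", fill_value=0
)

# margins=True adds an "All" row and column computed with the same
# aggregation function as the table itself (the mean, by default)
with_margins = sales.pivot_table(
    values="weekly_sales", index="type", columns="quarter",
    fill_value=0, margins=True,
)

print(with_margins)
```

The margins are computed from the underlying rows, so the substituted 0 does not pull the "All" values down.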

    How to extract a certain value from a pandas Series
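
A minimal sketch of pulling values out of a Series, assuming a small hypothetical Series like the one a grouped summary returns: `.loc` looks up by index label, `.iloc` by position, and a boolean condition keeps the matching entries.

```python
import pandas as pd

# Hypothetical Series, shaped like the result of a grouped mean
mean_sales = pd.Series([150.0, 100.0], index=["A", "B"], name="weekly_sales")

# Look up a single value by its index label
value_for_a = mean_sales.loc["A"]

# Look up a single value by its position (0 is the first entry)
first_value = mean_sales.iloc[0]

# Keep only the entries that satisfy a condition
above_120 = mean_sales[mean_sales > 120]

print(value_for_a, first_value)
print(above_120)
```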

    Visualising Data
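
The sketch below shows three common plot types drawn straight from pandas objects: a histogram, a bar plot of a grouped summary, and a scatter plot. The data is hypothetical, and the `Agg` backend line is only there so the code also runs as a plain script without a display; it is not needed in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data, used only for illustration
dogs = pd.DataFrame({
    "breed": ["Labrador", "Poodle", "Chow Chow", "Labrador"],
    "weight_kg": [24, 24, 22, 29],
    "height_cm": [56, 43, 46, 59],
})

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

# Histogram: distribution of a single numeric column
dogs["height_cm"].hist(bins=5, ax=axes[0])

# Bar plot: one bar per group, here the mean weight per breed
avg_weight = dogs.groupby("breed")["weight_kg"].mean()
avg_weight.plot(kind="bar", ax=axes[1], title="Mean weight by breed")

# Scatter plot: relationship between two numeric columns
dogs.plot(x="height_cm", y="weight_kg", kind="scatter", ax=axes[2])

fig.tight_layout()
# In an interactive session, plt.show() displays the figure
```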

    Missing values

    To check whether a column contains any missing values, use .isna() together with .any().

    To remove rows that contain missing values, use .dropna().
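
Detecting and dropping missing values can be sketched as follows. The table is hypothetical, with `NaN` placed in two cells on purpose.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two missing cells
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper"],
    "weight_kg": [24.0, np.nan, 22.0, 29.0],
    "height_cm": [56.0, 43.0, np.nan, 59.0],
})

# True for each column that contains at least one missing value
any_missing = dogs.isna().any()

# Number of missing values in each column
n_missing = dogs.isna().sum()

# Keep only the rows without any missing value
complete_rows = dogs.dropna()

print(any_missing)
print(complete_rows)
```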

    To fill missing values with a replacement value, use .fillna().
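
A minimal sketch of filling missing values, on the same kind of hypothetical table. Using 0 as the dummy value is an assumption made for illustration; the right replacement depends on the data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two missing cells
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper"],
    "weight_kg": [24.0, np.nan, 22.0, 29.0],
    "height_cm": [56.0, 43.0, np.nan, 59.0],
})

# Replace every missing value with the same dummy value
filled_zero = dogs.fillna(0)

# Replace missing values column by column with a dict of replacements
filled_per_column = dogs.fillna({"weight_kg": 0, "height_cm": -1})

print(filled_zero)
```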

    Creating dataframes
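
Two common ways to build a DataFrame by hand are a list of dictionaries (one dictionary per row) and a dictionary of lists (one list per column). The values below are hypothetical.

```python
import pandas as pd

# Row by row: a list of dictionaries, one dictionary per row
rows = [
    {"name": "Ginger", "breed": "Dachshund", "height_cm": 22},
    {"name": "Scout", "breed": "Dalmatian", "height_cm": 59},
]
from_rows = pd.DataFrame(rows)

# Column by column: a dictionary of lists, one list per column
columns = {
    "name": ["Ginger", "Scout"],
    "breed": ["Dachshund", "Dalmatian"],
    "height_cm": [22, 59],
}
from_columns = pd.DataFrame(columns)

print(from_rows)
```

Both constructions give the same table here; which one is more convenient depends on whether the data arrives as records or as columns.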