
Data Manipulation with pandas

Run the hidden code cell below to import the data used in this course.

# Import the course packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Import the four datasets
avocado = pd.read_csv("datasets/avocado.csv")
homelessness = pd.read_csv("datasets/homelessness.csv")
temperatures = pd.read_csv("datasets/temperatures.csv")
walmart = pd.read_csv("datasets/walmart.csv")

Take Notes

Add notes about the concepts you've learned and code cells with code you want to keep.

SORTING A PANDAS DATAFRAME

SORTING IN ASCENDING ORDER

SORTING FOR MULTIPLE VARIABLES

SORTING FOR MULTIPLE VARIABLES WITH EACH HAVING ITS OWN ORDER OF SORTING
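A minimal sketch of the three sorting cases above. The dogs table and its column names are made up for illustration and are not one of the course datasets.

```python
import pandas as pd

# Toy data, invented for illustration (not one of the course datasets)
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Labrador"],
    "height_cm": [56, 43, 46, 59],
    "weight_kg": [25, 23, 22, 29],
})

# Sort by one column; ascending order is the default
by_weight = dogs.sort_values("weight_kg")

# Sort by several columns: ties in the first are broken by the second
by_breed_weight = dogs.sort_values(["breed", "weight_kg"])

# Give each column its own direction: breed A to Z, then heaviest first
by_breed_weight_desc = dogs.sort_values(
    ["breed", "weight_kg"], ascending=[True, False]
)
```

sort_values() returns a new DataFrame and leaves dogs unchanged, so the result has to be assigned to a name to be kept.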

SUBSETTING

SUBSETTING MULTIPLE COLUMNS
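A short sketch of column subsetting, again on a made-up dogs table (the data and column names are assumptions for illustration):

```python
import pandas as pd

# Toy data, invented for illustration (not one of the course datasets)
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy"],
    "breed": ["Labrador", "Poodle", "Chow Chow"],
    "height_cm": [56, 43, 46],
})

# Single brackets with one column name return a Series
names = dogs["name"]

# Double brackets (a list of column names) return a DataFrame
breed_height = dogs[["breed", "height_cm"]]
```

The outer brackets do the subsetting; the inner brackets build the list of column names.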

SUBSETTING ROWS

SUBSETTING ROWS EXAMPLES
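Some hedged examples of row subsetting with boolean conditions. The dogs table is invented for illustration; the thresholds are arbitrary.

```python
import pandas as pd

# Toy data, invented for illustration (not one of the course datasets)
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Labrador"],
    "height_cm": [56, 43, 46, 59],
})

# One condition: the comparison yields a boolean Series used as a row filter
tall = dogs[dogs["height_cm"] > 50]

# Several conditions: wrap each in parentheses and combine with & (and) or | (or)
tall_labs = dogs[(dogs["breed"] == "Labrador") & (dogs["height_cm"] > 57)]

# Match against a list of allowed values with isin()
poodle_or_chow = dogs[dogs["breed"].isin(["Poodle", "Chow Chow"])]
```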

ADD NEW COLUMN
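A minimal sketch of adding columns derived from existing ones. The table and the new column names (height_m, bmi) are hypothetical.

```python
import pandas as pd

# Toy data, invented for illustration (not one of the course datasets)
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy"],
    "height_cm": [56, 43, 46],
    "weight_kg": [25, 23, 22],
})

# Assigning to a new column name creates the column
dogs["height_m"] = dogs["height_cm"] / 100

# New columns can build on columns added earlier
dogs["bmi"] = dogs["weight_kg"] / dogs["height_m"] ** 2
```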

Grouped Summary Statistics
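A sketch of grouped summaries with groupby(), using an invented dogs table (data and column names are assumptions):

```python
import pandas as pd

# Toy data, invented for illustration (not one of the course datasets)
dogs = pd.DataFrame({
    "breed": ["Labrador", "Poodle", "Chow Chow", "Labrador"],
    "weight_kg": [25, 23, 22, 29],
})

# One statistic per group: mean weight for each breed
mean_weight_by_breed = dogs.groupby("breed")["weight_kg"].mean()

# Several statistics per group at once with agg()
weight_stats = dogs.groupby("breed")["weight_kg"].agg(["min", "max", "sum"])
```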

By default, pivot_table() takes the mean value of each group.

fill_value replaces missing values with a real value (known as imputation). What to replace missing values with is a topic big enough to have its own course (Dealing with Missing Data in Python), but the simplest thing to do is to substitute a dummy value.

margins is a shortcut for when you have pivoted by two variables but also want to pivot by each of those variables separately: it adds a final row and column, labelled All, that summarise the pivot table contents. These summaries use the same aggregation function as the rest of the table, so they are means by default rather than sums.
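The pivot table notes above can be sketched as follows. The sales table, its column names and its values are made up for illustration; store B deliberately has no toys rows so that the pivot contains a missing value.

```python
import pandas as pd

# Toy data, invented for illustration (not one of the course datasets)
sales = pd.DataFrame({
    "store": ["A", "A", "A", "B"],
    "dept": ["grocery", "grocery", "toys", "grocery"],
    "weekly_sales": [10, 20, 30, 40],
})

# Default aggregation is the mean; the missing (B, toys) cell comes out as NaN
by_store_dept = sales.pivot_table(
    values="weekly_sales", index="store", columns="dept"
)

# fill_value replaces the NaN cell; margins adds an "All" row and column
with_margins = sales.pivot_table(
    values="weekly_sales",
    index="store",
    columns="dept",
    fill_value=0,
    margins=True,
)
```

Passing aggfunc="sum" (or another function) changes both the cell values and the All margins.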

How to extract a certain value from a pandas Series
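A few common ways to pull a single value out of a Series, shown on a made-up weights Series (the labels and values are assumptions for illustration):

```python
import pandas as pd

# Toy Series, invented for illustration
weights = pd.Series([25, 23, 22], index=["Bella", "Charlie", "Lucy"])

# By index label
charlie_weight = weights.loc["Charlie"]

# By integer position
first_weight = weights.iloc[0]

# From a filtered Series that holds exactly one element
only_heavy = weights[weights > 24].item()

# Label of the largest value
heaviest_name = weights.idxmax()
```

item() raises a ValueError when the Series does not contain exactly one element, which makes it a useful guard when a filter is expected to match a single row.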

Visualising Data
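A minimal plotting sketch using the pandas plot() method on top of matplotlib. The dogs table, the plot titles and the choice of a bar plot plus a scatter plot are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy data, invented for illustration (not one of the course datasets)
dogs = pd.DataFrame({
    "breed": ["Labrador", "Poodle", "Chow Chow", "Labrador"],
    "height_cm": [56, 43, 46, 59],
    "weight_kg": [25, 23, 22, 29],
})

# One figure with two side-by-side axes
fig, (bar_ax, scatter_ax) = plt.subplots(1, 2)

# Bar plot of a grouped summary: mean weight per breed
mean_weight = dogs.groupby("breed")["weight_kg"].mean()
mean_weight.plot(kind="bar", ax=bar_ax, title="Mean weight by breed")
bar_ax.set_ylabel("weight_kg")

# Scatter plot of two numeric columns
dogs.plot(x="height_cm", y="weight_kg", kind="scatter", ax=scatter_ax)

# In a script, call plt.show() to display the figure;
# notebooks usually render it automatically.
```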

Missing values

To check whether a column contains any missing values, use isna() followed by any().

To remove rows that contain missing values, use dropna().

To fill missing values with a given value, use fillna().
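The missing-value operations noted above, sketched on a made-up table with one missing weight (the data, and the choice of 0 as the fill value, are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Toy data with one missing weight, invented for illustration
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy"],
    "weight_kg": [25.0, np.nan, 22.0],
})

# Find: True for each column that contains at least one missing value
has_missing = dogs.isna().any()

# Count missing values per column
n_missing = dogs.isna().sum()

# Remove: drop every row that contains a missing value
complete_rows = dogs.dropna()

# Fill: replace missing values with a chosen value
filled = dogs.fillna(0)
```

Both dropna() and fillna() return new DataFrames, so dogs itself still contains the missing value afterwards.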

Creating dataframes
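Two common ways to build a DataFrame from scratch, row by row from a list of dictionaries or column by column from a dictionary of lists. The names and values are made up for illustration.

```python
import pandas as pd

# Row by row: a list of dictionaries, one dictionary per row
rows = [
    {"name": "Bella", "weight_kg": 25},
    {"name": "Charlie", "weight_kg": 23},
]
from_rows = pd.DataFrame(rows)

# Column by column: a dictionary of lists, one list per column
columns = {
    "name": ["Bella", "Charlie"],
    "weight_kg": [25, 23],
}
from_columns = pd.DataFrame(columns)
```

Both approaches produce the same table here; which one reads better depends on whether the source data arrives as records or as columns.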