Data Manipulation with pandas
Run the hidden code cell below to import the data used in this course.
# Import the course packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Import the four datasets
avocado = pd.read_csv("datasets/avocado.csv")
homelessness = pd.read_csv("datasets/homelessness.csv")
temperatures = pd.read_csv("datasets/temperatures.csv")
walmart = pd.read_csv("datasets/walmart.csv")
Take Notes
SORTING A PANDAS DATAFRAME
SORTING IN ASCENDING ORDER
SORTING BY MULTIPLE VARIABLES
SORTING BY MULTIPLE VARIABLES, EACH WITH ITS OWN SORT ORDER
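A minimal sketch of the three sorting cases listed above, using sort_values(). The dogs DataFrame and its column names are made up for illustration and are not one of the course datasets.

```python
import pandas as pd

# Hypothetical toy data (not from the course datasets)
dogs = pd.DataFrame({
    "name": ["Bella", "Max", "Lucy", "Cooper"],
    "breed": ["Labrador", "Poodle", "Labrador", "Poodle"],
    "weight_kg": [25, 23, 29, 20],
})

# Sort by one column; ascending order is the default
by_weight = dogs.sort_values("weight_kg")

# Sort by several columns: first by breed, then by weight within each breed
by_breed_weight = dogs.sort_values(["breed", "weight_kg"])

# Give each column its own direction: breed ascending, weight descending
by_breed_weight_desc = dogs.sort_values(
    ["breed", "weight_kg"], ascending=[True, False]
)
```

sort_values() returns a new DataFrame and leaves dogs unchanged, so the result has to be assigned to a name to be kept.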
SUBSETTING
SUBSETTING MULTIPLE COLUMNS
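As a rough illustration of column subsetting: single square brackets select one column, and a list of names inside the brackets selects several. The dogs DataFrame below is hypothetical.

```python
import pandas as pd

# Hypothetical toy data (not from the course datasets)
dogs = pd.DataFrame({
    "name": ["Bella", "Max", "Lucy", "Cooper"],
    "breed": ["Labrador", "Poodle", "Labrador", "Poodle"],
    "weight_kg": [25, 23, 29, 20],
})

# One column name in single brackets returns a Series
names = dogs["name"]

# A list of column names (double brackets) returns a DataFrame
name_weight = dogs[["name", "weight_kg"]]
```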
SUBSETTING ROWS
SUBSETTING ROWS EXAMPLES
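A few row-subsetting examples, sketched on a hypothetical dogs DataFrame: a single condition, two combined conditions, and membership in a list of values with isin().

```python
import pandas as pd

# Hypothetical toy data (not from the course datasets)
dogs = pd.DataFrame({
    "name": ["Bella", "Max", "Lucy", "Cooper"],
    "breed": ["Labrador", "Poodle", "Labrador", "Poodle"],
    "weight_kg": [25, 23, 29, 20],
})

# Keep rows where a condition is True
heavy = dogs[dogs["weight_kg"] > 24]

# Combine conditions with & (and) or | (or); each condition needs parentheses
heavy_labs = dogs[(dogs["breed"] == "Labrador") & (dogs["weight_kg"] > 26)]

# Keep rows whose value matches any item in a list
selected = dogs[dogs["name"].isin(["Max", "Cooper"])]
```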
ADD NEW COLUMN
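Adding a column can be sketched as assigning to a new column name. The data and the column names height_cm, height_m and weight_per_m are assumptions made for this example.

```python
import pandas as pd

# Hypothetical toy data (not from the course datasets)
dogs = pd.DataFrame({
    "name": ["Bella", "Max", "Lucy", "Cooper"],
    "height_cm": [56, 43, 59, 40],
    "weight_kg": [25, 23, 29, 20],
})

# Assigning to a name that does not exist yet creates the column
dogs["height_m"] = dogs["height_cm"] / 100

# A new column can be computed from several existing columns
dogs["weight_per_m"] = dogs["weight_kg"] / dogs["height_m"]
```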
Grouped Summary Statistics
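Grouped summary statistics are typically computed with groupby(). The sketch below uses a small hypothetical sales table; the column names only loosely mirror what a retail dataset might contain.

```python
import pandas as pd

# Hypothetical toy data (not from the course datasets)
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "dept": [1, 2, 1, 2, 2],
    "weekly_sales": [100.0, 200.0, 50.0, 150.0, 400.0],
})

# One statistic per group
mean_by_store = sales.groupby("store")["weekly_sales"].mean()

# Several statistics per group at once
stats_by_store = sales.groupby("store")["weekly_sales"].agg(["min", "max", "sum"])

# Group by more than one column; the result has a two-level index
mean_by_store_dept = sales.groupby(["store", "dept"])["weekly_sales"].mean()
```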
By default, pivot_table() takes the mean of the values in each group.
fill_value replaces missing values in the pivot table with a real value (known as imputation). What to replace missing values with is a topic big enough to have its own course (Dealing with Missing Data in Python), but the simplest thing to do is to substitute a dummy value.
margins is a shortcut for when you have pivoted by two variables but also want to summarize by each of those variables separately: setting margins=True adds a row and a column, labeled All by default, that apply the aggregation across the whole pivot table. With the default mean, these are overall means rather than sums.
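The pivot table options described above can be sketched as follows. The sales table is hypothetical and deliberately has no rows for store B in department 2, so that one cell of the pivot table is missing.

```python
import pandas as pd

# Hypothetical toy data (not from the course datasets)
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "dept": [1, 2, 1, 1],
    "weekly_sales": [100.0, 200.0, 50.0, 150.0],
})

# Default aggregation is the mean; the (B, 2) cell has no data and is NaN
basic = sales.pivot_table(values="weekly_sales", index="store", columns="dept")

# fill_value replaces the missing cell with a chosen value
filled = sales.pivot_table(
    values="weekly_sales", index="store", columns="dept", fill_value=0
)

# margins=True adds an "All" row and column computed from the underlying data
with_margins = sales.pivot_table(
    values="weekly_sales", index="store", columns="dept",
    fill_value=0, margins=True,
)
```

Passing aggfunc="sum" instead of relying on the default would turn the All row and column into totals.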
How to extract a specific value from a pandas Series
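A few common ways to pull a single value out of a Series, shown on a hypothetical prices Series. The notes do not say which approach was originally meant, so this is one possible reading.

```python
import pandas as pd

# Hypothetical Series; the labels and values are illustrative
prices = pd.Series(
    [1.25, 0.99, 1.60],
    index=["organic", "conventional", "premium"],
)

# By index label
by_label = prices.loc["organic"]

# By integer position
by_position = prices.iloc[0]

# Filter down to one element, then convert it to a plain Python scalar
only_value = prices[prices.index == "premium"].item()
```

item() raises an error unless the Series holds exactly one element, which makes it a useful check that a filter matched only one row.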
Visualising Data
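A minimal plotting sketch, assuming the goal is a bar chart of a grouped total. The sales data and labels are hypothetical; pandas draws the chart through matplotlib.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical toy data (not from the course datasets)
sales = pd.DataFrame({
    "size": ["small", "small", "large", "large"],
    "nb_sold": [10, 30, 5, 15],
})

# Summarize first, then plot the summary
sold_by_size = sales.groupby("size")["nb_sold"].sum()

# kind="bar" draws one bar per group; rot=0 keeps the tick labels horizontal
ax = sold_by_size.plot(kind="bar", title="Units sold by size", rot=0)
ax.set_ylabel("Units sold")
```

In a notebook the figure is displayed after the cell runs; in a script, call plt.show() afterwards. Other values of kind, such as "line" and "hist", follow the same pattern.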
Missing values
To check whether each column contains any missing values, use isna().any().
To remove rows that contain missing values, use dropna().
To replace missing values with a chosen value, use fillna().
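The missing-value operations above can be sketched on a small hypothetical DataFrame with one missing weight. Filling with 0 is only a placeholder choice for the example.

```python
import pandas as pd
import numpy as np

# Hypothetical toy data with one missing value
dogs = pd.DataFrame({
    "name": ["Bella", "Max", "Lucy"],
    "weight_kg": [25.0, np.nan, 29.0],
})

# True for each column that has at least one missing value
has_missing = dogs.isna().any()

# Number of missing values per column
n_missing = dogs.isna().sum()

# Drop every row that contains a missing value
complete_rows = dogs.dropna()

# Replace missing values with a chosen value (0 here, as a placeholder)
filled = dogs.fillna(0)
```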
Creating dataframes
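Two common ways to build a DataFrame by hand, sketched with made-up values: from a list of dictionaries (one per row) and from a dictionary of lists (one per column).

```python
import pandas as pd

# From a list of dictionaries: each dictionary is one row
rows = [
    {"name": "Bella", "weight_kg": 25},
    {"name": "Max", "weight_kg": 23},
]
from_rows = pd.DataFrame(rows)

# From a dictionary of lists: each key is a column name, each list a column
columns = {
    "name": ["Bella", "Max"],
    "weight_kg": [25, 23],
}
from_columns = pd.DataFrame(columns)
```

Both constructions produce the same table, so the choice mostly depends on whether the data arrives row by row or column by column.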