
## Data Manipulation with pandas

Run the hidden code cell below to import the data used in this course.

```
# Import the course packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Import the four datasets
avocado = pd.read_csv("datasets/avocado.csv")
homelessness = pd.read_csv("datasets/homelessness.csv")
temperatures = pd.read_csv("datasets/temperatures.csv")
walmart = pd.read_csv("datasets/walmart.csv")
```

### Take Notes

Add notes about the concepts you've learned and code cells with code you want to keep.

```
# DATA MANIPULATION WITH PANDAS
# Chapter 1: DataFrames
#   - Introduction to pandas (exploring a DataFrame)
#   - Sorting and subsetting
#   - Creating new columns
# Chapter 2: Aggregating Data
#   - Summary statistics
#   - Counting
#   - Grouped summary statistics
# Chapter 3: Slicing and Indexing Data
#   - Subsetting using slicing
#   - Indexes and subsetting using indexes
# Chapter 4: Creating and Visualizing Data
#   - Plotting
#   - Handling missing data
#   - Reading data into a DataFrame
# Exploring a DataFrame
# Note: `dogs` is assumed to be an example DataFrame with columns such as
# name, breed, color, height_cm, weight_kg and date_of_birth; it is not one of
# the four datasets loaded above.
dogs.head()
# Returns the first few rows (the "head" of the DataFrame)
dogs.info()
# Prints each column's data type and its count of non-missing values
dogs.shape
# Attribute: a tuple of (number of rows, number of columns)
dogs.describe()
# Calculates a few summary statistics for each numeric column
dogs.values
# A two-dimensional NumPy array of values
dogs.columns
# An index of columns: the column names
dogs.index
# An index for the rows: either row numbers or row names
# Chapter 1: DataFrames
# Sorting and subsetting
# (sort_values returns a new, sorted DataFrame; dogs itself is unchanged)
dogs.sort_values("weight_kg")
# Lightest dog at the top
dogs.sort_values("weight_kg", ascending=False)
# Heaviest dog at the top
dogs.sort_values(["weight_kg", "height_cm"])
# Lightest dog at the top; ties broken by shortest first
dogs.sort_values(["weight_kg", "height_cm"], ascending=[True, False])
# Lightest dog at the top; ties broken by tallest first
# Subsetting columns
dogs["name"]
# Select a single column with DataFrame["column_name"]; the result is a Series
dogs[["name", "weight_kg"]]
# Select multiple columns with a list of names; the result is a DataFrame
# Subsetting rows
dogs["height_cm"] > 50
# A boolean Series: True where the dog is taller than 50 cm
dogs[dogs["height_cm"] > 50]
# The rows for dogs taller than 50 cm
dogs[dogs["breed"] == "Labrador"]
# Subsetting based on text data (the comparison is case-sensitive)
dogs[dogs["date_of_birth"] < "2015-01-01"]
# Subsetting based on dates written in ISO format (YYYY-MM-DD)
is_lab = dogs["breed"] == "Labrador"
is_brown = dogs["color"] == "Brown"
dogs[is_lab & is_brown]
# Subsetting on multiple conditions: combine boolean Series with & (and) or | (or)
is_black_or_brown = dogs["color"].isin(["Black", "Brown"])
dogs[is_black_or_brown]
# Subsetting on multiple values of one column using .isin()
# Creating a new column
dogs["height_m"] = dogs["height_cm"] / 100
# Adds a height_m column: height converted from centimetres to metres
```
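The notes above rely on a `dogs` DataFrame that is not among the four loaded datasets. As a minimal sketch for trying the snippets out, the block below builds a small stand-in. Only the column names come from the notes; every row value is invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the dogs DataFrame (all values are invented)
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Schnauzer", "Labrador"],
    "color": ["Brown", "Black", "Brown", "Gray", "Black"],
    "height_cm": [56, 43, 46, 49, 59],
    "weight_kg": [24, 24, 24, 17, 29],
    "date_of_birth": ["2013-07-01", "2016-09-16", "2014-08-25",
                      "2011-12-11", "2017-01-20"],
})

# Sort lightest first, breaking ties by tallest first
sorted_dogs = dogs.sort_values(["weight_kg", "height_cm"], ascending=[True, False])

# Boolean masks can be combined with & (and) and | (or)
is_lab = dogs["breed"] == "Labrador"
is_brown = dogs["color"] == "Brown"
brown_labs = dogs[is_lab & is_brown]

# .isin() checks membership in a list of allowed values
black_or_brown = dogs[dogs["color"].isin(["Black", "Brown"])]

# ISO-formatted date strings compare correctly as plain text
born_before_2015 = dogs[dogs["date_of_birth"] < "2015-01-01"]

# Derive a new column from an existing one
dogs["height_m"] = dogs["height_cm"] / 100

print(sorted_dogs["name"].tolist())
print(brown_labs["name"].tolist())
```

Because the three 24 kg dogs tie on weight, the second sort key decides their order, which is the point of passing a list to `ascending`.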



```
# Import the course packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Import the four datasets
avocado = pd.read_csv("datasets/avocado.csv")
homelessness = pd.read_csv("datasets/homelessness.csv")
temperatures = pd.read_csv("datasets/temperatures.csv")
walmart = pd.read_csv("datasets/walmart.csv")
# Add a total column: the sum of individuals and family_members
homelessness["total"] = homelessness["individuals"] + homelessness["family_members"]
# Add a p_individuals column: the proportion of total that are individuals
homelessness["p_individuals"] = homelessness["individuals"] / homelessness["total"]
# See the result
print(homelessness)
```
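To see what the two new columns contain without the course CSV, here is a minimal sketch on a hypothetical three-row table. The state labels and counts are invented; the column names `individuals`, `family_members`, `total` and `p_individuals` follow the cell above.

```python
import pandas as pd

# Invented numbers, for illustration only (not taken from homelessness.csv)
toy = pd.DataFrame({
    "state": ["A", "B", "C"],
    "individuals": [300, 150, 80],
    "family_members": [100, 50, 120],
})

# Element-wise arithmetic on columns produces a new column of the same length
toy["total"] = toy["individuals"] + toy["family_members"]
toy["p_individuals"] = toy["individuals"] / toy["total"]

# New columns can be used for subsetting and sorting like any other column
mostly_individuals = (
    toy[toy["p_individuals"] > 0.5].sort_values("total", ascending=False)
)
print(mostly_individuals[["state", "total", "p_individuals"]])
```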

### Explore Datasets

Use the DataFrames imported in the first cell to explore the data and practice your skills!

- Print the highest weekly sales for each `department` in the `walmart` DataFrame. Limit your results to the top five departments, in descending order. If you're stuck, try reviewing this video.
- What was the total `nb_sold` of organic avocados in 2017 in the `avocado` DataFrame? If you're stuck, try reviewing this video.
- Create a bar plot of the total number of homeless people by region in the `homelessness` DataFrame. Order the bars in descending order. Bonus: create a horizontal bar chart. If you're stuck, try reviewing this video.
- Create a line plot with two lines representing the temperatures in Toronto and Rome. Make sure to properly label your plot. Bonus: add a legend for the two lines. If you're stuck, try reviewing this video.
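One possible way to approach the aggregation parts of these exercises is sketched below on hypothetical miniature tables, so it runs without the course CSVs. All values are invented, and the column names `weekly_sales`, `type`, `year` and `region` are assumptions about the real files; only `department`, `nb_sold`, `individuals` and `family_members` appear in this notebook.

```python
import pandas as pd

# Hypothetical miniature tables; all values are invented. The column names
# weekly_sales, type, year and region are assumptions about the CSVs.
walmart = pd.DataFrame({
    "department": [1, 1, 2, 2, 3, 3],
    "weekly_sales": [100.0, 250.0, 80.0, 60.0, 300.0, 10.0],
})
avocado = pd.DataFrame({
    "type": ["organic", "organic", "conventional", "organic"],
    "year": [2017, 2017, 2017, 2016],
    "nb_sold": [10.0, 15.0, 100.0, 7.0],
})
homelessness = pd.DataFrame({
    "region": ["East", "East", "West"],
    "individuals": [50, 30, 200],
    "family_members": [20, 10, 40],
})

# Highest weekly sales per department, top five, in descending order
top_sales = (
    walmart.groupby("department")["weekly_sales"]
    .max()
    .sort_values(ascending=False)
    .head(5)
)

# Total nb_sold for organic avocados in 2017
is_organic_2017 = (avocado["type"] == "organic") & (avocado["year"] == 2017)
organic_2017_sold = avocado.loc[is_organic_2017, "nb_sold"].sum()

# Total homeless people per region, largest first (ready for a bar plot)
homelessness["total"] = homelessness["individuals"] + homelessness["family_members"]
total_by_region = (
    homelessness.groupby("region")["total"].sum().sort_values(ascending=False)
)

print(top_sales)
print(organic_2017_sold)
print(total_by_region)
```

With matplotlib imported as in the first cell, calling `total_by_region.plot(kind="bar")` followed by `plt.show()` should draw the bar chart, and `kind="barh"` the horizontal version.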