
## Data Manipulation with pandas

Run the hidden code cell below to import the data used in this course.

```
# Import the course packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Import the four datasets
avocado = pd.read_csv("datasets/avocado.csv")
homelessness = pd.read_csv("datasets/homelessness.csv")
temperatures = pd.read_csv("datasets/temperatures.csv")
walmart = pd.read_csv("datasets/walmart.csv")
```

### Take Notes

Add notes about the concepts you've learned and code cells with code you want to keep.

*Add your notes here*

```
# Import matplotlib.pyplot with alias plt
import matplotlib.pyplot as plt
# Look at the first few rows of data
print(avocado.head())
# Get the total number of avocados sold of each size
nb_sold_by_size = avocado.groupby("size")["nb_sold"].sum()
# Create a bar plot of the number of avocados sold by size
nb_sold_by_size.plot(kind="bar")
# Show the plot
plt.show()
```

`# Add your code snippets here`

### Explore Datasets

Use the DataFrames imported in the first cell to explore the data and practice your skills!

- Print the highest weekly sales for each `department` in the `walmart` DataFrame. Limit your results to the top five departments, in descending order. If you're stuck, try reviewing this video.
- What was the total `nb_sold` of organic avocados in 2017 in the `avocado` DataFrame? If you're stuck, try reviewing this video.
- Create a bar plot of the total number of homeless people by region in the `homelessness` DataFrame. Order the bars in descending order. Bonus: create a horizontal bar chart. If you're stuck, try reviewing this video.
- Create a line plot with two lines representing the temperatures in Toronto and Rome. Make sure to properly label your plot. Bonus: add a legend for the two lines. If you're stuck, try reviewing this video.
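
One possible way to approach the exercises above is sketched below as four small helper functions. This is a starting point under stated assumptions, not a reference solution. Only `department`, `nb_sold`, and `size` are named in these notes; the other column names (`weekly_sales`, `type`, `year`, `region`, `individuals`, `family_members`, `city`, `date`, `avg_temp_c`) and the helper function names are assumptions, so check them against `df.columns` first. Counting "total homeless people" as individuals plus family members is also an assumption.

```python
import pandas as pd

# NOTE: column names other than `department` and `nb_sold` are assumed,
# not confirmed by the course notes. Adjust them to match your CSV files.


def top_departments(walmart, n=5):
    """Highest weekly sales per department: top n, in descending order."""
    max_sales = walmart.groupby("department")["weekly_sales"].max()
    return max_sales.sort_values(ascending=False).head(n)


def organic_sold_in_year(avocado, year=2017):
    """Total nb_sold of organic avocados in the given year."""
    mask = (avocado["type"] == "organic") & (avocado["year"] == year)
    return avocado.loc[mask, "nb_sold"].sum()


def homeless_by_region(homelessness):
    """Total homeless people per region, sorted in descending order.

    Assumes the total is individuals plus family members.
    """
    total = homelessness["individuals"] + homelessness["family_members"]
    by_region = total.groupby(homelessness["region"]).sum()
    return by_region.sort_values(ascending=False)


def city_temperatures(temperatures, cities=("Toronto", "Rome")):
    """Reshape to one column per city, indexed by date, ready for a line plot."""
    subset = temperatures[temperatures["city"].isin(list(cities))]
    return subset.pivot_table(index="date", columns="city", values="avg_temp_c")
```

With the DataFrames from the first cell, `print(top_departments(walmart))` and `print(organic_sold_in_year(avocado))` answer the first two questions. For the plots, `homeless_by_region(homelessness).plot(kind="bar")` draws the bars in descending order, and `kind="barh"` gives the horizontal version. Calling `.plot()` on the result of `city_temperatures(temperatures)` draws one line per city with a legend; add a title and axis labels, then finish with `plt.show()`.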