Data Manipulation with pandas

Run the hidden code cell below to import the data used in this course.

pandas is one of the most widely used Python libraries for data manipulation and analysis. In this course, you'll learn how to manipulate DataFrames as you extract, filter, and transform real-world datasets. Working with data that includes Walmart sales figures and global temperature time series, you'll learn how to import and clean data, calculate statistics, and create visualizations, covering core data science concepts along the way.

Take Notes

Add notes about the concepts you've learned and code cells with code you want to keep.

Add your notes here

# Add your code snippets here

Explore Datasets

Use the DataFrames imported in the first cell to explore the data and practice your skills!

  • Print the highest weekly sales for each department in the walmart DataFrame. Limit your results to the top five departments, in descending order. If you're stuck, try reviewing this video.
  • What was the total nb_sold of organic avocados in 2017 in the avocado DataFrame? If you're stuck, try reviewing this video.
  • Create a bar plot of the total number of homeless people by region in the homelessness DataFrame. Order the bars in descending order. Bonus: create a horizontal bar chart. If you're stuck, try reviewing this video.
  • Create a line plot with two lines representing the temperatures in Toronto and Rome. Make sure to properly label your plot. Bonus: add a legend for the two lines. If you're stuck, try reviewing this video.
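
As a starting point for the first exercise above, here is a minimal sketch on a toy DataFrame. The column names `department` and `weekly_sales` are assumptions (the real walmart DataFrame is loaded in the hidden cell and may differ), and the values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the walmart DataFrame.
# ASSUMPTION: the columns are named "department" and "weekly_sales".
walmart_toy = pd.DataFrame({
    "department": [1, 1, 2, 2, 3, 3, 4, 5, 6, 6],
    "weekly_sales": [100.0, 250.0, 80.0, 60.0, 300.0, 10.0, 40.0, 500.0, 20.0, 90.0],
})

# Highest weekly sales per department, largest first, top five only
top_sales = (
    walmart_toy.groupby("department")["weekly_sales"]
    .max()
    .sort_values(ascending=False)
    .head(5)
)
print(top_sales)
```

The same chain should carry over to the real data once the column names are adjusted to match it.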

Transforming DataFrames

# Print the head of the homelessness data
print(homelessness.head())
print("\n","\n")  # print blank lines as a separator

# Print information about homelessness
# (info() prints its summary itself and returns None, so it is not wrapped in print())
homelessness.info()
print("\n","\n")

# Print the shape of homelessness
print(homelessness.shape)
print("\n","\n")

# Print a description of homelessness
print(homelessness.describe())
print("\n","\n")

# Import pandas using the alias pd
import pandas as pd

# Print the values of homelessness
print(homelessness.values)
print("\n","\n")

# Print the column index of homelessness
print(homelessness.columns)
print("\n","\n")

# Print the row index of homelessness
print(homelessness.index)
print("\n","\n")

Sorting and subsetting

# Sort homelessness by individuals
homelessness_ind = homelessness.sort_values("individuals", ascending=True)

# Print the top few rows
print(homelessness_ind.head())
print("\n")

# Sort homelessness by descending family members
homelessness_fam = homelessness.sort_values("family_members", ascending=False)

# Print the top few rows
print(homelessness_fam.head())
print("\n")

# Sort homelessness by region, then descending family members
# (ascending takes one boolean per sort column, passed as a list)
homelessness_reg_fam = homelessness.sort_values(["region", "family_members"], ascending=[True, False])

# Print the top few rows
print(homelessness_reg_fam.head())
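
Because the homelessness DataFrame is loaded in the hidden cell, here is a small self-contained sketch of the same multi-column sort. The rows and the `state` labels are made up for illustration; only the `region` and `family_members` column names come from the code above.

```python
import pandas as pd

# Toy stand-in for the homelessness DataFrame (values are invented).
toy = pd.DataFrame({
    "region": ["Pacific", "Mountain", "Pacific", "Mountain"],
    "state": ["A", "B", "C", "D"],
    "family_members": [10, 30, 20, 5],
})

# region ascending (A to Z), then family_members descending within each region
toy_sorted = toy.sort_values(["region", "family_members"], ascending=[True, False])
print(toy_sorted)
```

The rows come out grouped by region, with the largest family_members count first inside each group: B, D, C, A.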

Subsetting columns