
Intermediate Python

Run the hidden code cell below to import the data used in this course.

# Import the course packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Import the two datasets
gapminder = pd.read_csv("datasets/gapminder.csv")
brics = pd.read_csv("datasets/brics.csv")

Take Notes

Add notes about the concepts you've learned and code cells with code you want to keep.

pandas: a high-level data manipulation tool built on NumPy; its central data structure is the DataFrame.

Series: a one-dimensional labeled array. Several Series can be combined column-wise into a DataFrame.
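A minimal sketch of the two definitions above: two labeled Series that share the same index are combined into one DataFrame. The country codes and values are illustrative only, not read from the course data.

```python
import pandas as pd

# A Series is a one-dimensional array with an index of labels
capital = pd.Series(["Brasilia", "Moscow", "New Delhi"], index=["BR", "RU", "IN"])
area = pd.Series([8.516, 17.10, 3.286], index=["BR", "RU", "IN"])

# Passing a dict of Series to pd.DataFrame aligns them on their shared index;
# each Series becomes one column named after its dict key
df = pd.DataFrame({"capital": capital, "area": area})
print(df)
```

Selecting a single column back out of df with df["area"] returns a Series again.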

# import pandas as pd

# pd.read_csv already returns a DataFrame; index_col=0 uses the first column as the row labels
# data_frame = pd.read_csv('xxx.csv', index_col=0)

print(brics)
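To see what index_col=0 changes, here is a small sketch that reads the same CSV text twice, with and without it. The CSV content is a made-up stand-in held in memory via io.StringIO, so no file on disk is needed.

```python
import io
import pandas as pd

# Stand-in for a CSV file; the first (unnamed) column holds the row labels
csv_text = """,country,capital
BR,Brazil,Brasilia
RU,Russia,Moscow
"""

# index_col=0: the first column becomes the row index
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

# No index_col: pandas adds a default 0..n-1 index and keeps the labels
# as an ordinary column with an auto-generated name
without_index = pd.read_csv(io.StringIO(csv_text))

print(with_index)
print(without_index)
```

Only the first version supports label lookups such as with_index.loc["RU"].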

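### LOC (label-based access in pandas)
# Row access: select rows by their index labels
# (this needs the country codes to be the index, e.g. by reading brics with index_col=0)
# brics.loc[["RU", "CH"]]

# Column access: ':' keeps all rows, the list selects columns by name
# brics.loc[:, ['country', 'capital']]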
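A runnable sketch of the loc patterns above. It uses a hypothetical brics_demo DataFrame built in code (indexed by country code, as index_col=0 would produce) instead of the course's brics file, so the values are illustrative.

```python
import pandas as pd

# Illustrative stand-in for brics, indexed by country code
brics_demo = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Rows by label: a list of labels returns a DataFrame with just those rows
rows = brics_demo.loc[["RU", "CH"]]

# Columns by label: ':' selects every row, the list picks the columns
cols = brics_demo.loc[:, ["country", "capital"]]

# Rows and columns in one call: row labels before the comma, column labels after
subset = brics_demo.loc[["RU", "CH"], ["capital"]]

print(rows, cols, subset, sep="\n\n")
```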

Explore Datasets

Use the DataFrames imported in the first cell to explore the data and practice your skills!

  • Create a loop that iterates through the brics DataFrame and prints "The population of {country} is {population} million!".
  • Create a histogram of the life expectancies for countries in Africa in the gapminder DataFrame. Make sure your plot has a title, axis labels, and has an appropriate number of bins.
  • Simulate 10 rolls of two six-sided dice. If the two dice add up to 7 or 11, print "A win!". If the two dice add up to 2, 3, or 12, print "A loss!". If the two dice add up to any other number, print "Roll again!".
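A possible sketch for the first and third exercises. It assumes a brics-like DataFrame with country and population columns; brics_demo and its population figures (in millions) are placeholders, so check the real column names with print(brics) first. The dice helper dice_outcome is a hypothetical name, and NumPy's Generator is used for the random rolls.

```python
import numpy as np
import pandas as pd

# Placeholder stand-in for brics; the real DataFrame may use other column names
brics_demo = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India"], "population": [200.4, 143.5, 1252.0]},
    index=["BR", "RU", "IN"],
)

# Exercise 1: iterrows() yields (index label, row as a Series) pairs
for label, row in brics_demo.iterrows():
    print(f"The population of {row['country']} is {row['population']} million!")


# Exercise 3: classify the total of two six-sided dice (hypothetical helper)
def dice_outcome(total):
    if total in (7, 11):
        return "A win!"
    if total in (2, 3, 12):
        return "A loss!"
    return "Roll again!"


rng = np.random.default_rng(seed=123)  # seeded so the run is reproducible
outcomes = []
for _ in range(10):
    # integers(1, 7) draws values from 1 to 6; the upper bound is exclusive
    die1, die2 = rng.integers(1, 7, size=2)
    outcome = dice_outcome(int(die1 + die2))
    outcomes.append(outcome)
    print(die1, die2, outcome)
```

Keeping the win/loss rules in a separate function makes them easy to check without rolling any dice.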

### COMPARISON: SELECTING A COLUMN AS A PANDAS SERIES VS. A PANDAS DATAFRAME
## SERIES (single brackets)
print(brics["country"])

## DATAFRAME (double brackets)
print(brics[["country"]])
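The difference is easiest to see from the type and shape of each result. This sketch uses a small hypothetical brics_demo DataFrame rather than the course data.

```python
import pandas as pd

brics_demo = pd.DataFrame(
    {"country": ["Brazil", "Russia"], "capital": ["Brasilia", "Moscow"]},
    index=["BR", "RU"],
)

single = brics_demo["country"]    # single brackets -> Series
double = brics_demo[["country"]]  # double brackets (a list of labels) -> DataFrame

print(type(single).__name__, single.shape)  # Series (2,)
print(type(double).__name__, double.shape)  # DataFrame (2, 1)
```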

loc and iloc also let you select rows and columns at the same time: loc selects by label, iloc by integer position. To experiment, try the following commands in the IPython Shell; in each pair, the two commands produce the same result.

cars.loc['IN', 'cars_per_cap']                        # same as cars.iloc[3, 0]

cars.loc[['IN', 'RU'], 'cars_per_cap']                # same as cars.iloc[[3, 4], 0]

cars.loc[['IN', 'RU'], ['cars_per_cap', 'country']]   # same as cars.iloc[[3, 4], [0, 1]]
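A sketch that checks the three pairs against each other. The cars DataFrame here is an illustrative stand-in for the course's dataset; what matters is that IN and RU sit at row positions 3 and 4 and that cars_per_cap and country are columns 0 and 1, since iloc works purely on positions.

```python
import pandas as pd

# Illustrative stand-in for the course's cars DataFrame (row and column order matter for iloc)
cars = pd.DataFrame(
    {
        "cars_per_cap": [809, 731, 588, 18, 200, 70, 45],
        "country": ["United States", "Australia", "Japan", "India", "Russia", "Morocco", "Egypt"],
        "drives_right": [True, False, False, False, True, True, True],
    },
    index=["US", "AUS", "JPN", "IN", "RU", "MOR", "EG"],
)

# Each pair selects the same data: loc by label, iloc by position
scalar_loc, scalar_iloc = cars.loc["IN", "cars_per_cap"], cars.iloc[3, 0]
series_loc, series_iloc = cars.loc[["IN", "RU"], "cars_per_cap"], cars.iloc[[3, 4], 0]
frame_loc = cars.loc[["IN", "RU"], ["cars_per_cap", "country"]]
frame_iloc = cars.iloc[[3, 4], [0, 1]]

print(scalar_loc == scalar_iloc)       # True
print(series_loc.equals(series_iloc))  # True
print(frame_loc.equals(frame_iloc))    # True
```

A single label or position returns a scalar, one list returns a Series, and lists on both sides return a DataFrame.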

It's also possible to select only columns with loc and iloc. In both cases, put a colon (:), a slice from beginning to end that keeps every row, in front of the comma:

cars.loc[:, 'country']                    # same as cars.iloc[:, 1]

cars.loc[:, ['country', 'drives_right']]  # same as cars.iloc[:, [1, 2]]
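A short sketch of column-only selection, again on a small illustrative stand-in for cars that keeps the same column order (cars_per_cap, country, drives_right).

```python
import pandas as pd

# Illustrative stand-in with the same column order as the course's cars data
cars = pd.DataFrame(
    {
        "cars_per_cap": [809, 18, 200],
        "country": ["United States", "India", "Russia"],
        "drives_right": [True, False, True],
    },
    index=["US", "IN", "RU"],
)

# ':' before the comma keeps every row; only the column part differs
country_loc = cars.loc[:, "country"]                      # Series
country_iloc = cars.iloc[:, 1]                            # same Series
two_cols_loc = cars.loc[:, ["country", "drives_right"]]   # DataFrame
two_cols_iloc = cars.iloc[:, [1, 2]]                      # same DataFrame

print(country_loc.equals(country_iloc), two_cols_loc.equals(two_cols_iloc))  # True True
```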