Intermediate Python
Run the hidden code cell below to import the data used in this course.
# Import the course packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Import the two datasets
gapminder = pd.read_csv("datasets/gapminder.csv")
brics = pd.read_csv("datasets/brics.csv")
Take Notes
Add notes about the concepts you've learned and code cells with code you want to keep.
gapminder.head()
brics.head()
Right now, the scatter plot is just a cloud of blue dots that are indistinguishable from each other. Let's change this. Wouldn't it be nice if the size of each dot corresponded to the country's population?
To accomplish this, pass the population of each country, expressed in millions, to plt.scatter() as the argument s (for size). The course exercise provides these numbers as a list called pop in the workspace; in this notebook they are computed from the gapminder DataFrame as np_pop.
# np_pop is not defined in the earlier cells, so build it here:
# a NumPy array of the population values in the gapminder DataFrame, converted to millions.
np_pop = np.array(gapminder["population"]) / 1_000_000  # population in millions
# np_pop can now be passed to plt.scatter() as the s (size) argument.
np_pop
plt.scatter(gapminder["gdp_cap"], gapminder["life_exp"], s=np_pop)
plt.show()
fruits = ['apple', 'banana', 'cherry']
x = fruits.index("cherry")  # position of the first "cherry" in the list
x
- In DataFrames, single brackets (df["col"]) return a Pandas Series, while double brackets (df[["col"]]) return a Pandas DataFrame.
- loc is label-based: you specify rows and columns by their row and column labels. iloc is integer-position-based: you specify rows and columns by their integer index, as in the previous exercise.
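- For element-wise boolean operations on NumPy arrays, use np.logical_and(), np.logical_or() and np.logical_not() instead of Python's and, or and not.
- To visit every element of a 2D NumPy array, iterate over np.nditer(array); a plain for loop over a 2D array yields whole rows instead.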
- If a Pandas DataFrame behaved like a 2D NumPy array, a basic for loop such as for val in brics: print(val) might print each row. It doesn't: looping over a DataFrame directly only yields the column names. In Pandas you have to state explicitly that you want to iterate over the rows, by calling the iterrows() method on the brics DataFrame. On each iteration, iterrows() generates two pieces of data: the label of the row and the row's data as a Pandas Series.
- New tools: df.iterrows() to loop over rows, and df["new_column"] = df["some_column"].apply(len) to build a new column by applying a function (here len) to every element of an existing column.
- The row data generated by iterrows() on each iteration is a Pandas Series, which is not very convenient to print as a whole. You can select individual values from it with square brackets:
for lab, row in brics.iterrows():
    print(row["country"])
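The bracket and loc/iloc notes can be checked on a tiny DataFrame. This is a minimal sketch: brics_demo and its values are a hypothetical, hand-written stand-in, not data loaded from datasets/brics.csv.

```python
import pandas as pd

# Hypothetical stand-in for brics, with country codes as row labels
brics_demo = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India"],
     "capital": ["Brasilia", "Moscow", "New Delhi"]},
    index=["BR", "RU", "IN"],
)

# Single brackets -> Series; double brackets -> DataFrame
col_series = brics_demo["country"]
col_frame = brics_demo[["country"]]
print(type(col_series).__name__)  # Series
print(type(col_frame).__name__)   # DataFrame

# loc selects by labels, iloc by integer positions.
# Both select the same cell: row "RU", column "capital".
by_label = brics_demo.loc["RU", "capital"]
by_position = brics_demo.iloc[1, 1]
print(by_label, by_position)  # Moscow Moscow

# Lists of labels inside loc return a DataFrame subset
subset = brics_demo.loc[["RU", "IN"], ["country", "capital"]]
print(subset)
```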
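The two NumPy notes (logical_and and nditer) can be illustrated as follows. This is a sketch with made-up values: the bmi and grid arrays are hypothetical examples, not course data.

```python
import numpy as np

bmi = np.array([21.9, 27.5, 18.4, 24.7, 31.2])  # hypothetical values

# Python's `and` needs a single truth value, so `bmi > 20 and bmi < 25`
# raises a ValueError for arrays with more than one element.
# np.logical_and compares element by element instead
# (the & operator, (bmi > 20) & (bmi < 25), gives the same result).
healthy = np.logical_and(bmi > 20, bmi < 25)
print(healthy)       # [ True False False  True False]
print(bmi[healthy])  # [21.9 24.7]

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# A plain for loop over a 2D array yields whole rows (1D arrays) ...
rows = [row for row in grid]

# ... whereas np.nditer visits every individual element.
elements = [int(x) for x in np.nditer(grid)]
print(elements)  # [1, 2, 3, 4, 5, 6]
```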
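The iterrows() and apply() notes can be sketched side by side. This assumes a small hypothetical DataFrame (brics_demo) rather than the real brics data, and builds the same new column once with an explicit loop and once with apply(len).

```python
import pandas as pd

# Hypothetical stand-in for brics, indexed by country code
brics_demo = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India"],
     "capital": ["Brasilia", "Moscow", "New Delhi"]},
    index=["BR", "RU", "IN"],
)

# Looping over a DataFrame directly yields only the column names
print([col for col in brics_demo])  # ['country', 'capital']

# iterrows() yields (row label, row data as a Series) pairs
for lab, row in brics_demo.iterrows():
    print(lab + ": " + row["capital"])  # e.g. "BR: Brasilia"

# The same new column, built two ways:
# 1) explicitly, collecting values while looping with iterrows()
lengths = []
for lab, row in brics_demo.iterrows():
    lengths.append(len(row["country"]))
brics_demo["name_length"] = lengths

# 2) concisely, applying len to every element of the column
brics_demo["name_length_apply"] = brics_demo["country"].apply(len)
print(brics_demo)
```

The apply() version avoids the explicit loop and is usually the more idiomatic choice when a new column depends on a single existing column.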
Explore Datasets
Use the DataFrames imported in the first cell to explore the data and practice your skills!
- Create a loop that iterates through the brics DataFrame and prints "The population of {country} is {population} million!".
- Create a histogram of the life expectancies for countries in Africa in the gapminder DataFrame. Make sure your plot has a title, axis labels, and an appropriate number of bins.
- Simulate 10 rolls of two six-sided dice. If the two dice add up to 7 or 11, print "A win!". If they add up to 2, 3, or 12, print "A loss!". If they add up to any other number, print "Roll again!".
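One possible approach to the first exercise combines iterrows() with an f-string. This is a sketch, not a reference solution: it assumes the DataFrame has country and population columns (population already in millions), and brics_demo with its placeholder values is a hypothetical stand-in for the real brics data.

```python
import pandas as pd

# Hypothetical stand-in for brics; population values are placeholders, in millions
brics_demo = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India"],
     "population": [200.4, 143.5, 1252.0]},
    index=["BR", "RU", "IN"],
)

messages = []
for lab, row in brics_demo.iterrows():
    # Select single values from the row Series with square brackets
    msg = f"The population of {row['country']} is {row['population']} million!"
    messages.append(msg)
    print(msg)
```

Replacing brics_demo with brics should work the same way, provided the column names match.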
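For the dice exercise, a minimal sketch using NumPy's random Generator could look like this. The seed value and the helper name classify_roll are arbitrary choices for illustration; the rules follow the exercise text.

```python
import numpy as np

def classify_roll(total):
    """Map the sum of two dice to the message required by the exercise."""
    if total in (7, 11):
        return "A win!"
    if total in (2, 3, 12):
        return "A loss!"
    return "Roll again!"

# Seeded generator so the run is reproducible (the seed is arbitrary)
rng = np.random.default_rng(123)

outcomes = []
for _ in range(10):
    # integers(1, 7) draws from 1..6 because the upper bound is excluded
    die1, die2 = rng.integers(1, 7, size=2)
    total = int(die1 + die2)
    result = classify_roll(total)
    outcomes.append((total, result))
    print(total, result)
```

Keeping the win/loss rules in a separate function makes them easy to test independently of the random rolls.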