Intermediate Python
Run the hidden code cell below to import the data used in this course.
Take Notes
Add notes about the concepts you've learned and code cells with code you want to keep.
Square Brackets (1)
In the video, you saw that you can index and select Pandas DataFrames in many different ways. The simplest, but not the most powerful, way is to use square brackets.
In the sample code, the same cars data is imported from a CSV file as a Pandas DataFrame. To select only the cars_per_cap column from cars, you can use:
cars['cars_per_cap']
cars[['cars_per_cap']]
The single bracket version gives a Pandas Series; the double bracket version gives a Pandas DataFrame.
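The difference between the two forms can be checked with type(). This is a minimal sketch: the small cars DataFrame is built by hand as a stand-in for cars.csv, and its values are assumptions rather than data read from the course file.

```python
import pandas as pd

# Hand-made stand-in for the course's cars data (values are assumptions)
cars = pd.DataFrame(
    {
        "cars_per_cap": [809, 731, 588, 18, 200, 70, 45],
        "country": ["United States", "Australia", "Japan", "India",
                    "Russia", "Morocco", "Egypt"],
        "drives_right": [True, False, False, False, True, True, True],
    },
    index=["US", "AUS", "JPN", "IN", "RU", "MOR", "EG"],
)

as_series = cars["cars_per_cap"]    # single brackets: one column as a Series
as_frame = cars[["cars_per_cap"]]   # double brackets: a one-column DataFrame

print(type(as_series))
print(type(as_frame))
```

The double bracket form is really a list of column names inside the selection brackets, which is why it can also select several columns at once.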
Square Brackets (2)
Square brackets can do more than just select columns. You can also use them to get rows, or observations, from a DataFrame. The following call selects the first five rows from the cars DataFrame:
cars[0:5]
The result is another DataFrame containing only the rows you specified.
Pay attention: you can only select rows using square brackets if you specify a slice, like 0:4. Also, you're using the integer positions of the rows here, not the row labels!
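A short sketch of both points, again using a hand-made stand-in for cars.csv (the values are assumptions). A slice selects rows by position, while a bare integer is looked up as a column label and fails here because no column is named 0.

```python
import pandas as pd

# Hand-made stand-in for the course's cars data (values are assumptions)
cars = pd.DataFrame(
    {
        "cars_per_cap": [809, 731, 588, 18, 200, 70, 45],
        "country": ["United States", "Australia", "Japan", "India",
                    "Russia", "Morocco", "Egypt"],
        "drives_right": [True, False, False, False, True, True, True],
    },
    index=["US", "AUS", "JPN", "IN", "RU", "MOR", "EG"],
)

# A slice selects rows by integer position; the end position is excluded
first_five = cars[0:5]
print(first_five)

# A single integer is treated as a column label, not a row position
single_int_failed = False
try:
    cars[0]
except KeyError:
    single_int_failed = True
print("cars[0] raised KeyError:", single_int_failed)
```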
loc and iloc (1)
With loc and iloc you can do practically any data selection operation on DataFrames you can think of. loc is label-based, which means that you have to specify rows and columns based on their row and column labels. iloc is integer index based, so you have to specify rows and columns by their integer index like you did in the previous exercise.
Try out the following commands in the IPython Shell to experiment with loc and iloc to select observations. Each pair of commands here gives the same result.
cars.loc['RU']
cars.iloc[4]
cars.loc[['RU']]
cars.iloc[[4]]
cars.loc[['RU', 'AUS']]
cars.iloc[[4, 1]]
As before, code is included that imports the cars data as a Pandas DataFrame.
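The equivalence of each pair can be verified with the equals() method. This sketch assumes a hand-made stand-in for cars.csv in which RU sits at position 4 and AUS at position 1; the values themselves are assumptions.

```python
import pandas as pd

# Hand-made stand-in for the course's cars data (values are assumptions)
cars = pd.DataFrame(
    {
        "cars_per_cap": [809, 731, 588, 18, 200, 70, 45],
        "country": ["United States", "Australia", "Japan", "India",
                    "Russia", "Morocco", "Egypt"],
        "drives_right": [True, False, False, False, True, True, True],
    },
    index=["US", "AUS", "JPN", "IN", "RU", "MOR", "EG"],
)

# One row as a Series: by label and by integer position
ru_by_label = cars.loc["RU"]
ru_by_position = cars.iloc[4]

# Several rows as a DataFrame: a list of labels or a list of positions
two_by_label = cars.loc[["RU", "AUS"]]
two_by_position = cars.iloc[[4, 1]]

print(ru_by_label.equals(ru_by_position))
print(two_by_label.equals(two_by_position))
```

Note that iloc expects integers: passing a string such as '4' raises an error instead of selecting a row.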
loc and iloc (2)
loc and iloc also allow you to select both rows and columns from a DataFrame. To experiment, try out the following commands in the IPython Shell. Again, paired commands produce the same result.
cars.loc['IN', 'cars_per_cap']
cars.iloc[3, 0]
cars.loc[['IN', 'RU'], 'cars_per_cap']
cars.iloc[[3, 4], 0]
cars.loc[['IN', 'RU'], ['cars_per_cap', 'country']]
cars.iloc[[3, 4], [0, 1]]
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
# Print out drives_right value of Morocco
print(cars.iloc[5, 2])
# Print sub-DataFrame
print(cars.loc[['RU', 'MOR'], ['country', 'drives_right']])
loc and iloc (3)
It's also possible to select only columns with loc and iloc. In both cases, you simply put a slice going from beginning to end in front of the comma:
cars.loc[:, 'country']
cars.iloc[:, 1]
cars.loc[:, ['country', 'drives_right']]
cars.iloc[:, [1, 2]]
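As a sketch of the column-only pattern, the example below compares the label-based and position-based forms on a hand-made stand-in for cars.csv. The values and the column order (cars_per_cap, country, drives_right) are assumptions.

```python
import pandas as pd

# Hand-made stand-in for the course's cars data (values are assumptions)
cars = pd.DataFrame(
    {
        "cars_per_cap": [809, 731, 588, 18, 200, 70, 45],
        "country": ["United States", "Australia", "Japan", "India",
                    "Russia", "Morocco", "Egypt"],
        "drives_right": [True, False, False, False, True, True, True],
    },
    index=["US", "AUS", "JPN", "IN", "RU", "MOR", "EG"],
)

# The bare colon means "all rows"; the part after the comma picks columns
country_by_label = cars.loc[:, "country"]
country_by_position = cars.iloc[:, 1]

sub_by_label = cars.loc[:, ["country", "drives_right"]]
sub_by_position = cars.iloc[:, [1, 2]]

print(country_by_label.equals(country_by_position))
print(sub_by_label.equals(sub_by_position))
```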
Equality
To check if two Python values, or variables, are equal, you can use ==. To check for inequality, you need !=. As a refresher, have a look at the following expressions, which all evaluate to True. Feel free to try them out in the IPython Shell.
2 == (1 + 1)
"intermediate" != "python"
True != False
"Python" != "python"
When you write these comparisons in a script, you will need to wrap a print() function around them to see the output.
# Comparison of booleans
print(True == False)
# Comparison of integers
print(-5 * 15 != 75)
# Comparison of strings
print("pyscript" == "PyScript")
# Compare a boolean with a numeric
print(True == 1)
Greater and less than
In the video, Hugo also talked about the less than and greater than signs, < and > in Python. You can combine them with an equal sign: <= and >=. Pay attention: <= is valid syntax, but =< is not. All Python expressions in the following code chunk evaluate to True:
3 < 4
3 <= 4
"alpha" <= "beta"
Remember that for string comparison, Python determines the relationship character by character, based on alphabetical order for letters of the same case. Uppercase letters sort before lowercase ones.
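More precisely, Python compares strings character by character using Unicode code points, so case matters. The sketch below uses example strings chosen for illustration; they are not from the course.

```python
# Same-case strings compare in alphabetical order
alpha_first = "alpha" <= "beta"

# Every uppercase ASCII letter has a lower code point than any lowercase one
upper_first = "Zebra" < "apple"

# Code points behind the comparison above
code_points = (ord("Z"), ord("a"))

# A string that is a prefix of another compares as smaller
prefix_first = "py" < "python"

print(alpha_first, upper_first, code_points, prefix_first)
```

To compare strings without regard to case, one option is to normalize both sides first, for example with str.lower().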
Compare arrays
Out of the box, you can also use comparison operators with NumPy arrays.
Remember areas, the list of area measurements for different rooms in your house from Introduction to Python? This time there are two NumPy arrays: my_house and your_house. They both contain the areas for the kitchen, living room, bedroom and bathroom in the same order, so you can compare them.
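A minimal sketch of element-wise comparison follows. The area values in my_house and your_house are assumptions chosen for illustration, listed in the order kitchen, living room, bedroom, bathroom.

```python
import numpy as np

# Illustrative areas (assumed values): kitchen, living room, bedroom, bathroom
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# Comparing an array with a number yields one boolean per element
at_least_18 = my_house >= 18

# Comparing two arrays of the same shape works element by element
smaller_than_yours = my_house < your_house

print(at_least_18)
print(smaller_than_yours)
```

Each result is a boolean NumPy array with the same shape as the inputs.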
and, or, not (1)
A boolean is either 1 or 0, True or False. With boolean operators such as and, or, and not, you can combine these booleans to perform more advanced queries on your data.
In the sample code, 2 variables are defined: my_kitchen and your_kitchen, representing areas.
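The operators can be sketched on those two variables as follows. The values assigned to my_kitchen and your_kitchen are assumptions for illustration.

```python
# Illustrative kitchen areas (assumed values)
my_kitchen = 18.0
your_kitchen = 14.0

# and: True only if both conditions hold (18.0 < 18 is False)
between_10_and_18 = my_kitchen > 10 and my_kitchen < 18

# or: True if at least one condition holds (18.0 > 17 is True)
small_or_large = my_kitchen < 14 or my_kitchen > 17

# not: negates the condition (14.0 > 18.0 is False, so the result is True)
not_bigger = not (your_kitchen > my_kitchen)

print(between_10_and_18, small_or_large, not_bigger)
```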