Intermediate Python

Run the hidden code cell below to import the data used in this course.

# Import the course packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Import the two datasets
gapminder = pd.read_csv("datasets/gapminder.csv")
brics = pd.read_csv("datasets/brics.csv", index_col=0)

Take Notes

import matplotlib.pyplot as plt

plt.plot(year, pop) -- line plot with year on the x axis and pop on the y axis; if you pass only one argument, matplotlib uses its index (0, 1, 2, ...) for x and its values for y

plt.show()

plt.hist(values, bins=n) - histogram -- If bins is an integer, it defines the number of equal-width bins in the range. If bins is a sequence, it defines the bin edges, including the left edge of the first bin and the right edge of the last bin.

plt.clf() - clears the figure so you can create another plot
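
A quick way to see the two forms of bins is to compute the bin edges without drawing anything. The sketch below uses np.histogram, which takes the same kind of bins argument as plt.hist; the values list is made up for illustration and is not course data.

```python
import numpy as np

# Hypothetical sample data, not from the course datasets
values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# bins as an integer: 5 equal-width bins between min(values) and max(values)
counts_int, edges_int = np.histogram(values, bins=5)
print(edges_int)    # 6 edges are needed to delimit 5 bins
print(counts_int)

# bins as a sequence: the edges are used exactly as given
counts_seq, edges_seq = np.histogram(values, bins=[0, 5, 10])
print(edges_seq)
print(counts_seq)
```

Passing the same bins value to plt.hist(values, bins=...) should give bars over the same intervals.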

plt.xlabel('text here')

plt.ylabel('text here')

plt.title('Title text here')

plt.yticks([0, 2, 4, 6, 8, 10], ['0', '2B', '4B', '6B', '8B', '10B']) - second argument is display names of ticks and should be same length as first list

plt.scatter(x = gdp_cap, y = life_exp, s = np.array(pop) * 2, c = col, alpha = 0.8)

plt.grid(True) - gridlines
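
The plotting calls above can be combined into one small script. This is a minimal sketch with made-up numbers (year and pop here are hypothetical lists, not the course data). It selects the non-interactive Agg backend and saves to an in-memory buffer so it also runs without a display; in a notebook you would call plt.show() instead.

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# Hypothetical data for illustration only (population in billions)
year = [1950, 1970, 1990, 2010]
pop = [2.5, 3.7, 5.3, 7.0]

plt.plot(year, pop)
plt.xlabel('Year')
plt.ylabel('Population')
plt.title('World population (illustrative numbers)')

# Tick positions first, then display labels; both lists have the same length
plt.yticks([0, 2, 4, 6, 8, 10], ['0', '2B', '4B', '6B', '8B', '10B'])
plt.grid(True)

ax = plt.gca()                      # current axes, kept for inspection
buffer = io.BytesIO()
plt.savefig(buffer, format='png')   # in a notebook, use plt.show() instead
```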

DICTIONARIES

world = {"afghanistan":30.55, "albania":2.77, "algeria":39.21} <-- creating dictionary

world["albania"] <-- this will return the population of albania

Dictionary keys must be unique -- if you add "albania":2.81 to the end of the dictionary above, the later value overwrites the earlier one, so the dictionary holds a single albania entry and world["albania"] returns 2.81

Dictionary keys must be immutable (more precisely, hashable) -- strings, numbers, booleans and tuples of immutable items work as keys; a list cannot be a key
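
Both key rules can be checked directly. A small sketch: the world values come from these notes, while the coords dictionary and its numbers are made up purely to show a tuple key.

```python
# Duplicate key in a dictionary literal: the last value wins
world = {"afghanistan": 30.55, "albania": 2.77, "algeria": 39.21, "albania": 2.81}
print(world["albania"])  # 2.81
print(len(world))        # 3

# A tuple of immutable items is hashable, so it can be a key
# (hypothetical example; the numbers are only illustrative)
coords = {(41.33, 19.82): "tirana"}

# A list is mutable, so using it as a key raises TypeError
try:
    bad = {["just", "a", "list"]: "value"}
    list_key_allowed = True
except TypeError:
    list_key_allowed = False
print(list_key_allowed)  # False
```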

To add or update information to a dictionary do this: world["sealand"] = 0.000027

"sealand" in world --- will return boolean

del(world["sealand"]) --- removes the key and its value from the dictionary
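
Put together, the add, update, membership and delete steps look like this (a short sketch reusing the world dictionary from these notes):

```python
world = {"afghanistan": 30.55, "albania": 2.77, "algeria": 39.21}

world["sealand"] = 0.000027   # new key: adds an entry
world["albania"] = 2.81       # existing key: updates the value
print("sealand" in world)     # True

del(world["sealand"])         # removes the key and its value
print("sealand" in world)     # False
print(world)
```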

use lists when order matters -- dictionaries are like lookup tables

dictionaries can contain other dictionaries -- use additional square brackets europe['spain']['population']

europe = { 'spain': { 'capital':'madrid', 'population':46.77 },
           'france': { 'capital':'paris', 'population':66.03 },
           'germany': { 'capital':'berlin', 'population':80.62 },
           'norway': { 'capital':'oslo', 'population':5.084 } }

print(europe['spain']['population'])

PANDAS

pandas

  • high level data manipulation tool
  • built off numpy

import pandas as pd

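you can create a DataFrame from a dictionary --- df = pd.DataFrame(data), where data maps column labels to equal-length lists of values (for a nested dictionary like europe above, the outer keys become columns and the inner keys become row labels; pd.DataFrame(europe).T puts the countries in rows)

can set row labels -- df.index = ["SP", "FR", "DE", "NO"] (needs one label per row)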
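
As a sketch of building a DataFrame from a dictionary, the snippet below rearranges the europe values from these notes into a dictionary of lists (the name data is just a placeholder) and then sets the row labels.

```python
import pandas as pd

# Keys become column labels; each list holds one value per row
data = {
    "country": ["spain", "france", "germany", "norway"],
    "capital": ["madrid", "paris", "berlin", "oslo"],
    "population": [46.77, 66.03, 80.62, 5.084],
}
df = pd.DataFrame(data)

# Replace the default 0..3 row labels; one label per row
df.index = ["SP", "FR", "DE", "NO"]
print(df)
```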

brics = pd.read_csv("file path here", index_col = 0)

select a column: brics["country"] --- returns a pandas Series (a 1D labelled array) --- use double brackets to get a DataFrame instead, e.g. brics[["country"]] or brics[["country", "capital"]]
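
The difference between single and double brackets shows up in the returned type. In this sketch the small brics table is a stand-in for datasets/brics.csv, built by hand with approximate, illustrative values; the real file may differ.

```python
import pandas as pd

# Stand-in for brics.csv; the values are illustrative assumptions
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

one_col = brics["country"]                 # single brackets -> Series
as_frame = brics[["country"]]              # double brackets -> DataFrame
two_cols = brics[["country", "capital"]]   # several columns -> DataFrame

print(type(one_col).__name__)
print(type(as_frame).__name__)
print(two_cols.shape)
```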

pandas

  • loc - label based
  • iloc - integer position based

brics.loc[["RU", "CH", "IN"]] - select rows as dataframe

brics.loc[:,["country","capital"]] - all rows, only the country and capital columns
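
To compare label-based and position-based selection side by side, the sketch below runs the loc selections from these notes next to what should be their iloc equivalents. The brics table is again a hand-built stand-in with illustrative values, not the course file.

```python
import pandas as pd

# Stand-in for brics.csv; the values are illustrative assumptions
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# loc: select by label
rows_by_label = brics.loc[["RU", "CH", "IN"]]
cols_by_label = brics.loc[:, ["country", "capital"]]

# iloc: the same selections by integer position
rows_by_pos = brics.iloc[[1, 3, 2]]
cols_by_pos = brics.iloc[:, [0, 1]]

print(rows_by_label.equals(rows_by_pos))
print(cols_by_label.equals(cols_by_pos))
```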

LOGIC, CONTROL FLOW & FILTERING

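for string comparison, Python compares character by character using Unicode code points -- this matches alphabetical order when the case is the same, but uppercase ASCII letters sort before lowercase ones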
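
A few comparisons make the ordering rule concrete (the example strings are arbitrary):

```python
print("carl" < "chris")    # True: first letters match, then 'a' comes before 'h'
print("Zebra" < "apple")   # True: 'Z' has a lower code point than 'a'
print(ord("Z"), ord("a"))  # 90 97

# Lower-casing both sides gives a case-insensitive comparison
print("Zebra".lower() < "apple".lower())  # False
```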

to combine plain boolean values, use the classic 'and', 'or' and 'not' operators

comparison operators (<, >, ==, ...) work element-wise on numpy arrays, but 'and', 'or' and 'not' do not (they raise an error on arrays with more than one element), so you need to use: np.logical_and(), np.logical_or(), np.logical_not()
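
A short sketch of the difference, using a made-up bmi array (the name and numbers are hypothetical):

```python
import numpy as np

# Hypothetical values for illustration
bmi = np.array([21.85, 20.97, 21.75, 24.74, 21.44])

print(bmi > 21)  # comparison is applied element by element

# Combine two element-wise conditions
mask = np.logical_and(bmi > 21, bmi < 22)
print(mask)
print(bmi[mask])  # keep only the elements where mask is True

# The plain 'and' operator cannot decide the truth value of a whole array
try:
    bmi > 21 and bmi < 22
    plain_and_works = True
except ValueError:
    plain_and_works = False
print(plain_and_works)
```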

control flow = if, elif, else

room = "bed"

if room == "kit":
    print("looking around in the kitchen.")
elif room == "bed":
    print("looking around in the bedroom.")
else:
    print("looking around elsewhere.")

Filtering Pandas dataframes

compare a column to get a boolean Series, then use that Series to subset the dataframe

# is_huge = brics["area"] > 8
# brics[is_huge]

brics[brics["area"] > 8]

can also use numpy boolean operators to subset e.g.

brics[np.logical_and(brics["area"] > 8, brics["area"] < 10)]
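
The filtering steps can be tried on a small table. As before, brics here is a hand-built stand-in for datasets/brics.csv with approximate, illustrative area values, so the rows returned by the real file may differ.

```python
import numpy as np
import pandas as pd

# Stand-in for brics.csv; the values are illustrative assumptions
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

is_huge = brics["area"] > 8   # boolean Series, one value per row
print(is_huge)

huge = brics[is_huge]         # rows where the Series is True
mid_sized = brics[np.logical_and(brics["area"] > 8, brics["area"] < 10)]

print(list(huge.index))
print(list(mid_sized.index))
```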

LOOPS

while loop - not common; repeats the code block as long as the condition is true, so something inside the loop must eventually make the condition false
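
A minimal while loop sketch (the variable names error and steps are made up for this example): the value is divided by 4 on each pass, so the condition eventually becomes false and the loop stops.

```python
error = 50.0
steps = 0

# Repeat as long as the condition holds
while error > 1:
    error = error / 4    # this update is what eventually ends the loop
    steps = steps + 1
    print(error)

print(steps)
```

Without the update to error inside the loop, the condition would stay true and the loop would never end.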