Intermediate Python
The code below imports the packages and data used in this course.
# Import the course packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Import the two datasets
gapminder = pd.read_csv("datasets/gapminder.csv")
brics = pd.read_csv("datasets/brics.csv", index_col=0)
Take Notes
import matplotlib.pyplot as plt
plt.plot(year, pop) -- line plot with year on the x-axis and pop on the y-axis; if you pass only one argument, it uses the index (0, 1, 2, ...) for the x-axis and the values for the y-axis
plt.show()
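A minimal sketch of the one-argument case. It assumes the non-interactive "Agg" backend so it runs without a display, and the values list is made up for illustration. It reads the x data back from the drawn line to show the default index.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

values = [3, 1, 4, 1, 5]  # illustrative data, not from the course datasets

# Only one argument: matplotlib supplies 0, 1, 2, ... as the x values
lines = plt.plot(values)
x_data = [int(v) for v in lines[0].get_xdata()]
y_data = [float(v) for v in lines[0].get_ydata()]
print(x_data)  # [0, 1, 2, 3, 4]

plt.clf()  # clear the figure before building the next plot
```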
plt.hist(values, bins=n) - histogram. If bins is an integer, it defines the number of equal-width bins in the range of the data. If bins is a sequence, it defines the bin edges, including the left edge of the first bin and the right edge of the last bin.
plt.clf() - clears the current figure so you can create another plot
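np.histogram takes the same integer-or-sequence bins argument and returns the counts without drawing anything, so it is a handy way to check which bars a histogram would show. The data below is made up for illustration.

```python
import numpy as np

values = [1, 2, 2, 3, 4]  # illustrative data

# Integer bins: 3 equal-width bins spanning the data range, min 1 to max 4
counts_int, edges_int = np.histogram(values, bins=3)
print(counts_int.tolist(), edges_int.tolist())  # [1, 2, 2] [1.0, 2.0, 3.0, 4.0]

# Sequence bins: explicit edges; the last bin [2, 5] includes its right edge
counts_seq, edges_seq = np.histogram(values, bins=[0, 2, 5])
print(counts_seq.tolist())  # [1, 4]
```

Every bin except the last is half-open, so the value 2 falls in the [2, 3) bin in the first case rather than in [1, 2).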
plt.xlabel('text here')
plt.ylabel('text here')
plt.title('Title text here')
plt.yticks([0, 2, 4, 6, 8, 10], ['0', '2B', '4B', '6B', '8B', '10B']) - the first list sets the tick positions; the second gives the labels to display and must be the same length as the first
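plt.scatter(x=gdp_cap, y=life_exp, s=np.array(pop) * 2, c=col, alpha=0.8) - scatter plot; s sets the marker size, c the marker color, alpha the opacity (0 = transparent, 1 = opaque)
plt.grid(True) - adds gridlines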
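Putting the customization calls together, here is a minimal sketch of a labelled scatter plot. The course variables gdp_cap, life_exp, pop and col are replaced by small hypothetical lists, and the "Agg" backend is assumed so the figure is built without a display (in a notebook you would finish with plt.show()).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; use plt.show() in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-ins for the course variables
gdp_cap = [1000, 5000, 20000, 40000]
life_exp = [55, 65, 75, 80]
pop = [10, 50, 100, 30]                   # scaled below to give marker sizes
col = ["red", "green", "blue", "orange"]  # one color per point

plt.scatter(x=gdp_cap, y=life_exp, s=np.array(pop) * 2, c=col, alpha=0.8)
plt.xlabel("GDP per Capita [in USD]")
plt.ylabel("Life Expectancy [in years]")
plt.title("World Development")
# Tick positions first, then the text shown at each position (same length)
plt.yticks([50, 60, 70, 80], ["50 yrs", "60 yrs", "70 yrs", "80 yrs"])
plt.grid(True)

ax = plt.gca()
sizes = ax.collections[0].get_sizes().tolist()  # marker sizes actually used
```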
DICTIONARIES
world = {"afghanistan":30.55, "albania":2.77, "algeria":39.21} <-- creating dictionary
world["albania"] <-- returns the population of Albania (2.77)
Dictionary keys must be unique: if you added "albania":2.81 at the end of the dictionary above, only one entry for Albania would remain, holding the later value 2.81
Dictionary keys must be immutable (unchangeable), so a list cannot be used as a key
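A quick check of both rules, reusing the world values from the notes plus the extra "albania": 2.81 entry described above. Strictly, Python requires keys to be hashable, which in practice means immutable types such as strings, numbers and tuples; the tuple key below is a hypothetical example.

```python
# Duplicate key in a dict literal: the later value overwrites the earlier one
world = {"afghanistan": 30.55, "albania": 2.77, "algeria": 39.21, "albania": 2.81}
print(len(world), world["albania"])  # 3 2.81

# Immutable values such as strings, numbers and tuples can be keys
pair_key = {("albania", "tirana"): 2.77}  # hypothetical tuple key

# A list is mutable, so Python refuses it as a key
error_message = ""
try:
    {["albania"]: 2.77}
except TypeError as err:
    error_message = str(err)  # mentions the unhashable list type
print(error_message)
```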
To add or update an entry: world["sealand"] = 0.000027
"sealand" in world --- returns True or False
del world["sealand"] --- removes the entry (del is a statement, so del(world["sealand"]) also works but the parentheses are not needed)
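The same assignment syntax adds a key when it is missing and updates it when it already exists. A short sketch using the world dictionary from the notes:

```python
world = {"afghanistan": 30.55, "albania": 2.77, "algeria": 39.21}

world["sealand"] = 0.000027       # key absent, so this adds an entry
world["albania"] = 2.81           # key present, so this updates the value
has_sealand = "sealand" in world  # membership test checks keys, not values

del world["sealand"]              # remove the entry again
print(has_sealand, "sealand" in world, world["albania"])  # True False 2.81
```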
use lists for a collection of values where order matters and you select by position -- use dictionaries as lookup tables where you select by a unique key
dictionaries can contain other dictionaries -- use additional square brackets europe['spain']['population']
europe = { 'spain': { 'capital':'madrid', 'population':46.77 },
'france': { 'capital':'paris', 'population':66.03 },
'germany': { 'capital':'berlin', 'population':80.62 },
'norway': { 'capital':'oslo', 'population':5.084 } }
print(europe['spain']['population'])
PANDAS
pandas
- high-level data manipulation tool
- built on top of NumPy
import pandas as pd
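you can create a DataFrame from a dictionary --- with a dictionary of lists, each key becomes a column; with a nested dictionary such as europe, pd.DataFrame(europe) makes each outer key a column, so use df = pd.DataFrame.from_dict(europe, orient="index") to get one row per country
can replace the row labels (the index) -- df.index = ["SP", "FR", "DE", "NO"] (the list must have one label per row)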
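A minimal sketch contrasting the two dictionary layouts. The europe dictionary is copied from the notes; the dictionary of lists is a hypothetical example built from the same values.

```python
import pandas as pd

europe = {
    'spain': {'capital': 'madrid', 'population': 46.77},
    'france': {'capital': 'paris', 'population': 66.03},
    'germany': {'capital': 'berlin', 'population': 80.62},
    'norway': {'capital': 'oslo', 'population': 5.084},
}

# Nested dict passed straight in: outer keys become columns (2 rows x 4 columns)
print(pd.DataFrame(europe).shape)  # (2, 4)

# Dictionary of lists: each key becomes a column
df_lists = pd.DataFrame({
    "country": ["spain", "france", "germany", "norway"],
    "capital": ["madrid", "paris", "berlin", "oslo"],
})

# orient="index": outer keys become rows, inner keys become columns
df = pd.DataFrame.from_dict(europe, orient="index")
df.index = ["SP", "FR", "DE", "NO"]  # 4 labels for 4 rows

print(df.loc["FR", "capital"])  # paris
```

Setting df.index on pd.DataFrame(europe) directly would fail, because that frame has only 2 rows (capital and population).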
brics = pd.read_csv("file path here", index_col = 0) -- index_col=0 uses the first column of the file as the row labels
select a column: brics["country"] --- single brackets return a pandas Series (a 1D labelled array); double brackets, brics[["country", "capital"]], return a DataFrame
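To try read_csv without the course file, a small CSV string can be wrapped in io.StringIO. The rows below are an illustrative stand-in for datasets/brics.csv; the values are for demonstration only and may not match the real file.

```python
import io
import pandas as pd

# Illustrative stand-in for datasets/brics.csv (values for demonstration only)
csv_text = """,country,capital,area,population
BR,Brazil,Brasilia,8.516,200.4
RU,Russia,Moscow,17.10,143.5
IN,India,New Delhi,3.286,1252
CH,China,Beijing,9.597,1357
SA,South Africa,Pretoria,1.221,52.98
"""

# index_col=0: the first (unnamed) column becomes the row labels
brics = pd.read_csv(io.StringIO(csv_text), index_col=0)

country_series = brics["country"]          # single brackets -> Series
subset_df = brics[["country", "capital"]]  # double brackets -> DataFrame

print(type(country_series).__name__, type(subset_df).__name__)  # Series DataFrame
```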
selecting data in pandas
- loc - label based
- iloc - integer position based
brics.loc[["RU", "CH", "IN"]] - select rows as a DataFrame
brics.loc[:, ["country", "capital"]] - select all rows, but only the country and capital columns
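The same selections can be written by label with loc or by position with iloc. A sketch on a hypothetical stand-in for the brics DataFrame (illustrative values), checking that both styles return identical results:

```python
import pandas as pd

# Illustrative stand-in for the brics DataFrame (values for demonstration only)
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Label based: rows by index label, columns by name
rows_loc = brics.loc[["RU", "CH", "IN"]]
cols_loc = brics.loc[:, ["country", "capital"]]

# Position based: the same selections with integer positions
rows_iloc = brics.iloc[[1, 3, 2]]
cols_iloc = brics.iloc[:, [0, 1]]

# A single cell by label and by position
print(brics.loc["RU", "capital"], brics.iloc[1, 1])  # Moscow Moscow
```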
LOGIC, CONTROL FLOW & FILTERING
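for string comparison, Python compares strings character by character using their Unicode code points, which matches alphabetical order only when the letters share the same case (e.g. "carl" < "chris" is True, but so is "Zebra" < "apple")
combine booleans with the classic and, or, not operators
comparison operators (<, >, ==, ...) work element-wise on NumPy arrays, but and, or, not do not, so you need to use: np.logical_and(), np.logical_or(), np.logical_not()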
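A short sketch of these rules. The bmi array holds illustrative values, and the try block shows why and cannot be applied to arrays with more than one element.

```python
import numpy as np

# Strings compare character by character on Unicode code points
print("carl" < "chris", "Zebra" < "apple")  # True True

# Plain booleans combine with and / or / not
x = 12
in_range = x > 10 and x < 15  # True
outside = not in_range        # False

# Comparison operators work element-wise on arrays
bmi = np.array([21.8, 20.9, 24.7, 21.4])  # illustrative values
above_21 = bmi > 21                       # element-wise: True, False, True, True

# and/or/not need a single True/False, so they fail on multi-element arrays
ambiguous = False
try:
    bmi > 21 and bmi < 22
except ValueError:
    ambiguous = True

# NumPy's element-wise versions work instead
between = np.logical_and(bmi > 21, bmi < 22)
extreme = np.logical_or(bmi < 21, bmi > 24)
not_above = np.logical_not(above_21)
print(between.tolist())  # [True, False, False, True]
```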
control flow = if, elif, else
room = "bed"
if room == "kit":
    print("looking around in the kitchen.")
elif room == "bed":
    print("looking around in the bedroom.")
else:
    print("looking around elsewhere.")
Filtering Pandas dataframes
build a boolean Series from a comparison on one column, then use it to subset the DataFrame
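A self-contained sketch of both filtering styles on a hypothetical stand-in for brics (illustrative area values). The & version at the end is the pandas operator equivalent of np.logical_and; each condition needs its own parentheses.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for brics (area values for demonstration only)
brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India", "China", "South Africa"],
     "area": [8.516, 17.10, 3.286, 9.597, 1.221]},
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Step 1: the comparison gives a boolean Series aligned on the row labels
is_huge = brics["area"] > 8
# Step 2: indexing with that Series keeps only the rows marked True
huge = brics[is_huge]

# Two conditions at once with NumPy's element-wise logical_and
mid_size = brics[np.logical_and(brics["area"] > 8, brics["area"] < 10)]
# Equivalent operator form
mid_size_amp = brics[(brics["area"] > 8) & (brics["area"] < 10)]

print(list(huge.index), list(mid_size.index))  # ['BR', 'RU', 'CH'] ['BR', 'CH']
```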
# is_huge = brics["area"] > 8
# brics[is_huge]
brics[brics["area"] > 8]
can also use numpy boolean operators to subset, e.g.
brics[np.logical_and(brics["area"] > 8, brics["area"] < 10)]
LOOPS
while loop - less common than a for loop; keeps executing its code block as long as the condition is True and stops as soon as it becomes False
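A minimal while loop sketch with an illustrative starting value. The condition is checked before every pass, so the body must eventually make it False; otherwise the loop never ends.

```python
offset = 8       # illustrative starting value
corrections = 0  # counts how many passes the loop makes

# Re-checked before each pass; the loop exits once offset reaches 0
while offset > 0:
    offset = offset - 1  # move one step closer to 0
    corrections += 1

print(offset, corrections)  # 0 8
```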