Intermediate Python
Run the hidden code cell below to import the data used in this course.
# Import the course packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Import the two datasets
gapminder = pd.read_csv("datasets/gapminder.csv")
brics = pd.read_csv("datasets/brics.csv")
Take Notes
Add notes about the concepts you've learned and code cells with code you want to keep.
Add your notes here
# Add your code snippets here
Explore Datasets
Use the DataFrames imported in the first cell to explore the data and practice your skills!
- Create a loop that iterates through the brics DataFrame and prints "The population of {country} is {population} million!".
- Create a histogram of the life expectancies for countries in Africa in the gapminder DataFrame. Make sure your plot has a title, axis labels, and an appropriate number of bins.
- Simulate 10 rolls of two six-sided dice. If the two dice add up to 7 or 11, print "A win!". If the two dice add up to 2, 3, or 12, print "A loss!". If the two dice add up to any other number, print "Roll again!".
Notes - Intermediate Python
Basic plots with Matplotlib
- visualization
- data structures
- control structures
- case study
Data Visualization
- explore data set
- report insights
- Matplotlib
- import matplotlib.pyplot as plt
- plt.plot(year, pop) for a line plot or plt.scatter(year, pop) for a scatter plot
- plt.show()
Label plots
- plt.xlabel(), plt.ylabel(), plt.title()
- plt.yticks([list]) sets the tick positions on the y-axis
- You can add values to the lists used for the x and y axes before plotting
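A minimal sketch of the plotting and labeling calls above. The year and pop values are made up for illustration (they are not the course data), and the Agg backend line is an assumption so the sketch also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Made-up illustration data (not from the course datasets)
year = [1950, 1970, 1990, 2010]
pop = [2.5, 3.7, 5.3, 7.0]

plt.plot(year, pop)                 # line plot: x values first, then y values
plt.xlabel("Year")
plt.ylabel("Population")
plt.title("World population over time")
# Tick positions first, then the labels to show at those positions
plt.yticks([0, 2, 4, 6, 8], ["0", "2B", "4B", "6B", "8B"])
plt.show()
```

Swap plt.plot for plt.scatter to get a scatter plot from the same two lists.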
---DICTIONARIES---
Curly brackets {} hold key/value pairs, e.g. world = {"afghanistan":30.55, "albania":2.77, "algeria":39.21}
world["albania"] -> returns 2.77
To print the keys: print(dictionaryname.keys())
If the same key appears more than once inside the curly brackets, the last pair is the one that is kept.
Keys have to be "immutable" objects, meaning objects that cannot be changed after they're created: strings, Booleans, integers, and floats. Lists are mutable, so a list can't be used as a key.
world["sealand"] = 0.000027 -> adds the pair to world
"sealand" in world -> gives a Boolean (True/False) as the answer
world["sealand"] = 0.000028 -> the key already exists, so this updates the value and doesn't add a new pair
del(world["sealand"]) -> gets rid of the pair
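The dictionary operations above, put together as one short runnable sketch. The keys and values are the ones used in these notes.

```python
# Values taken from the notes above
world = {"afghanistan": 30.55, "albania": 2.77, "algeria": 39.21}

print(world["albania"])        # look up a value by key -> 2.77
print(world.keys())            # all keys

world["sealand"] = 0.000027    # new key: adds a pair
print("sealand" in world)      # membership test on keys -> True

world["sealand"] = 0.000028    # existing key: updates the value
print(len(world))              # still 4 pairs

del(world["sealand"])          # removes the pair
print("sealand" in world)      # -> False
```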
List vs. dictionary
- List: a sequence of values indexed by a range of numbers. Use it for a collection of values where order matters and for selecting entire subsets.
- Dictionary: indexed by unique keys. Good for a lookup table.
Pandas
- high level data manipulation tool, built upon NumPy
- can build a dataframe (table) from dictionary
- import pandas as pd
- pd.read_csv("file path") imports CSV file data
- index_col = 0 tells pd.read_csv() to use the first column as the row labels (the index)
Data access methods
- loc: select data based on labels
- iloc: select data based on positions
- [[ ]] (double square brackets) returns the selection AS a DataFrame; single brackets return a Series
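A small sketch of the access methods above. The table is built from a dictionary with placeholder values; it is not read from brics.csv, and the row labels are assumptions for illustration.

```python
import pandas as pd

# Illustrative table built from a dictionary (placeholder values,
# not read from brics.csv)
data = {
    "country": ["Brazil", "Russia", "India"],
    "capital": ["Brasilia", "Moscow", "New Delhi"],
}
df = pd.DataFrame(data, index=["BR", "RU", "IN"])

by_label = df.loc["RU"]                    # row by label -> Series
by_position = df.iloc[1]                   # row by position -> the same row
as_series = df["country"]                  # single brackets -> Series
as_frame = df[["country"]]                 # double brackets -> DataFrame
sub = df.loc[["BR", "IN"], ["capital"]]    # rows and columns by label -> DataFrame

print(type(as_series).__name__, type(as_frame).__name__)
print(sub)
```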
Comparison Operators
How do Python values relate? (<, <=, >, >=, ==, !=)
- < and > also work alphabetically on strings: "carl" < "chris" is True because carl comes before chris
- Ordering comparisons between different types (Boolean to string, string to number) raise a TypeError
- Integers CAN be compared with floats
- != means "not equal to"
Logical operators
- and: True only if all conditions are true
- or: True if at least one condition is true
- not: flips the value; not True = False, not False = True
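A quick sketch of comparison and logical operators on plain Python values. The variable x and its value are just for illustration.

```python
print("carl" < "chris")    # True: strings compare alphabetically
print(2 < 3.5)             # True: an int can be compared with a float
print(3 != 4)              # True

x = 12
print(x > 5 and x < 15)    # True: both conditions hold
print(x < 5 or x > 15)     # False: neither condition holds
print(not x > 5)           # False: x > 5 is True, and not flips it

# Ordering a string against a number raises TypeError in Python 3
try:
    "3" < 4
except TypeError as err:
    print("TypeError:", err)
```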
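NumPy arrays
and, or, and not don't work element-wise on arrays; use np.logical_and(), np.logical_or(), np.logical_not() instead
array[np.logical_and(array > low, array < high)] -> keeps the values of array that lie between low and high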
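A minimal sketch of element-wise logic on a NumPy array. The bmi values and the 21/22 cut-offs are made up for illustration.

```python
import numpy as np

# Made-up BMI values for illustration
bmi = np.array([21.85, 20.97, 21.75, 24.74, 21.44])

mask = np.logical_and(bmi > 21, bmi < 22)   # element-wise "and" of two Boolean arrays
print(mask)
print(bmi[mask])                            # keep only the values where mask is True
```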
If, elif, else
if condition :        <-- needs the colon
    expression        <-- must be indented
elif condition :      <-- another if, checked only when the 1st is false, before resorting to else
    expression        <-- must be indented
else :                <-- runs when none of the conditions above is met
    expression        <-- must be indented
Code that continues without indentation is known to Python to not be part of the if statement.
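A short sketch of how the branches are chosen. The helper describe() is hypothetical, written only for this example. Note that only the first matching branch runs, even when a later condition would also be true.

```python
def describe(z):
    """Return a message for the first branch that matches z (hypothetical helper)."""
    if z % 2 == 0:
        return "z is divisible by 2"
    elif z % 3 == 0:
        return "z is divisible by 3"
    else:
        return "z is neither divisible by 2 nor by 3"

# 6 is divisible by both 2 and 3, but only the if branch runs
for z in (6, 9, 7):
    print(z, "->", describe(z))
```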
Filtering pandas DataFrames
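A possible sketch of filtering, assuming the usual pattern of building a Boolean Series from a comparison and using it to select rows. The table, column names, and values are placeholders, not the contents of brics.csv.

```python
import numpy as np
import pandas as pd

# Illustrative values only (not read from brics.csv)
df = pd.DataFrame(
    {"country": ["A", "B", "C"], "area": [8.5, 17.1, 3.3]},
    index=["AA", "BB", "CC"],
)

is_large = df["area"] > 8          # Boolean Series, one value per row
large = df[is_large]               # keep the rows where the condition is True
print(large)

# Combine two conditions element-wise with np.logical_and
medium = df[np.logical_and(df["area"] > 5, df["area"] < 10)]
print(medium)
```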
while loop
The while loop keeps executing its code as long as the condition is true.
while condition :
    expression
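A small sketch of a while loop that stops once its condition becomes false. The starting value and the division by 4 are arbitrary choices for illustration.

```python
error = 50.0
steps = 0
while error > 1:
    error = error / 4    # shrink the error on every pass
    steps += 1
    print(error)
```

If the body never changed error, the condition would stay true and the loop would never end.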
for loop
for variable in sequence :
    expression
Looping over data structures
Dictionaries
for key, val in my_dict.items() :
    expression
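A sketch of looping over a dictionary with .items(), reusing the world values from the dictionary notes. The lines list exists only so the result can be inspected afterwards.

```python
world = {"afghanistan": 30.55, "albania": 2.77, "algeria": 39.21}

lines = []
for key, val in world.items():
    # .items() yields one (key, value) pair per iteration
    line = key + " -- " + str(val)
    lines.append(line)
    print(line)
```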
NumPy arrays
for val in np.nditer(my_array) :
    expression
DON'T FORGET YOUR COLONS
DataFrames
iterrows() yields a (row label, row as Series) pair on each iteration
for label, row in dataframe.iterrows() :
    expression
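A minimal sketch of iterrows() on a small table. The values and row labels are placeholders, not read from brics.csv, and the capitals dictionary is only there to collect the results.

```python
import pandas as pd

# Illustrative values only (not read from brics.csv)
df = pd.DataFrame(
    {"country": ["Brazil", "India"], "capital": ["Brasilia", "New Delhi"]},
    index=["BR", "IN"],
)

capitals = {}
for label, row in df.iterrows():
    # label is the row's index value; row is a Series holding that row's data
    capitals[label] = row["capital"]
    print(label + ": " + row["capital"])
```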
Example of apply() in pandas DataFrame
import pandas as pd
brics = pd.read_csv("brics.csv", index_col = 0)
brics["name_length"] = brics["country"].apply(len)
print(brics)
Adds new column with the length of each country name without a for loop!
Random Generators
import numpy as np
np.random.rand() returns a pseudo-random float in the interval [0, 1)
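One way to sketch the dice exercise from the top of this notebook. np.random.seed() and np.random.randint() go beyond what these notes list, and the seed value is an arbitrary assumption that only makes the run reproducible.

```python
import numpy as np

np.random.seed(123)    # assumption: any fixed seed makes the run reproducible

print(np.random.rand())    # pseudo-random float in [0, 1)

rolls = []
for _ in range(10):
    # randint's upper bound is exclusive, so (1, 7) gives 1 through 6
    total = np.random.randint(1, 7) + np.random.randint(1, 7)
    rolls.append(total)
    if total in (7, 11):
        print("A win!")
    elif total in (2, 3, 12):
        print("A loss!")
    else:
        print("Roll again!")
```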
import numpy as np
# Loop over every element of a 2D array, one value at a time
x = np.array([[9,2,2],
              [9,8,7]])
for i in np.nditer(x):
    print(i)