Intermediate Python

Run the hidden code cell below to import the data used in this course.

# Import the course packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Import the two datasets
gapminder = pd.read_csv("datasets/gapminder.csv")
brics = pd.read_csv("datasets/brics.csv")

Take Notes

Add notes about the concepts you've learned and code cells with code you want to keep.

Explore Datasets

Use the DataFrames imported in the first cell to explore the data and practice your skills!

  • Create a loop that iterates through the brics DataFrame and prints "The population of {country} is {population} million!".
  • Create a histogram of the life expectancies for countries in Africa in the gapminder DataFrame. Make sure your plot has a title, axis labels, and has an appropriate number of bins.
  • Simulate 10 rolls of two six-sided dice. If the two dice add up to 7 or 11, print "A win!". If the two dice add up to 2, 3, or 12, print "A loss!". If the two dice add up to any other number, print "Roll again!".

Notes - Intermediate Python

Basic plots with Matplotlib, visualization, data structures, control structures, case study

Data Visualization

  • explore data set
  • report insights
  • Matplotlib
  • import matplotlib.pyplot as plt
  • plt.plot(year, pop) for a line plot or plt.scatter(year, pop) for a scatter plot
  • plt.show()

Label plots

  • plt.xlabel()
  • plt.ylabel()
  • plt.title()
  • plt.yticks([list]) sets the tick positions on the y-axis

New values can be added to the lists that are used to generate the x and y axes before plotting.
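A minimal sketch of the labeling calls above. The year and pop values, the label text and the tick positions are made up for illustration and are not taken from the course data; the Agg backend line is only there so the sketch runs without a display and can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt

# Hypothetical data, for illustration only
year = [1950, 1970, 1990, 2010]
pop = [2.5, 3.7, 5.3, 6.9]

# Extra values can be added to the lists before plotting
year = [1930] + year
pop = [2.0] + pop

plt.plot(year, pop)                  # line plot; plt.scatter(year, pop) for points
plt.xlabel("Year")
plt.ylabel("Population (billions)")
plt.title("Population over time (made-up numbers)")
plt.yticks([0, 2, 4, 6, 8])          # tick positions on the y-axis
plt.show()
```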

---DICTIONARIES---

Curly brackets {} hold key/value pairs, e.g.

world = {"afghanistan":30.55, "albania":2.77, "algeria":39.21}

world["albania"] -> returns 2.77

To print the keys: print(dictionaryname.keys())

If the same key appears more than once inside the curly brackets, only the last pair is kept.

Keys have to be "immutable" objects, i.e. objects that cannot be changed after they're created: strings, booleans, integers, and floats. Lists are mutable, so a list can't be used as a key.

world["sealand"] = 0.000027 -> adds the pair to world

"sealand" in world -> gives a boolean (True/False) as answer

world["sealand"] = 0.000028 -> the key already exists, so this updates the value and doesn't add a new pair

del(world["sealand"])

Gets rid of the pair

List vs. dictionary

  • List: a sequence of values indexed by a range of numbers. Use it for a collection of values where order matters and for selecting entire subsets.
  • Dictionary: indexed by unique keys. Good for a lookup table.
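The dictionary operations from these notes, put together as a short runnable sketch. The world values are the ones written above; the dupes dictionary and the list-as-key attempt are small illustrations added here.

```python
world = {"afghanistan": 30.55, "albania": 2.77, "algeria": 39.21}

print(world["albania"])        # look up a value by its key
world["sealand"] = 0.000027    # new key: adds a pair
world["sealand"] = 0.000028    # existing key: updates the value
print("sealand" in world)      # membership test on the keys
del(world["sealand"])          # removes the pair again
print(world.keys())

# Duplicate keys inside the braces: only the last pair is kept
dupes = {"a": 1, "a": 2}
print(dupes)

# A list is mutable, so using one as a key raises a TypeError
try:
    bad = {["a", "b"]: 1}
except TypeError as err:
    print("lists can't be keys:", err)
```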

Pandas

  • high level data manipulation tool, built upon NumPy
  • can build a dataframe (table) from dictionary
  • import pandas as pd
  • index_col = 0 (argument to pd.read_csv) tells pandas that the first column holds the row labels (the index)

pd.read_csv("file path") imports CSV file data

Data access methods

  • loc - select data based on labels
  • iloc - select data based on positions

[[]] <-- double square brackets return the selection AS a DataFrame (single square brackets return a pandas Series)
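A small sketch of loc, iloc and the single vs. double bracket difference. The table is built from a dictionary with fictional country names and made-up numbers; none of it comes from the course datasets.

```python
import pandas as pd

# Fictional table built from a dictionary (illustrative values only)
data = {"country": ["Freedonia", "Borduria", "Syldavia"],
        "population": [1.5, 20.0, 7.25]}
df = pd.DataFrame(data, index=["FR", "BO", "SY"])

row_by_label = df.loc["BO"]       # row selected by its label -> Series
row_by_position = df.iloc[1]      # same row selected by position -> Series
as_series = df["country"]         # single brackets -> Series
as_frame = df[["country"]]        # double brackets -> DataFrame
sub_frame = df.loc[["FR", "SY"], ["population"]]  # rows and columns by label

print(type(as_series))
print(type(as_frame))
print(sub_frame)
```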

Comparison Operators

How do Python values relate? (<, <=, >, >=, ==, !=)

  • < and > work alphabetically on strings ("carl" comes before "chris", so "carl" < "chris" is True)
  • Values of different types can't be ordered with < or > (boolean to string, string to number); Python raises a TypeError
  • Integers CAN be compared with floats
  • != is "not equal to"

Logical operators

  • and - True only if all conditions are true
  • or - True if at least one condition is true
  • not - not True = False, not False = True

NumPy arrays: use np.logical_and(), np.logical_or(), np.logical_not(), e.g. bmi[np.logical_and(bmi > low, bmi < high)] keeps the elements of bmi that lie between low and high
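A quick sketch of the NumPy version, with hypothetical BMI values and thresholds chosen only for illustration. The plain `and` keyword does not work element-wise on arrays, which is why the np.logical_* functions are used.

```python
import numpy as np

# Hypothetical BMI values, for illustration only
bmi = np.array([21.8, 20.9, 24.7, 21.4, 23.1])

in_range = np.logical_and(bmi > 21, bmi < 22)   # element-wise "and"
print(in_range)
print(bmi[in_range])                            # keep only the matching elements
print(bmi[np.logical_not(in_range)])            # everything else
```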

If, elif, else

if condition :        <-- needs the colon
    expression        <-- must be indented
elif condition :      <-- another if, checked only when the 1st is false, before resorting to else
    expression        <-- must be indented
else :                <-- runs when none of the conditions above is met
    expression        <-- must be indented

Code that continues without indentation is known to Python to not be part of the if statement.
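A short runnable sketch of the if / elif / else pattern. The describe function and its messages are hypothetical, made up for this note.

```python
def describe(z):
    """Return a short description of an integer (hypothetical helper)."""
    if z % 2 == 0:
        return "divisible by 2"
    elif z % 3 == 0:          # only checked when the first condition is False
        return "divisible by 3"
    else:                     # runs when neither condition is met
        return "divisible by neither 2 nor 3"

for z in [6, 9, 7]:
    print(z, "is", describe(z))
```

Note that 6 is divisible by both 2 and 3, but only the first matching branch runs.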

Filtering pandas DataFrames
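A minimal sketch of what filtering could look like, assuming a small fictional table (names and areas are made up): compare a column to get a boolean Series, then use that Series inside square brackets to keep the matching rows.

```python
import numpy as np
import pandas as pd

# Fictional table (illustrative values only)
df = pd.DataFrame({"country": ["Freedonia", "Borduria", "Syldavia", "Zubrowka"],
                   "area": [1.2, 9.6, 3.3, 17.1]},
                  index=["FR", "BO", "SY", "ZU"])

is_large = df["area"] > 8     # boolean Series, one value per row
large = df[is_large]          # keep only the rows marked True
print(large)

# Two conditions combined element-wise with np.logical_and
medium = df[np.logical_and(df["area"] > 3, df["area"] < 10)]
print(medium)
```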

while loop

The while loop keeps executing the code as long as the condition is true.

while condition :
    expression

for loop

for variable in sequence :
    expression
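Both loop types as a small runnable sketch. The starting value and the heights list are made-up numbers for illustration.

```python
# while: repeat as long as the condition is true
error = 50.0
steps = 0
while error > 1:
    error = error / 4     # the condition must eventually become False
    steps = steps + 1
print(steps, error)

# for: run the body once per element of the sequence
heights = [1.73, 1.68, 1.71]   # hypothetical values
tallest = heights[0]
for height in heights:
    if height > tallest:
        tallest = height
print(tallest)
```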

Looping over data structures

Dictionaries

for key, val in my_dict.items() :
    expression

NumPy arrays

for val in np.nditer(my_array) :
    expression

DON'T FORGET YOUR COLONS

DataFrames (iterrows() gives the row label and the row data on each pass)

for label, row in dataframe.iterrows() :
    expression
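A sketch of looping over a dictionary and over the rows of a DataFrame. The DataFrame uses fictional countries and made-up populations; the message format is only an example.

```python
import pandas as pd

# Dictionary: .items() gives key and value together
capitals = {"spain": "madrid", "france": "paris"}
for key, val in capitals.items():
    print("the capital of " + key + " is " + val)

# DataFrame: .iterrows() gives (row label, row as a Series)
df = pd.DataFrame({"country": ["Freedonia", "Borduria"],
                   "population": [1.5, 20.0]},      # made-up values
                  index=["FR", "BO"])
messages = []
for label, row in df.iterrows():
    messages.append(label + ": " + row["country"] + " has "
                    + str(row["population"]) + " million people")
print(messages)
```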

Example of apply() in pandas DataFrame

import pandas as pd
brics = pd.read_csv("brics.csv", index_col = 0)
brics["name_length"] = brics["country"].apply(len)
print(brics)

Adds new column with the length of each country name without a for loop!

Random Generators

import numpy as np
np.random.rand() - returns a pseudo-random number between 0 and 1
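A sketch of the dice exercise from the top of this workspace. It assumes np.random.seed() and np.random.randint(), which these notes do not mention but which are part of NumPy; the seed value and the outcome helper are arbitrary choices made here.

```python
import numpy as np

def outcome(total):
    """Map the sum of two dice to the message from the exercise."""
    if total in (7, 11):
        return "A win!"
    elif total in (2, 3, 12):
        return "A loss!"
    else:
        return "Roll again!"

np.random.seed(123)         # arbitrary seed so repeated runs give the same rolls
print(np.random.rand())     # one pseudo-random float between 0 and 1

totals = []
for _ in range(10):
    dice = np.random.randint(1, 7, size=2)   # two integers from 1 to 6
    totals.append(int(dice.sum()))
    print(outcome(totals[-1]))
```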

import numpy as np
x = np.array([[9,2,2],
             [9,8,7]])
for i in np.nditer(x):
    print(i)