Intermediate Python

Run the hidden code cell below to import the data used in this course.

# Import the course packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Import the three datasets
gapminder = pd.read_csv("datasets/gapminder.csv")
brics = pd.read_csv("datasets/brics.csv")
hitters = pd.read_csv("datasets/hitters.csv")

Data Analysis

for Loop

For each variable var in a sequence seq, execute the expression: for var in seq : expression. The for loop doesn't only work with lists. You can also create a for loop that iterates over every character in a string.
## Example of the for Loop in action
fam = [1.73, 1.68, 1.71, 1.89]
for height in fam :
    print(height)
    
dash = "---"
print(dash)
## To print each height together with its index in the list, use enumerate()
fam = [1.73, 1.68, 1.71, 1.89]
for index, height in enumerate(fam) :
    print("index" + str(index) + ": " + str(height))

dash =("---")
print(dash)
##Loop over string
for c in "family" :
    print(c.capitalize())
   
##More examples
dash =("---")
print(dash)
# areas list
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

# Code the for loop
for area in areas :
    print(area)
dash =("---")
print(dash)

# areas list
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

# Change for loop to use enumerate() and update print()
for index, area in enumerate(areas) :
    print("room" + str(index) + ": " + str(area))
# Start the room numbers at 1
# areas list
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

# Code the for loop
for index, area in enumerate(areas, start = 1) :
    print("room" + str(index) + ": " + str(area) + " sqm")
                                                 
dash ="---"
print(dash)
## Another loop: index into each sublist of a 2D list to print its name and area
# house list of lists
house = [["hallway", 11.25], 
         ["kitchen", 18.0], 
         ["living room", 20.0], 
         ["bedroom", 10.75], 
         ["bathroom", 9.50]]
         
# Build a for loop from scratch
for x in house :
    print("the " + x[0] + " is " + str(x[1]) + " sqm")

Loop Data Structures

Dictionaries

Dictionaries and NumPy arrays: how you define the sequence depends on the data structure. To iterate over the key-value pairs in a dictionary, use the items() method on the dictionary to define the sequence in the for loop. To iterate over all elements in a NumPy array, use the nditer() function to specify the sequence. Pay attention here: dictionaries require a method, NumPy arrays use a function.

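Make sure to call the items() method on the dictionary in the for loop. The general pattern is for var in seq.method() : expression, which for a dictionary becomes for key, value in seq.items() : expression.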
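The contrast between the method and the function can be sketched as follows. This is a minimal illustration, not course code: the europe dictionary, the grid array, and the capitals and flat lists are made-up names and data.

```python
import numpy as np

# Hypothetical example data (not from the course datasets)
europe = {"spain": "madrid", "france": "paris", "germany": "berlin"}

# Dictionary: the items() METHOD yields (key, value) pairs
capitals = []
for country, capital in europe.items():
    capitals.append("the capital of " + country + " is " + capital)
    print(capitals[-1])

# 2D NumPy array: the np.nditer() FUNCTION visits every element one at a time
grid = np.array([[1, 2], [3, 4]])
flat = []
for value in np.nditer(grid):
    flat.append(int(value))
    print(value)
```

Looping over grid directly (for row in grid) would yield whole rows instead of single elements, which is why np.nditer() is used here.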

Working with dictionaries

# Create a list of dictionaries with new data
avocados_list = [
    {"date": "2019-11-03", "small_sold": 10376832, "large_sold": 7835071},
    {"date": "2019-11-10", "small_sold": 10717154, "large_sold": 8561348},
]

# Convert list into DataFrame
avocados_2019 = pd.DataFrame(avocados_list)

# Print the new DataFrame
print(avocados_2019)

# Create a dictionary of lists with new data
avocados_dict = {
    "date": ["2019-11-17", "2019-12-01"],
    "small_sold": ["10859987", "9291631"],
    "large_sold": ["7674135", "6238096"]
}

# Convert dictionary into DataFrame
avocados_2019 = pd.DataFrame(avocados_dict)

# Print the new DataFrame
print(avocados_2019)

# From previous step
airline_bumping = pd.read_csv("airline_bumping.csv")
print(airline_bumping.head())

# For each airline, select nb_bumped and total_passengers and sum
airline_totals = airline_bumping.groupby("airline")[["nb_bumped", "total_passengers"]].sum()

Loop Data Structures Part 2

Pandas DataFrames

In pandas, you have to state explicitly that you want to iterate over the rows. You do this by calling the iterrows() method on the brics DataFrame, thus specifying another "sequence". The iterrows() method looks at the DataFrame and, on each iteration, generates two pieces of data: the label of the row, and then the actual data in the row as a pandas Series.
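A short sketch of row iteration follows. Because the contents of datasets/brics.csv are not shown here, it builds a small stand-in DataFrame; the name brics_demo, its columns, and its row labels are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the brics DataFrame; values are illustrative only
brics_demo = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India"],
     "capital": ["Brasilia", "Moscow", "New Delhi"]},
    index=["BR", "RU", "IN"],
)

labels = []
for label, row in brics_demo.iterrows():
    # label is the row label; row is a pandas Series holding that row's data
    labels.append(label)
    print(label + ": " + row["capital"])
```

Since each row arrives as a Series, a single column value is selected with square brackets, as in row["capital"].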

Random Numbers Assignment

This is a project chapter.

Random Generators

Inside NumPy there is the random package, which contains the rand() function: np.random.rand(). The numbers it produces are pseudo-random: they are generated from a seed, so specifying the seed makes the sequence reproducible.

np.random.seed(123)
coin = np.random.randint(0, 2)  # Randomly generate 0 or 1
print(coin)
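The effect of the seed can be checked with a small sketch like the one below. The variable names first and second are made up for illustration; the calls themselves are standard NumPy.

```python
import numpy as np

np.random.seed(123)             # fix the seed so the sequence is reproducible
first = np.random.rand()        # random float in [0, 1)
coin = np.random.randint(0, 2)  # 0 or 1; the upper bound 2 is excluded

np.random.seed(123)             # set the same seed again ...
second = np.random.rand()       # ... so the same first draw comes back

print(first == second)
print(coin)
```

Resetting the seed restarts the generator from the same state, which is why the two rand() draws match.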


Random Walk

A random walk is a succession of multiple random steps. This concept is well known in science. For example, the path traced by a molecule as it travels in a liquid or a gas can be modeled as a random walk. The financial status of a gambler is another example. To record every step in your random walk, you need to learn how to gradually build a list with a for loop.
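Building the list step by step might look like the sketch below. The step rule (a coin flip that moves one position up or down) and the walk length of 10 are assumptions chosen to keep the example small; they are not taken from the course exercise.

```python
import numpy as np

np.random.seed(123)

random_walk = [0]                    # start at position 0
for _ in range(10):
    coin = np.random.randint(0, 2)   # 0 or 1
    step = 1 if coin == 1 else -1    # assumed rule: 1 moves up, 0 moves down
    # Append the new position, computed from the last recorded one
    random_walk.append(random_walk[-1] + step)

print(random_walk)
```

The key pattern is random_walk[-1]: each new entry is derived from the most recent element, so the list records the whole path rather than only the final position.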


Distribution of Random Walks

You go up some steps and you go down some steps. You still have to answer the question.

ditributions.py

np.random.seed(123)
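One way to look at a distribution of random walks is to simulate many walks and collect where each one ends. The sketch below does that under assumed settings: 500 walks of 100 coin-flip steps each, with a step of +1 or -1. None of these numbers or rules come from the course material.

```python
import numpy as np

np.random.seed(123)

n_walks, n_steps = 500, 100   # hypothetical simulation sizes

final_positions = []
for walk in range(n_walks):
    position = 0
    for step in range(n_steps):
        # Assumed rule: coin flip, 1 moves up and 0 moves down
        position += 1 if np.random.randint(0, 2) == 1 else -1
    final_positions.append(position)

ends = np.array(final_positions)
print(ends.mean(), ends.std())
```

Passing ends to plt.hist() would show the shape of the distribution; the summary statistics printed here give a rough idea of its center and spread.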


Inspecting a DataFrame

When you get a new DataFrame to work with, the first thing you need to do is explore it and see what it contains. There are several useful methods and attributes for this.

.head() returns the first few rows (the "head" of the DataFrame).
.info() shows information on each of the columns, such as the data type and number of missing values.
.shape returns the number of rows and columns of the DataFrame.
.describe() calculates a few summary statistics for each column.

To better understand DataFrame objects, it's useful to know that they consist of three components, stored as attributes:

.values: A two-dimensional NumPy array of values.
.columns: An index of columns: the column names.
.index: An index for the rows: either row numbers or row names.

You can usually think of indexes as a list of strings or numbers, though the pandas Index data type allows for more sophisticated options. (These will be covered later in the course.)
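The methods and attributes described above can be tried on a small DataFrame. The data below (the df name, its columns, and its row labels) is invented for illustration and is not one of the course datasets.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame(
    {"name": ["Bella", "Charlie", "Lucy"], "height_cm": [56, 43, 46]},
    index=["a", "b", "c"],
)

print(df.head())      # first few rows
df.info()             # column dtypes and non-null counts (prints directly)
print(df.shape)       # (number of rows, number of columns)
print(df.describe())  # summary statistics for the numeric columns

print(df.values)      # two-dimensional NumPy array of the values
print(df.columns)     # Index of column names
print(df.index)       # Index of row labels
```

Note that .shape, .values, .columns, and .index are attributes, so they are written without parentheses, while .head(), .info(), and .describe() are method calls.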