Python Data Science Toolbox (Part 2)

Run the hidden code cell below to import the data used in this course.

# Import the course packages
import pandas as pd
import matplotlib.pyplot as plt

# Import the course datasets 
world_ind = pd.read_csv('datasets/world_ind_pop_data.csv')
tweets = pd.read_csv('datasets/tweets.csv')

Take Notes

Add notes about the concepts you've learned and code cells with code you want to keep.

Nested list comprehensions and generators

General structure:

basic: result = [output expression for iterator variable in iterable]

advanced: result = [output expression (optional if-else on the output) for iterator variable in iterable (optional if filter on the iterable)]

Syntax for dictionaries: result = {key expression : value expression for iterator variable in iterable}

Syntax for generators: result = (output expression for iterator variable in iterable)
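
As a minimal sketch of the forms listed above, here they are side by side on a made-up list of numbers. The name `nums` and its values are illustrative only and are not part of the course data.

```python
# Hypothetical sample data, for illustration only
nums = [1, 2, 3, 4, 5, 6]

# Basic: apply the output expression to every item
squares = [n ** 2 for n in nums]

# Conditional on the iterable (filter): only even numbers are kept
even_squares = [n ** 2 for n in nums if n % 2 == 0]

# Conditional on the output (if-else): every item is kept, the value changes
labels = ['even' if n % 2 == 0 else 'odd' for n in nums]

# Dictionary comprehension: key : value pairs inside curly braces
square_of = {n: n ** 2 for n in nums}

# Generator expression: same syntax in parentheses, evaluated lazily
lazy_squares = (n ** 2 for n in nums)

print(squares)
print(even_squares)
print(labels)
print(square_of)
print(sum(lazy_squares))
```

The filter goes after the `for` clause, while the if-else goes before it because it is part of the output expression.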


# Create a 5 x 5 matrix using a list of lists: matrix
matrix = [[col for col in range(0,5)] for row in range(0,5)]

# Print the matrix
for row in matrix:
    print(row)
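
The nested comprehension above can be read as two nested for loops: the outer comprehension builds the rows and the inner one builds the columns of each row. A rough sketch of the equivalence, where `matrix_loops` and `matrix_comp` are names introduced here only for comparison:

```python
# Build the same 5 x 5 matrix with explicit nested loops
matrix_loops = []
for row in range(5):
    new_row = []
    for col in range(5):
        new_row.append(col)
    matrix_loops.append(new_row)

# The nested comprehension from the notes, for comparison
matrix_comp = [[col for col in range(5)] for row in range(5)]

# Both approaches produce the same list of lists
print(matrix_loops == matrix_comp)
```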

# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create list comprehension: new_fellowship selecting only members with names 7 or more characters long
new_fellowship = [member for member in fellowship if len(member) >= 7]

# Print the new list
print(new_fellowship)
# Add an else clause (notice that the conditional now comes before the for clause, as part of the output expression)

# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create list comprehension: new_fellowship
new_fellowship = [member if len(member) >= 7 else '' for member in fellowship]

# Print the new list
print(new_fellowship)
# Dictionary comprehension

# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create dict comprehension: new_fellowship
new_fellowship = {member : len(member) for member in fellowship}

# Print the new dictionary
print(new_fellowship)

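Generator expressions use comprehension syntax!

However, a generator does not build the entire list and store it in memory; it produces values one at a time as you iterate over it. You can still use the same conditionals on the output and on the iterable as in a list comprehension. This makes generators very useful when working with large data.

You can also write generator functions, which produce values with yield instead of returning a single result.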
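
To make the memory point concrete, here is a small sketch comparing a list comprehension with the matching generator expression. The size `n` is an arbitrary value picked for illustration, and `sys.getsizeof` reports only the size of the container object itself, not of the integers it refers to.

```python
import sys

# Hypothetical size, chosen only to make the difference visible
n = 1_000_000

squares_list = [i ** 2 for i in range(n)]   # all values are built up front
squares_gen = (i ** 2 for i in range(n))    # nothing is computed yet

# The list container grows with n; the generator object stays small
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))

# Values are produced one at a time, on request
print(next(squares_gen))
print(next(squares_gen))
print(next(squares_gen))
```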


# Create a list of strings: lannister
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# Create a generator object: lengths
lengths = (len(person) for person in lannister)

# Iterate over and print the values in lengths
for value in lengths:
    print(value)
# Generator function

# Create a list of strings
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# Define generator function get_lengths
def get_lengths(input_list):
    """Generator function that yields the
    length of the strings in input_list."""

    # Yield the length of a string
    for person in input_list:
        yield len(person)

# Print the values generated by get_lengths()
for value in get_lengths(lannister):
    print(value)
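
One thing worth keeping in mind with generator functions is that the object they return can be consumed only once. The sketch below repeats the function from the notes so that it runs on its own; `first`, `rest`, and `again` are names introduced here for illustration.

```python
def get_lengths(input_list):
    """Yield the length of each string in input_list."""
    for item in input_list:
        yield len(item)

lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

lengths = get_lengths(lannister)   # calling the function returns a generator object

first = next(lengths)              # runs the body up to the first yield
rest = list(lengths)               # consumes the remaining values
again = list(lengths)              # the generator is now exhausted

print(first)
print(rest)
print(again)
```

To iterate a second time, call `get_lengths(lannister)` again to get a fresh generator.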

Making a function that zips two lists into a dictionary

# Define lists2dict()
def lists2dict(list1, list2):
    """Return a dictionary where list1 provides
    the keys and list2 provides the values."""

    # Zip lists: zipped_lists
    zipped_lists = zip(list1, list2)

    # Create a dictionary: rs_dict
    rs_dict = dict(zipped_lists)

    # Return the dictionary
    return rs_dict

# Call lists2dict: rs_fxn
# (feature_names and row_vals are assumed to be lists defined in an earlier cell)
rs_fxn = lists2dict(feature_names, row_vals)

# Print rs_fxn
print(rs_fxn)
# Import the pandas package
import pandas as pd

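# Turn list of lists into list of dicts: list_of_dicts
# (row_lists is assumed to be a list of rows defined in an earlier cell, each row a list of values)
list_of_dicts = [lists2dict(feature_names, sublist) for sublist in row_lists]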

# Turn list of dicts into a DataFrame: df
df = pd.DataFrame(list_of_dicts)

# Print the head of the DataFrame
print(df.head())
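
The cells above rely on `feature_names` and `row_lists` from the course environment. As a self-contained sketch of the same idea, the block below uses made-up stand-ins for those variables; the column names and row values are hypothetical and are not the real dataset.

```python
def lists2dict(list1, list2):
    """Return a dictionary where list1 provides the keys and list2 the values."""
    return dict(zip(list1, list2))

# Hypothetical stand-ins for the course variables (not the real dataset)
feature_names = ['CountryName', 'Year', 'Value']
row_lists = [
    ['Country A', '1960', '100'],
    ['Country B', '1960', '200'],
]

# One dictionary per row, keyed by feature name
list_of_dicts = [lists2dict(feature_names, sublist) for sublist in row_lists]

for record in list_of_dicts:
    print(record)
```

A list of dictionaries like this can be passed to `pd.DataFrame`, as in the cell above. Note that `zip` stops at the shorter input, so a row with fewer values than `feature_names` yields a dictionary with fewer keys.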

Create a context manager for a dataset