Python Data Science Toolbox (Part 2)


Introduction to iterators

Iterators vs. iterables

Iterable: an object with an associated iter() method, such as a list, string, dictionary, or file connection. Applying iter() to an iterable creates an iterator.

Iterator: produces the next value with next().

Iterating over iterables: next()

word = 'Da'
it = iter(word)
next(it)  # 'D'
next(it)  # 'a'
next(it)  # raises a StopIteration error: there are no more elements to iterate over
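A for loop hides this iter()/next()/StopIteration bookkeeping from us. The sketch below shows roughly what it does; the helper name manual_for is hypothetical and only meant to make the mechanism visible.

```python
def manual_for(iterable):
    """Collect the items of an iterable the way a for loop does:
    call iter() once, then next() until StopIteration is raised."""
    items = []
    it = iter(iterable)
    while True:
        try:
            items.append(next(it))
        except StopIteration:
            # No more elements: stop looping, as a for loop would
            break
    return items

print(manual_for('Da'))  # ['D', 'a']
```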

Iterating at once with *

The * operator unpacks all the remaining values of an iterator at once:
word = 'Da'
it = iter(word)
print(*it)  # D a
Once we do this, the iterator is exhausted and we cannot repeat it; we'll have to redefine our iterator.
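list() consumes an iterator in the same way * does, which makes the exhaustion easy to observe. A small sketch:

```python
word = 'Da'
it = iter(word)

first_pass = list(it)   # consumes every value: ['D', 'a']
second_pass = list(it)  # the iterator is already exhausted: []

# Re-create the iterator from the original iterable to start over
it = iter(word)
third_pass = list(it)   # ['D', 'a'] again

print(first_pass, second_pass, third_pass)  # ['D', 'a'] [] ['D', 'a']
```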

Iterating over dictionaries

To iterate over the key-value pairs of a Python dictionary, we need to unpack them by applying the items() method to the dictionary:
pythonistas = {'hugo': 'bowne-anderson', 'francis': 'castro'}
for key, value in pythonistas.items():
    print(key, value)
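For contrast, iterating over the dictionary itself (without items()) yields only the keys. A short sketch; the printed order relies on dictionaries preserving insertion order, which holds in Python 3.7 and later.

```python
pythonistas = {'hugo': 'bowne-anderson', 'francis': 'castro'}

# Iterating over the dictionary itself yields only its keys
keys = list(pythonistas)

# .items() yields (key, value) tuples, which a for loop can unpack
pairs = list(pythonistas.items())

print(keys)   # ['hugo', 'francis']
print(pairs)  # [('hugo', 'bowne-anderson'), ('francis', 'castro')]
```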

Iterating over file connections

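Applying iter() to a file connection creates an iterator; each call to next() returns the next line of the file:
file = open('file.txt')
it = iter(file)
print(next(it))  # prints the first line of file.txt
file.close()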
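To try this without an existing file.txt, the sketch below writes a throwaway temporary file first (the file contents are purely illustrative) and uses a with block so the connection is closed automatically. It also shows that a file object is its own iterator.

```python
import os
import tempfile

# Write a small throwaway file so the example is self-contained
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first line\nsecond line\n')
    path = tmp.name

# Each next() returns one line, including its trailing newline
with open(path) as file:
    it = iter(file)
    same_object = it is file  # a file object is its own iterator
    first = next(it)          # 'first line\n'
    rest = list(it)           # ['second line\n']

os.remove(path)
print(repr(first), rest)
```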

Playing with iterators

Using enumerate()

enumerate() is a function that takes any iterable as its argument and returns a special enumerate object, which consists of pairs containing the elements of the original iterable along with their index within the iterable.

avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
e = enumerate(avengers)
print(type(e))  # <class 'enumerate'>

We can use the function list to turn this enumerate object into a list of tuples and print it to see what it contains

e_list = list(e)
print(e_list)  # [(0, 'hawkeye'), (1, 'iron man'), (2, 'thor'), (3, 'quicksilver')]

enumerate() and unpack

The enumerate object is itself an iterable, so we can loop over it while unpacking its elements with the clause for index, value in enumerate(...):
for index, value in enumerate(avengers):
    print(index, value)

By default, enumerate() begins indexing at 0. You can alter this with the start argument:
for index, value in enumerate(avengers, start=10):
    print(index, value)
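To see what enumerate() does with start, here is a rough re-implementation written as a generator function. The name my_enumerate is hypothetical; this is an illustration of the behaviour, not how the built-in is actually implemented.

```python
def my_enumerate(iterable, start=0):
    """Hypothetical sketch of enumerate(): yield (index, value) pairs,
    counting from `start`."""
    index = start
    for value in iterable:
        yield index, value
        index += 1

avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
print(list(my_enumerate(avengers, start=10)))
# [(10, 'hawkeye'), (11, 'iron man'), (12, 'thor'), (13, 'quicksilver')]
```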

Using zip()

Accepts an arbitrary number of iterables and returns an iterator of tuples

avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odinson', 'maximoff']
z = zip(avengers, names)
print(type(z))  # <class 'zip'>

We can turn this object into a list and print the list:
z_list = list(z)
print(z_list)  # [('hawkeye', 'barton'), ('iron man', 'stark'), ('thor', 'odinson'), ('quicksilver', 'maximoff')]
The first element of the list is a tuple containing the first elements of each list that was zipped, and so on.
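When the inputs have different lengths, zip() stops at the shortest one and silently drops the leftovers. The sketch below deliberately omits one surname to show this, and uses itertools.zip_longest() as the padding alternative.

```python
from itertools import zip_longest

avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odinson']  # one surname deliberately left out

# zip() stops as soon as the shortest input runs out,
# so 'quicksilver' is silently dropped
pairs = list(zip(avengers, names))
print(pairs)
# [('hawkeye', 'barton'), ('iron man', 'stark'), ('thor', 'odinson')]

# zip_longest() keeps going and pads the missing values with fillvalue
padded = list(zip_longest(avengers, names, fillvalue=None))
print(padded[-1])  # ('quicksilver', None)
```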

zip() and unpack

We can use a for loop to iterate over the zip object and print the tuples:

for z1, z2 in zip(avengers, names):
    print(z1, z2)

Print zip with *

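The * operator prints all the tuples at once. Because list(z) above already exhausted z, we first recreate the zip object:
z = zip(avengers, names)
print(*z)  # ('hawkeye', 'barton') ('iron man', 'stark') ('thor', 'odinson') ('quicksilver', 'maximoff')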
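The same * trick also reverses a zip: unpacking the list of tuples into zip() regroups the first elements together and the second elements together. A short sketch (note that the results come back as tuples, not lists):

```python
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odinson', 'maximoff']

pairs = list(zip(avengers, names))

# *pairs passes each tuple as a separate argument to zip()
heroes, surnames = zip(*pairs)

print(heroes)    # ('hawkeye', 'iron man', 'thor', 'quicksilver')
print(surnames)  # ('barton', 'stark', 'odinson', 'maximoff')
```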

Using iterators to load large files into memory

Loading data in chunks

There can be too much data to hold in memory. For this, we load it in chunks: we perform the desired operation(s) on each chunk, store the result, discard the chunk, and then load the next chunk. We use the pandas function read_csv() with the argument chunksize.

import pandas as pd

result = []
for chunk in pd.read_csv('data.csv', chunksize=1000):
    result.append(sum(chunk['column_name']))
total = sum(result)
print(total)
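The chunking pattern itself does not depend on pandas. As a stdlib-only analogue (an assumption for illustration, not what read_csv() does internally), itertools.islice can pull a fixed number of rows from a csv reader at a time. The helper name sum_column_in_chunks and the tiny in-memory CSV are hypothetical stand-ins for 'data.csv'.

```python
import csv
import io
from itertools import islice

def sum_column_in_chunks(file_obj, column, chunksize=1000):
    """Sum one numeric column while holding at most `chunksize` rows in memory."""
    reader = csv.DictReader(file_obj)
    result = []
    while True:
        chunk = list(islice(reader, chunksize))  # next block of rows
        if not chunk:
            break  # reader exhausted
        # Process the chunk, keep only the partial result, then discard the rows
        result.append(sum(float(row[column]) for row in chunk))
    return sum(result)

# Small in-memory CSV standing in for 'data.csv' (hypothetical data)
sample = io.StringIO('column_name\n1\n2\n3\n4\n5\n')
total = sum_column_in_chunks(sample, 'column_name', chunksize=2)
print(total)  # 15.0
```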

List comprehensions

A list comprehension

nums = [12, 8, 21, 3, 16]
new_nums = [num + 1 for num in nums]
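For comparison, here is the same list built with an explicit for loop, which is exactly what the one-line comprehension collapses:

```python
nums = [12, 8, 21, 3, 16]

# The same list built with an explicit for loop
new_nums_loop = []
for num in nums:
    new_nums_loop.append(num + 1)

# ...and with the list comprehension
new_nums = [num + 1 for num in nums]

print(new_nums)  # [13, 9, 22, 4, 17]
```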

List comprehensions

Collapse for loops for building lists into a single line.
Components: iterable, iterator variable, output expression.

Nested loops

pairs=[(num1, num2) for num1 in range(0,2) for num2 in range(6,8)]
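The nested comprehension above is equivalent to two nested for loops; the for clauses appear in the comprehension in the same order as the loops, outermost first:

```python
# Nested for loops
pairs_loop = []
for num1 in range(0, 2):
    for num2 in range(6, 8):
        pairs_loop.append((num1, num2))

# Equivalent nested list comprehension
pairs = [(num1, num2) for num1 in range(0, 2) for num2 in range(6, 8)]

print(pairs)  # [(0, 6), (0, 7), (1, 6), (1, 7)]
```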

Exercise

Nested list comprehensions Great! At this point, you have a good grasp of the basic syntax of list comprehensions. Let's push your code-writing skills a little further. In this exercise, you will be writing a list comprehension within another list comprehension, or nested list comprehensions. It sounds a little tricky, but you can do it!

Let's step aside from strings for a while. One of the ways in which lists can be used is in representing multi-dimensional objects such as matrices. Matrices can be represented as a list of lists in Python. For example, a 5 x 5 matrix with values 0 to 4 in each row can be written as:

matrix = [[0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4]]

Your task is to recreate this matrix by using nested list comprehensions. Recall that you can create one of the rows of the matrix with a single list comprehension. To create the list of lists, you simply have to supply the list comprehension as the output expression of the overall list comprehension:

[[output expression] for iterator variable in iterable]

Note that here, the output expression is itself a list comprehension.

Instructions: In the inner list comprehension (that is, the output expression of the nested list comprehension), create a list of values from 0 to 4 using range(). Use col as the iterator variable. In the iterable part of your nested list comprehension, use range() to count 5 rows, that is, create a list of values from 0 to 4. Use row as the iterator variable; note that you won't be needing this variable to create values in the list of lists.

Create a 5 x 5 matrix using a list of lists: matrix

matrix = [[col for col in range(5)] for row in range(5)]

Print the matrix

for row in matrix:
    print(row)

END OF EXERCISE

Advanced comprehensions

Conditionals in comprehensions

Conditionals on the iterable: [num ** 2 for num in range(10) if num % 2 == 0]

Conditionals on the output expression: [num ** 2 if num % 2 == 0 else 0 for num in range(10)]
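The two placements behave differently: an if clause after the iterable filters elements out, while an if/else in the output expression keeps every element and only changes its value. Note that squaring uses **; in Python, ^ is bitwise XOR, not a power.

```python
# Condition on the iterable: odd numbers are filtered out entirely
evens_squared = [num ** 2 for num in range(10) if num % 2 == 0]

# Condition on the output expression: every number produces an element,
# odd ones are replaced by 0
squared_or_zero = [num ** 2 if num % 2 == 0 else 0 for num in range(10)]

print(evens_squared)    # [0, 4, 16, 36, 64]
print(squared_or_zero)  # [0, 0, 4, 0, 16, 0, 36, 0, 64, 0]
print(3 ^ 2)            # 1 (bitwise XOR, not 3 squared)
```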

Dict comprehensions

Create dictionaries. Use curly braces {} instead of brackets [], with a key: value pair as the output expression:
pos_neg = {num: -num for num in range(9)}
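A quick look at the resulting dictionary, plus a variant showing that dict comprehensions accept an if clause on the iterable just like list comprehensions:

```python
pos_neg = {num: -num for num in range(9)}
print(pos_neg[3])  # -3

# Only even keys are kept
even_neg = {num: -num for num in range(9) if num % 2 == 0}
print(even_neg)  # {0: 0, 2: -2, 4: -4, 6: -6, 8: -8}
```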

Introduction to generator expressions

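A generator expression is like a list comprehension, except that it uses parentheses () instead of brackets [] and does not store the list in memory: it does not construct the list, but is an object we can iterate over to produce elements of the list as required. Everything we can write in a list comprehension, such as conditionals on the iterable or on the output expression, also works in a generator expression.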
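The sketch below contrasts the two: the list comprehension builds every value immediately, while the generator expression produces values only when next() (or a consumer such as sum()) asks for them. The size comparison uses sys.getsizeof, which reports the size of the container object itself.

```python
import sys

# List comprehension: all 100,000 values are built immediately
squares_list = [num ** 2 for num in range(10**5)]

# Generator expression: same syntax with parentheses, nothing computed yet
squares_gen = (num ** 2 for num in range(10**5))

first = next(squares_gen)   # 0, produced on demand
second = next(squares_gen)  # 1

# The generator object stays small regardless of how many values it can yield
gen_is_smaller = sys.getsizeof(squares_gen) < sys.getsizeof(squares_list)

# Conditionals work exactly as in list comprehensions
even_total = sum(num for num in range(10) if num % 2 == 0)

print(first, second, gen_is_smaller, even_total)  # 0 1 True 20
```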

Generator functions

Produces generator objects when called
Defined like a regular function, with def
Yields a sequence of values instead of returning a single value
Generates a value with the yield keyword
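A minimal generator function illustrating the points above (the name num_sequence is just illustrative): calling it returns a generator object, and each yield hands back one value and pauses until the next one is requested.

```python
def num_sequence(n):
    """Generate values from 0 to n - 1, one at a time."""
    i = 0
    while i < n:
        yield i   # pause here and hand back the current value
        i += 1

result = num_sequence(5)
print(type(result))  # <class 'generator'>
print(list(result))  # [0, 1, 2, 3, 4]
```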

Wrapping up comprehensions and generators

List comprehensions

Basic: [output expression for iterator variable in iterable]
Advanced: [output expression + conditional on output for iterator variable in iterable + conditional on iterable]

EXERCISE

Using a list comprehension This time, you're going to use the lists2dict() function you defined in the last exercise to turn a bunch of lists into a list of dictionaries with the help of a list comprehension.

The lists2dict() function has already been preloaded, together with a couple of lists, feature_names and row_lists. feature_names contains the header names of the World Bank dataset and row_lists is a list of lists, where each sublist is a list of actual values of a row from the dataset.

Your goal is to use a list comprehension to generate a list of dicts, where the keys are the header names and the values are the row entries.

Instructions: Inspect the contents of row_lists by printing the first two lists in row_lists. Create a list comprehension that generates a dictionary using lists2dict() for each sublist in row_lists. The keys are from the feature_names list and the values are the row entries in row_lists. Use sublist as your iterator variable and assign the resulting list of dictionaries to list_of_dicts. Look at the first two dictionaries in list_of_dicts by printing them out.

Print the first two lists in row_lists

print(row_lists[0])
print(row_lists[1])

Turn list of lists into list of dicts: list_of_dicts

list_of_dicts = [lists2dict(feature_names, sublist) for sublist in row_lists]

Print the first two dictionaries in list_of_dicts

print(list_of_dicts[0])
print(list_of_dicts[1])
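Since lists2dict() was preloaded and its body is not shown here, the sketch below assumes it simply pairs keys with values via dict(zip(...)). Both that implementation and the small feature_names/row_lists stand-ins are hypothetical, not the real World Bank data.

```python
def lists2dict(list1, list2):
    """Hypothetical sketch: keys from list1, values from list2."""
    return dict(zip(list1, list2))

# Hypothetical stand-ins for the preloaded feature_names and row_lists
feature_names = ['col_a', 'col_b']
row_lists = [[1, 2], [3, 4]]

list_of_dicts = [lists2dict(feature_names, sublist) for sublist in row_lists]
print(list_of_dicts[0])  # {'col_a': 1, 'col_b': 2}
print(list_of_dicts[1])  # {'col_a': 3, 'col_b': 4}
```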

END OF EXERCISE

EXERCISE

Writing an iterator to load data in chunks (3) You're getting used to reading and processing data in chunks by now. Let's push your skills a little further by adding a column to a DataFrame.

Starting from the code of the previous exercise, you will be using a list comprehension to create the values for a new column 'Total Urban Population' from the list of tuples that you generated earlier. Recall from the previous exercise that the first and second elements of each tuple consist of, respectively, values from the columns 'Total Population' and 'Urban population (% of total)'. The values in this new column 'Total Urban Population', therefore, are the product of the first and second element in each tuple. Furthermore, because the 2nd element is a percentage, you need to divide the entire result by 100, or alternatively, multiply it by 0.01.

You will also plot the data from this new column to create a visualization of the urban population data.

The packages pandas and matplotlib.pyplot have been imported as pd and plt respectively for your use.

Instructions: Write a list comprehension to generate a list of values from pops_list for the new column 'Total Urban Population'. The output expression should be the product of the first and second element in each tuple in pops_list. Because the 2nd element is a percentage, you also need to either multiply the result by 0.01 or divide it by 100. In addition, note that the column 'Total Urban Population' should only be able to take on integer values. To ensure this, make sure you cast the output expression to an integer with int(). Create a scatter plot where the x-axis shows values from the 'Year' column and the y-axis shows values from the 'Total Urban Population' column.

Code from previous exercise

urb_pop_reader = pd.read_csv('ind_pop_data.csv', chunksize=1000)
df_urb_pop = next(urb_pop_reader)
df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == 'CEB']
pops = zip(df_pop_ceb['Total Population'], df_pop_ceb['Urban population (% of total)'])
pops_list = list(pops)

Use list comprehension to create new DataFrame column 'Total Urban Population'

df_pop_ceb['Total Urban Population'] = [int(tup[0] * tup[1] * 0.01) for tup in pops_list]

Plot urban population data

df_pop_ceb.plot(kind='scatter', x='Year', y='Total Urban Population')
plt.show()

END OF EXERCISE

EXERCISE

Writing an iterator to load data in chunks (4) In the previous exercises, you've only processed the data from the first DataFrame chunk. This time, you will aggregate the results over all the DataFrame chunks in the dataset. This basically means you will be processing the entire dataset now. This is neat because you're going to be able to process the entire large dataset by just working on smaller pieces of it!

You're going to use the data from 'ind_pop_data.csv', available in your current directory. The packages pandas and matplotlib.pyplot have been imported as pd and plt respectively for your use.

Instructions: Initialize an empty DataFrame data using pd.DataFrame(). In the for loop, iterate over urb_pop_reader to be able to process all the DataFrame chunks in the dataset. Concatenate data and df_pop_ceb by passing a list of the DataFrames to pd.concat().

Initialize reader object: urb_pop_reader

urb_pop_reader = pd.read_csv('ind_pop_data.csv', chunksize=1000)

Initialize empty DataFrame: data

data = pd.DataFrame()

Iterate over each DataFrame chunk

for df_urb_pop in urb_pop_reader:

    # Check out specific country: df_pop_ceb
    df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == 'CEB']

    # Zip DataFrame columns of interest: pops
    pops = zip(df_pop_ceb['Total Population'],
               df_pop_ceb['Urban population (% of total)'])

    # Turn zip object into list: pops_list
    pops_list = list(pops)

    # Use list comprehension to create new DataFrame column 'Total Urban Population'
    df_pop_ceb['Total Urban Population'] = [int(tup[0] * tup[1] * 0.01) for tup in pops_list]

    # Concatenate DataFrame chunk to the end of data: data
    data = pd.concat([data, df_pop_ceb])

Plot urban population data

data.plot(kind='scatter', x='Year', y='Total Urban Population')
plt.show()
