Introduction to iterators
Iterators vs. iterables
Iterable:
- Examples: lists, strings, dictionaries, file connections
- An object with an associated iter() method
- Applying iter() to an iterable creates an iterator
Iterator:
- Produces the next value with next()
Iterating over iterables: next()
    word = 'Da'
    it = iter(word)
    next(it)   # 'D'
    next(it)   # 'a'
    next(it)   # raises StopIteration, because there are no more elements to iterate over
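The behaviour described above can be checked with a short sketch. The try/except wrapper and the names first, second, and exhausted are illustrative additions, not part of the original notes.

```python
word = 'Da'
it = iter(word)          # build an iterator from the iterable

first = next(it)         # first character
second = next(it)        # second character

# A third call has nothing left to return, so it raises StopIteration.
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, exhausted)
```

A for loop does the same thing behind the scenes: it calls next() repeatedly and stops quietly when StopIteration is raised.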
Iterating at once with *
The * operator unpacks all the values of an iterator at once:
    word = 'Da'
    it = iter(word)
    print(*it)   # D a
Once we do this, the iterator is exhausted and we cannot repeat it; we have to redefine the iterator to iterate again.
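A small sketch of the exhaustion behaviour, assuming the same word as above. The names leftover and again are only there to make the result easy to inspect.

```python
word = 'Da'
it = iter(word)

print(*it)               # unpacks every remaining value of the iterator

# The iterator is now exhausted, so there is nothing left to collect.
leftover = list(it)
print(leftover)

# To iterate again, build a fresh iterator from the iterable.
it = iter(word)
again = list(it)
print(again)
```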
Iterating over dictionaries
To iterate over the key-value pairs of a Python dictionary, we unpack them by applying the items() method to the dictionary:
    pythonistas = {'hugo': 'bowne-anderson', 'francis': 'castro'}
    for key, value in pythonistas.items():
        print(key, value)
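For comparison, a sketch of what happens with and without items(). Iterating over the dictionary directly yields only its keys; the names keys_only and pairs are illustrative.

```python
pythonistas = {'hugo': 'bowne-anderson', 'francis': 'castro'}

# Iterating over the dictionary itself yields only the keys.
keys_only = [key for key in pythonistas]

# .items() yields (key, value) tuples that can be unpacked in the loop header.
pairs = []
for key, value in pythonistas.items():
    pairs.append((key, value))
    print(key, value)
```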
Iterating over file connections
Iterating over a file connection returns one line of the file at a time:
    file = open('file.txt')
    it = iter(file)
    print(next(it))   # prints the first line of the file
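A self-contained sketch of the same idea. Since file.txt is not available here, the example writes a throwaway temporary file first; the file contents and the names line1 and line2 are made up for illustration.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first line\nsecond line\n')
    path = tmp.name

# A file object is an iterable; each next() call returns one line.
with open(path) as file:
    it = iter(file)
    line1 = next(it)
    line2 = next(it)

os.remove(path)          # clean up the temporary file
print(line1, line2, sep='')
```

Using a with block closes the file connection automatically, which the bare open() call in the notes does not do.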
Playing with iterators
Using enumerate()
enumerate() is a function that takes any iterable as an argument and returns a special enumerate object, which consists of pairs containing the elements of the original iterable along with their index within the iterable.
    avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
    e = enumerate(avengers)
    print(type(e))   # <class 'enumerate'>
We can use the list() function to turn this enumerate object into a list of tuples and print it to see what it contains:
    e_list = list(e)
    print(e_list)
    # [(0, 'hawkeye'), (1, 'iron man'), (2, 'thor'), (3, 'quicksilver')]
enumerate() and unpack
The enumerate object itself is also an iterable, and we can loop over it while unpacking its elements using the clause for index, value in enumerate(...):
    for index, value in enumerate(avengers):
        print(index, value)
By default, enumerate() begins indexing at 0. You can alter this with the start argument:
    for index, value in enumerate(avengers, start=10):
        print(index, value)
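A quick sketch comparing default indexing with the start argument. The names default_pairs and shifted_pairs are illustrative.

```python
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']

default_pairs = list(enumerate(avengers))            # indexing starts at 0
shifted_pairs = list(enumerate(avengers, start=10))  # indexing starts at 10

for index, value in enumerate(avengers, start=10):
    print(index, value)
```

Note that start only changes the numbers that are reported; it does not skip any elements of the iterable.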
Using zip()
zip() accepts an arbitrary number of iterables and returns an iterator of tuples.
    avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
    names = ['barton', 'stark', 'odinson', 'maximoff']
    z = zip(avengers, names)
    print(type(z))   # <class 'zip'>
We can turn this object into a list and print the list:
    z_list = list(z)
    print(z_list)
    # [('hawkeye', 'barton'), ('iron man', 'stark'), ('thor', 'odinson'), ('quicksilver', 'maximoff')]
The first element of the list is a tuple containing the first elements of each list that was zipped, and so on.
zip() and unpack
We can use a for loop to iterate over the zip object and print the tuples:
    for z1, z2 in zip(avengers, names):
        print(z1, z2)
Print zip with *
    z = zip(avengers, names)
    print(*z)   # prints all the tuples
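Two properties of zip objects are easy to miss, so here is a hedged sketch of both: a zip object can only be consumed once, and zip(*...) can be used to undo the zipping. The names after, heroes, and surnames are illustrative.

```python
avengers = ['hawkeye', 'iron man', 'thor', 'quicksilver']
names = ['barton', 'stark', 'odinson', 'maximoff']

z = zip(avengers, names)
z_list = list(z)          # consumes the zip object
after = list(z)           # nothing is left the second time

# zip(*pairs) reverses the operation and returns tuples of the original columns.
heroes, surnames = zip(*z_list)
print(heroes)
print(surnames)
```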
Using iterators to load large files into memory
Loading data in chunks
There can be too much data to hold in memory. In that case we load it in chunks: we perform the desired operation(s) on each chunk, store the result, discard the chunk, and then load the next chunk. The pandas function read_csv() supports this through its chunksize argument:
    import pandas as pd
    result = []
    for chunk in pd.read_csv('data.csv', chunksize=1000):
        result.append(sum(chunk['column_name']))
    total = sum(result)
    print(total)
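The same chunk-by-chunk pattern can be sketched without pandas, which makes the mechanics visible. This is a stand-in built on the standard library only, not how read_csv() is implemented; the helper read_in_chunks, the in-memory CSV text, and the column name value are all hypothetical.

```python
import csv
import io
from itertools import islice

def read_in_chunks(handle, chunksize):
    """Yield lists of at most `chunksize` rows (as dicts) from a CSV file handle."""
    reader = csv.DictReader(handle)
    while True:
        chunk = list(islice(reader, chunksize))
        if not chunk:
            break
        yield chunk

# In-memory stand-in for a large CSV file: one column holding 1..10.
csv_text = 'value\n' + '\n'.join(str(n) for n in range(1, 11))

result = []
for chunk in read_in_chunks(io.StringIO(csv_text), chunksize=4):
    # Process the chunk, keep only the partial result, then let the chunk go.
    result.append(sum(int(row['value']) for row in chunk))

total = sum(result)
print(result, total)
```

Only one chunk is held in memory at a time; the list result stores just one number per chunk.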
List comprehensions
A list comprehension
    nums = [12, 8, 21, 3, 16]
    new_nums = [num + 1 for num in nums]
List comprehensions
Collapse for loops for building lists into a single line
Components:
- Iterable
- Iterator variable
- Output expression
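To make the three components concrete, here is a sketch that builds the same list twice, once with a for loop and once with a list comprehension. The name new_nums_loop is illustrative.

```python
nums = [12, 8, 21, 3, 16]

# for-loop version
new_nums_loop = []
for num in nums:
    new_nums_loop.append(num + 1)

# List-comprehension version:
#   output expression -> num + 1
#   iterator variable -> num
#   iterable          -> nums
new_nums = [num + 1 for num in nums]

print(new_nums)
```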
Nested loops
pairs=[(num1, num2) for num1 in range(0,2) for num2 in range(6,8)]
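The comprehension with two for clauses is equivalent to a pair of nested for loops, read left to right as outer loop then inner loop. A sketch, with pairs_loop as an illustrative name:

```python
# Nested for loops: the first for clause is the outer loop.
pairs_loop = []
for num1 in range(0, 2):
    for num2 in range(6, 8):
        pairs_loop.append((num1, num2))

# The same result as a single list comprehension.
pairs = [(num1, num2) for num1 in range(0, 2) for num2 in range(6, 8)]

print(pairs)
```

The single-line form is shorter, but it trades away some readability, so it is worth checking that the loop order is still obvious.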
Exercise
Nested list comprehensions
Great! At this point, you have a good grasp of the basic syntax of list comprehensions. Let's push your code-writing skills a little further. In this exercise, you will be writing a list comprehension within another list comprehension, or nested list comprehensions. It sounds a little tricky, but you can do it!
Let's step aside for a while from strings. One of the ways in which lists can be used is in representing multi-dimensional objects such as matrices. Matrices can be represented as a list of lists in Python. For example, a 5 x 5 matrix with values 0 to 4 in each row can be written as:
    matrix = [[0, 1, 2, 3, 4],
              [0, 1, 2, 3, 4],
              [0, 1, 2, 3, 4],
              [0, 1, 2, 3, 4],
              [0, 1, 2, 3, 4]]
Your task is to recreate this matrix by using nested list comprehensions. Recall that you can create one of the rows of the matrix with a single list comprehension. To create the list of lists, you simply have to supply the list comprehension as the output expression of the overall list comprehension:
[[output expression] for iterator variable in iterable]
Note that here, the output expression is itself a list comprehension.
Instructions
- In the inner list comprehension - that is, the output expression of the nested list comprehension - create a list of values from 0 to 4 using range(). Use col as the iterator variable.
- In the iterable part of your nested list comprehension, use range() to count 5 rows - that is, create a list of values from 0 to 4. Use row as the iterator variable; note that you won't be needing this variable to create values in the list of lists.
    # Create a 5 x 5 matrix using a list of lists: matrix
    matrix = [[col for col in range(5)] for row in range(5)]
    # Print the matrix
    for row in matrix:
        print(row)
END OF EXERCISE
Advanced comprehensions
Conditionals in comprehensions
Conditionals on the iterable:
    [num ** 2 for num in range(10) if num % 2 == 0]
Conditionals on the output expression:
    [num ** 2 if num % 2 == 0 else 0 for num in range(10)]
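A sketch showing what the two kinds of conditional produce. Note that in Python the power operator is `**`; `^` is bitwise XOR, so it does not square a number. The variable names are illustrative.

```python
# Conditional on the iterable: odd numbers are filtered out entirely.
evens_squared = [num ** 2 for num in range(10) if num % 2 == 0]

# Conditional on the output expression: every number produces an entry,
# but odd numbers are replaced by 0.
squares_or_zero = [num ** 2 if num % 2 == 0 else 0 for num in range(10)]

# ** is exponentiation, ^ is bitwise XOR.
power = 3 ** 2
xor = 3 ^ 2

print(evens_squared)
print(squares_or_zero)
print(power, xor)
```

The first form changes the length of the result; the second keeps the length and changes only the values.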
Dict comprehensions
- Create dictionaries
- Use curly braces {} instead of []
    pos_neg = {num: -num for num in range(9)}
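A short sketch of the dict comprehension above, plus one pitfall: curly braces without a key: value pair build a set rather than a dictionary. The name as_set is illustrative.

```python
# key: value pairs inside curly braces build a dictionary.
pos_neg = {num: -num for num in range(9)}

# Without the colon, the same braces build a set instead.
as_set = {num for num in range(3)}

print(pos_neg)
print(type(pos_neg), type(as_set))
```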
Introduction to generator expressions
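A generator expression is like a list comprehension, except that it does not construct the list or store it in memory. Instead, it produces a generator object that we can iterate over to yield the elements as required. It is written with parentheses () instead of []. Anything we can do in a list comprehension, such as conditionals on the iterable or on the output expression, also works in a generator expression.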
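A sketch of the difference between a list comprehension and a generator expression. The variable names are illustrative; the only syntactic change is the use of parentheses instead of square brackets.

```python
squares_list = [num ** 2 for num in range(6)]   # built in full and stored in memory
squares_gen = (num ** 2 for num in range(6))    # nothing is computed yet

first = next(squares_gen)        # values are produced one at a time, on request
second = next(squares_gen)
rest = list(squares_gen)         # collects whatever has not been consumed yet

# Conditionals work the same way as in list comprehensions.
even_gen = (num for num in range(10) if num % 2 == 0)
evens = list(even_gen)

print(first, second, rest, evens)
```

Like any iterator, a generator can be consumed only once; after list(squares_gen) it is exhausted.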
Generator functions
- Produces generator objects when called
- Defined like a regular function, with def
- Yields a sequence of values instead of returning a single value
- Generates a value with yield keyword
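The points above can be sketched with a minimal generator function. The function name num_sequence and its argument n are hypothetical, chosen only for illustration.

```python
def num_sequence(n):
    """Generate the integers 0, 1, ..., n - 1 one at a time."""
    i = 0
    while i < n:
        yield i      # hand back one value, then pause until the next request
        i += 1

gen = num_sequence(3)            # calling the function returns a generator object
kind = type(gen).__name__
values = list(gen)               # iterating drives the function body

print(kind, values)
```

Calling num_sequence(3) does not run the body; the code only executes as values are requested, for example by a for loop, next(), or list().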
Wrapping up comprehensions and generators
List comprehensions
- Basic: [output expression for iterator variable in iterable]
- Advanced: [output expression + conditional on output for iterator variable in iterable + conditional on iterable]
EXERCISE
Using a list comprehension
This time, you're going to use the lists2dict() function you defined in the last exercise to turn a bunch of lists into a list of dictionaries with the help of a list comprehension.
The lists2dict() function has already been preloaded, together with a couple of lists, feature_names and row_lists. feature_names contains the header names of the World Bank dataset and row_lists is a list of lists, where each sublist is a list of actual values of a row from the dataset.
Your goal is to use a list comprehension to generate a list of dicts, where the keys are the header names and the values are the row entries.
Instructions
- Inspect the contents of row_lists by printing the first two lists in row_lists.
- Create a list comprehension that generates a dictionary using lists2dict() for each sublist in row_lists. The keys are from the feature_names list and the values are the row entries in row_lists. Use sublist as your iterator variable and assign the resulting list of dictionaries to list_of_dicts.
- Look at the first two dictionaries in list_of_dicts by printing them out.
    # Print the first two lists in row_lists
    print(row_lists[0])
    print(row_lists[1])
    # Turn list of lists into list of dicts: list_of_dicts
    list_of_dicts = [lists2dict(feature_names, sublist) for sublist in row_lists]
    # Print the first two dictionaries in list_of_dicts
    print(list_of_dicts[0])
    print(list_of_dicts[1])
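The exercise relies on a preloaded lists2dict() whose definition is not shown in these notes. The sketch below is one plausible version, assuming it simply pairs the two lists element by element; the feature_names and row_lists values are made-up toy data, not the World Bank dataset.

```python
def lists2dict(list1, list2):
    """Assumed behaviour: pair two lists element-wise and return a dict."""
    return dict(zip(list1, list2))

# Made-up stand-ins for the preloaded feature_names and row_lists.
feature_names = ['name', 'year', 'value']
row_lists = [['x', 2000, 1.5], ['y', 2001, 2.5]]

# One dictionary per row, with the header names as keys.
list_of_dicts = [lists2dict(feature_names, sublist) for sublist in row_lists]

print(list_of_dicts[0])
print(list_of_dicts[1])
```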
END OF EXERCISE
EXERCISE
Writing an iterator to load data in chunks (3)
You're getting used to reading and processing data in chunks by now. Let's push your skills a little further by adding a column to a DataFrame.
Starting from the code of the previous exercise, you will be using a list comprehension to create the values for a new column 'Total Urban Population' from the list of tuples that you generated earlier. Recall from the previous exercise that the first and second elements of each tuple consist of, respectively, values from the columns 'Total Population' and 'Urban population (% of total)'. The values in this new column 'Total Urban Population', therefore, are the product of the first and second element in each tuple. Furthermore, because the 2nd element is a percentage, you need to divide the entire result by 100, or alternatively, multiply it by 0.01.
You will also plot the data from this new column to create a visualization of the urban population data.
The packages pandas and matplotlib.pyplot have been imported as pd and plt respectively for your use.
Instructions
- Write a list comprehension to generate a list of values from pops_list for the new column 'Total Urban Population'. The output expression should be the product of the first and second element in each tuple in pops_list. Because the 2nd element is a percentage, you also need to either multiply the result by 0.01 or divide it by 100. In addition, note that the column 'Total Urban Population' should only be able to take on integer values. To ensure this, make sure you cast the output expression to an integer with int().
- Create a scatter plot where the x-axis values come from the 'Year' column and the y-axis values come from the 'Total Urban Population' column.
    # Code from previous exercise
    urb_pop_reader = pd.read_csv('ind_pop_data.csv', chunksize=1000)
    df_urb_pop = next(urb_pop_reader)
    df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == 'CEB']
    pops = zip(df_pop_ceb['Total Population'],
               df_pop_ceb['Urban population (% of total)'])
    pops_list = list(pops)
    # Use list comprehension to create new DataFrame column 'Total Urban Population'
    df_pop_ceb['Total Urban Population'] = [int(tup[0] * tup[1] * 0.01) for tup in pops_list]
    # Plot urban population data
    df_pop_ceb.plot(kind='scatter', x='Year', y='Total Urban Population')
    plt.show()
END OF EXERCISE
EXERCISE
Writing an iterator to load data in chunks (4)
In the previous exercises, you've only processed the data from the first DataFrame chunk. This time, you will aggregate the results over all the DataFrame chunks in the dataset. This basically means you will be processing the entire dataset now. This is neat because you're going to be able to process the entire large dataset by just working on smaller pieces of it!
You're going to use the data from 'ind_pop_data.csv', available in your current directory. The packages pandas and matplotlib.pyplot have been imported as pd and plt respectively for your use.
Instructions
- Initialize an empty DataFrame data using pd.DataFrame().
- In the for loop, iterate over urb_pop_reader to be able to process all the DataFrame chunks in the dataset.
- Concatenate data and df_pop_ceb by passing a list of the DataFrames to pd.concat().
    # Initialize reader object: urb_pop_reader
    urb_pop_reader = pd.read_csv('ind_pop_data.csv', chunksize=1000)
    # Initialize empty DataFrame: data
    data = pd.DataFrame()
    # Iterate over each DataFrame chunk
    for df_urb_pop in urb_pop_reader:
        # Check out specific country: df_pop_ceb
        df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == 'CEB']
        # Zip DataFrame columns of interest: pops
        pops = zip(df_pop_ceb['Total Population'],
                   df_pop_ceb['Urban population (% of total)'])
        # Turn zip object into list: pops_list
        pops_list = list(pops)
        # Use list comprehension to create new DataFrame column 'Total Urban Population'
        df_pop_ceb['Total Urban Population'] = [int(tup[0] * tup[1] * 0.01) for tup in pops_list]
        # Concatenate DataFrame chunk to the end of data: data
        data = pd.concat([data, df_pop_ceb])
    # Plot urban population data
    data.plot(kind='scatter', x='Year', y='Total Urban Population')
    plt.show()