
Lists: N-Sized Chunks

In this tutorial, you will work with lists and learn an efficient way to divide arbitrarily sized lists into chunks of a given size.

Lists are built-in data structures in Python that can store heterogeneous items and allow efficient access to them. Dividing a list into N-sized chunks is a common task when there is a limit on the number of items your program can handle in a single request.

Lists

Lists are data structures that can hold items of mixed types, such as integers, floats, and strings. Lists are mutable, which means you can change the contents of a list without changing its identity. They are written with square brackets [ ], and the items inside are separated by commas (,). Let's create a list of numbers that we can work with...

# Creating a list of 95 numbers
list_numbers = list(range(1, 96))
print(list_numbers)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95]
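To make the earlier point about mixed items and mutability concrete, here is a minimal sketch. The list name `mixed` and its values are made up for illustration; `id()` is the built-in that returns an object's identity.

```python
# Illustrative list holding items of different types (values are made up)
mixed = [1, 2.5, "three"]
identity_before = id(mixed)

# Change the list in place: add an item and overwrite an existing one
mixed.append(4)
mixed[0] = "one"

print(mixed)                         # ['one', 2.5, 'three', 4]
print(id(mixed) == identity_before)  # True: still the same list object
```

The contents changed, but the identity did not, which is what "mutable" means here.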


The range() function generates a sequence of numbers (in Python 3 it returns a range object, which list() converts into a list). It takes the form range([start], stop[, step]), where 'start' and 'step' are optional parameters. start is the first number in the sequence; stop is the upper bound, which is not included in the sequence; and step is the difference between consecutive numbers. When start is not specified, the sequence begins at 0, and when step is not specified, it defaults to 1.

Now, let's say you need to break the list down into smaller lists of 10 elements each.
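A few short calls may help illustrate how start, stop, and step interact. The variable names below are only for illustration, and each call is wrapped in list() so the values can be printed.

```python
# Only stop given: start defaults to 0 and step defaults to 1
default_start = list(range(5))
# start and stop given: stop itself is excluded
one_to_five = list(range(1, 6))
# A step of 5 skips ahead by five each time
by_fives = list(range(0, 20, 5))
# A negative step counts downwards; stop is still excluded
countdown = list(range(10, 0, -3))

print(default_start)  # [0, 1, 2, 3, 4]
print(one_to_five)    # [1, 2, 3, 4, 5]
print(by_fives)       # [0, 5, 10, 15]
print(countdown)      # [10, 7, 4, 1]
```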

One way to do this is by defining a generator. A generator is an elegant way to define an iterator. What is an iterator, you ask? Put simply, an iterator is an object that knows how to compute and return the next item in the sequence you are iterating through. You can read more about iterators and generators in DataCamp's Python Iterator Tutorial.

We define a generator function that does the work for us.
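As a rough illustration of the iterator idea, the sketch below defines a small generator and pulls items from it one at a time with the built-in next(). The function name `count_up_to` is hypothetical and not part of this tutorial's solution.

```python
def count_up_to(limit):
    """Hypothetical generator that yields 1, 2, ..., limit."""
    current = 1
    while current <= limit:
        yield current      # hand back one item, then pause here
        current += 1

counter = count_up_to(3)
first = next(counter)      # computes and returns the first item
second = next(counter)     # resumes where it left off
third = next(counter)

# Once the items run out, next() raises StopIteration
try:
    next(counter)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, third, exhausted)  # 1 2 3 True
```

Each item is computed only when it is requested, which is what makes generators a good fit for producing chunks on demand.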

# Yields successive 'n' sized chunks from list 'list_name'
def create_chunks(list_name, n):
    for i in range(0, len(list_name), n):
        yield list_name[i:i + n]

# Call the 'create_chunks' function to divide the list further into sub-lists of 10 items each
print(list(create_chunks(list_numbers, 10)))

[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [21, 22, 23, 24, 25, 26, 27, 28, 29, 30], [31, 32, 33, 34, 35, 36, 37, 38, 39, 40], [41, 42, 43, 44, 45, 46, 47, 48, 49, 50], [51, 52, 53, 54, 55, 56, 57, 58, 59, 60], [61, 62, 63, 64, 65, 66, 67, 68, 69, 70], [71, 72, 73, 74, 75, 76, 77, 78, 79, 80], [81, 82, 83, 84, 85, 86, 87, 88, 89, 90], [91, 92, 93, 94, 95]]
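Because create_chunks is a generator, you do not have to build the full list of chunks up front. The sketch below loops over the chunks one at a time, which is closer to the "one request per chunk" situation described at the start. Recording each chunk's length is only a stand-in for whatever per-request work your program would do; the name `chunk_sizes` is an assumption for this example.

```python
def create_chunks(list_name, n):
    for i in range(0, len(list_name), n):
        yield list_name[i:i + n]

list_numbers = list(range(1, 96))

chunk_sizes = []
for chunk in create_chunks(list_numbers, 10):
    # Stand-in for real per-chunk work, such as sending one request
    chunk_sizes.append(len(chunk))

print(chunk_sizes)  # [10, 10, 10, 10, 10, 10, 10, 10, 10, 5]
```

Note that the last chunk holds only the 5 leftover items, since 95 is not a multiple of 10.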


Another way to do the same thing is to use a list comprehension. You can read more about it in DataCamp's Python List Comprehension Tutorial.

print([list_numbers[i: i+10] for i in range(0, len(list_numbers), 10)])

[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [21, 22, 23, 24, 25, 26, 27, 28, 29, 30], [31, 32, 33, 34, 35, 36, 37, 38, 39, 40], [41, 42, 43, 44, 45, 46, 47, 48, 49, 50], [51, 52, 53, 54, 55, 56, 57, 58, 59, 60], [61, 62, 63, 64, 65, 66, 67, 68, 69, 70], [71, 72, 73, 74, 75, 76, 77, 78, 79, 80], [81, 82, 83, 84, 85, 86, 87, 88, 89, 90], [91, 92, 93, 94, 95]]


As the problem gets more complicated, list comprehensions can become harder to understand and debug. Writing a clean function, as with the generator above, can therefore be more useful and easier to keep track of.

In this tutorial, you have learned two ways to solve a common problem when dealing with lists. To learn more, check out DataCamp's Data Types for Data Science course.
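One practical benefit of a function is that it gives you a natural place for extra checks. The sketch below is a hypothetical variant, named `create_chunks_checked` here, that rejects non-positive chunk sizes. Without the check, a size of 0 makes range() raise its own error, and a negative size silently yields nothing.

```python
def create_chunks_checked(items, n):
    """Hypothetical variant of the chunking generator with a size check."""
    # Because this is a generator, the check runs when iteration starts,
    # not at the moment the function is called
    if n <= 0:
        raise ValueError("chunk size must be a positive integer")
    for i in range(0, len(items), n):
        yield items[i:i + n]

small_chunks = list(create_chunks_checked(list(range(1, 8)), 3))
print(small_chunks)  # [[1, 2, 3], [4, 5, 6], [7]]

try:
    list(create_chunks_checked([1, 2, 3], 0))
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```

Adding the same validation to a one-line list comprehension would be noticeably more awkward.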