
When working with data, you often need to count items, create dictionary values before you know the keys to store them under, or maintain order in a dictionary.

Counter, found in the collections module, is a powerful tool for counting, validating, and learning more about the elements within a dataset. You pass an iterable (such as a list, set, or tuple) or a dictionary to Counter. You can also use a Counter object much like a dictionary, with key/value assignment such as counter[key] = value.
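
As a minimal sketch of the construction options just described, the snippet below builds a Counter from a list and from a dictionary, then assigns a count directly. The labels list and its values are made up for illustration and are not part of the eatery data used later.

```python
from collections import Counter

# Hypothetical sample data, for illustration only
labels = ["cart", "truck", "cart", "bar", "cart"]

# Build a Counter from an iterable: each distinct element becomes a key
from_list = Counter(labels)

# Build a Counter from a dictionary of counts that already exist
from_dict = Counter({"cart": 3, "truck": 1, "bar": 1})

# Both objects describe the same counts
print(from_list == from_dict)

# Dictionary-style key/value assignment also works
from_list["kiosk"] = 2
print(from_list["kiosk"])
```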

A common use for Counter is checking data for consistency before using it.
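
One way such a consistency check might look is sketched below. The raw_types list is hypothetical; it assumes a column where the same category was entered with different capitalization and stray whitespace. Counting the raw values exposes the variants, and counting again after a simple normalization confirms they collapse into one category.

```python
from collections import Counter

# Hypothetical raw column with inconsistent spellings of one category
raw_types = ["Food Cart", "food cart", "Food Cart ", "Snack Bar", "Food Cart"]

# Counting the raw values shows several keys for what should be one type
raw_counts = Counter(raw_types)
print(raw_counts)

# Strip whitespace and normalize capitalization, then count again
clean_counts = Counter(value.strip().title() for value in raw_types)
print(clean_counts)
```

The normalization step here (strip plus title case) is only an assumption about what the cleanup would need; real data may call for a different rule.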

The Counter Class

Counter is a subclass of dict, so you can use all of the normal dictionary features. In this example, the list nyc_eatery_types contains one column of data, type, from a table about eateries in NYC parks. We create a new Counter from that list and print it.

from collections import Counter
nyc_eatery_count_by_types = Counter(nyc_eatery_types)
print(nyc_eatery_count_by_types)
Counter({'Mobile Food Truck': 114, 'Food Cart': 74, 'Snack Bar': 24,
'Specialty Cart': 18, 'Restaurant': 15, 'Fruit & Vegetable Cart': 4})

The output shows each type from the list and the number of times it appears. We can also see how many restaurants are in the Counter by using 'Restaurant' as the key and printing the result.

print(nyc_eatery_count_by_types['Restaurant'])
15
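
Key lookup on a Counter differs from a plain dictionary in one useful way: a key that was never counted returns 0 instead of raising a KeyError. The sketch below rebuilds the Counter from the printed output above, since the original nyc_eatery_types list is not shown here; 'Ice Cream Stand' is a hypothetical key chosen only because it is absent from the data.

```python
from collections import Counter

# Rebuilt from the printed output above; stands in for Counter(nyc_eatery_types)
nyc_eatery_count_by_types = Counter({
    'Mobile Food Truck': 114, 'Food Cart': 74, 'Snack Bar': 24,
    'Specialty Cart': 18, 'Restaurant': 15, 'Fruit & Vegetable Cart': 4,
})

# A missing key (hypothetical type) returns 0 rather than raising KeyError
print(nyc_eatery_count_by_types['Ice Cream Stand'])

# The lookup above does not add the key to the Counter
print('Ice Cream Stand' in nyc_eatery_count_by_types)

# Ordinary dictionary methods still work, e.g. summing all counts
total_eateries = sum(nyc_eatery_count_by_types.values())
print(total_eateries)
```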

Using Counter to Find the Most Common

Counters also provide a convenient way to find the most common values they contain. The most_common() method returns a list of (item, count) tuples sorted by count in descending order.

Let's print the top 3 eatery types in the NYC park system by calling most_common() with 3 as the number of items to return.

print(nyc_eatery_count_by_types.most_common(3))
[('Mobile Food Truck', 114), ('Food Cart', 74), ('Snack Bar', 24)]

most_common() is excellent for frequency analysis and for finding out which items occur most often.
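
As a small sketch of that kind of frequency analysis, the snippet below turns the top counts into shares of the total and also pulls the least common items. It again rebuilds the Counter from the output printed earlier rather than from the original list, and the variable names are illustrative.

```python
from collections import Counter

# Rebuilt from the printed output earlier in this tutorial
counts = Counter({
    'Mobile Food Truck': 114, 'Food Cart': 74, 'Snack Bar': 24,
    'Specialty Cart': 18, 'Restaurant': 15, 'Fruit & Vegetable Cart': 4,
})

total = sum(counts.values())

# Share of all eateries for each of the three most common types
for eatery_type, count in counts.most_common(3):
    print(f"{eatery_type}: {count} ({count / total:.1%})")

# With no argument, most_common() returns every item in descending order;
# a reversed slice of that list gives the two least common types
least_common = counts.most_common()[:-3:-1]
print(least_common)
```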

Interactive Example

In the following example, you will:

  • Use the data from the Chicago Transit Authority on ridership.
  • Import the Counter object from collections.
  • Print the first ten items from the stations list.
  • Create a Counter of the stations list called station_count.
  • Print the station_count.

# Import the Counter object
from collections import Counter

# Print the first ten items from the stations list
print(stations[:10])

# Create a Counter of the stations list: station_count
station_count = Counter(stations)

# Print the station_count
print(station_count)

When we run the above code, it produces the following result:

['stationname', 'Austin-Forest Park', 'Austin-Forest Park', 'Austin-Forest Park', 'Austin-Forest Park', 'Austin-Forest Park', 'Austin-Forest Park', 'Austin-Forest Park', 'Austin-Forest Park', 'Austin-Forest Park']
Counter({'California-Cermak': 700, 'Damen-Cermak': 700, 'Ashland-Orange': 700, 'Argyle': 700, 'Halsted-Orange': 700, 'Laramie': 700, 'Diversey': 700, '79th': 700, 'Clinton-Lake': 700, 'Monroe/Dearborn': 700, 'Wellington': 700, 'Merchandise Mart': 700, 'Cicero-Cermak': 700, 'Kedzie-Lake': 700, 'Southport': 700, 'Washington/Wells': 700, 'Clark/Division': 700, 'stationname': 1})
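
Note that the output includes 'stationname' with a count of 1, which suggests the column header was read into the list along with the data. A minimal sketch of excluding it before counting follows; the short stations list here is a hypothetical stand-in for the real ridership data, and the check assumes the header is the first element.

```python
from collections import Counter

# Hypothetical stand-in for the stations list, header label included
stations = ['stationname', 'Austin-Forest Park', 'Austin-Forest Park', 'Argyle']

# Skip the first element only if it is the header label
if stations and stations[0] == 'stationname':
    station_rows = stations[1:]
else:
    station_rows = stations

# Count only the actual station names
station_count = Counter(station_rows)
print(station_count)
```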

Try it for yourself.

To learn more about the collections module for counting, please see this video from our course Data Types for Data Science in Python.

This content is taken from DataCamp’s Data Types for Data Science in Python course by Jason Myers.