Counting things sounds trivial until results look off by a little—or a lot. I’ve seen frequency checks done with repeated list.count() calls or misused pandas methods that silently skip missing values. The outcome: slow code and wrong numbers. The fix is to use the right tool for the job: collections.Counter for frequency maps, built-in count() for quick single-value queries, and pandas’ value_counts() when you’re in a DataFrame.
What Is Counter and How It Works
collections.Counter is a dictionary subclass for tallying hashable objects (a multiset). You can build a Counter from an iterable or a mapping. Missing keys default to 0; counts can be incremented or decremented; and several convenience methods make common tasks straightforward.
from collections import Counter
# From an iterable (one pass over the data)
nyc_eatery_types = [
"Mobile Food Truck", "Food Cart", "Snack Bar", "Restaurant", "Food Cart",
"Restaurant", "Mobile Food Truck", "Snack Bar", "Mobile Food Truck"
]
eatery_type_counts = Counter(nyc_eatery_types)
print(eatery_type_counts) # dict-like view
print(eatery_type_counts["Restaurant"]) # missing keys return 0 instead of KeyError
print(eatery_type_counts["Kiosk"])
Counter({'Mobile Food Truck': 3, 'Food Cart': 2, 'Snack Bar': 2, 'Restaurant': 2})
2
0
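A Counter can also be seeded from a mapping of existing counts. Here is a minimal sketch; the name and numbers are purely illustrative:
seed_counts = Counter({"Restaurant": 5, "Food Cart": 1}) # illustrative starting tally
print(seed_counts["Restaurant"]) # 5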
Iterating a Counter follows insertion order, not count order. When you need a ranked view, use most_common().
Find the Most Common Values with most_common()
Counter.most_common(n) returns a list of (item, count) pairs in descending count order; items with equal counts keep their insertion order. Provide n to limit the result, or omit it to get all pairs.
top_3 = eatery_type_counts.most_common(3)
print(top_3)
[('Mobile Food Truck', 3), ('Food Cart', 2), ('Snack Bar', 2)]
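Omitting the argument returns every pair, still ranked by count:
print(eatery_type_counts.most_common())
[('Mobile Food Truck', 3), ('Food Cart', 2), ('Snack Bar', 2), ('Restaurant', 2)]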
Use this when you need frequency analytics, leaderboards, or quick sanity checks on categorical data.
Do More with Counter
update and subtract counts
Business data changes. update() adds to counts, while subtract() reduces them. Both accept iterables or mappings.
new_permits = ["Restaurant", "Food Cart", "Restaurant"]
eatery_type_counts.update(new_permits) # add 1 for each occurrence
print(eatery_type_counts)
closures = {"Snack Bar": 1}
eatery_type_counts.subtract(closures) # reduce counts (can go negative)
print(eatery_type_counts)
Counter({'Restaurant': 4, 'Mobile Food Truck': 3, 'Food Cart': 3, 'Snack Bar': 2})
Counter({'Restaurant': 4, 'Mobile Food Truck': 3, 'Food Cart': 3, 'Snack Bar': 1})
If a count drops to 0 or below, elements() skips it and counter arithmetic drops it, but most_common() still lists it. To remove a key entirely, use del counter[key].
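A quick sketch of the difference, using a throwaway counter:
c = Counter({"Snack Bar": 0, "Restaurant": 2})
print(list(c.elements())) # ['Restaurant', 'Restaurant']: the zero-count key is skipped
print("Snack Bar" in c) # True: the key is still stored
del c["Snack Bar"] # remove the entry entirely
print("Snack Bar" in c) # False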
compute totals and top-N
Starting in Python 3.10, Counter.total() returns the sum of all counts. This is helpful when you need proportions or shares.
total_permits = eatery_type_counts.total()
for eatery, count in eatery_type_counts.most_common(3):
    share = count / total_permits
    print(f"{eatery}: {count} ({share:.1%})")
rebuild sequences with elements()
elements() yields each element as many times as its count, in the order keys were first encountered; elements with a count below one are skipped.
expanded = list(eatery_type_counts.elements())
print(len(expanded), "items reconstructed from counts")
use math and set-style operations
You can combine counters with arithmetic and min/max style operations. Results drop zero and negative counts.
from collections import Counter
a = Counter({"Food Cart": 5, "Restaurant": 2})
b = Counter({"Food Cart": 3, "Snack Bar": 4})
print(a + b) # add counts
print(a - b) # subtract (keeps positives only)
print(a & b) # intersection: min of counts
print(a | b) # union: max of counts
Counter({'Food Cart': 8, 'Snack Bar': 4, 'Restaurant': 2})
Counter({'Food Cart': 2, 'Restaurant': 2})
Counter({'Food Cart': 3})
Counter({'Food Cart': 5, 'Snack Bar': 4, 'Restaurant': 2})
When to Use list.count(), Counter, or pandas
Pick the simplest tool that meets your requirements and performance constraints.
single-value checks with count()
Use the built-in count() method when you only need to know how many times one value occurs in a sequence, or how many non-overlapping times a substring occurs in a string.
numbers = [1, 2, 2, 3, 2]
print(numbers.count(2)) # 3
word = "banana"
print(word.count("a")) # 3
text = "aaa"
print(text.count("aa")) # 1: non-overlapping matches only
print("cat".count("")) # 4: empty string counts len(s)+1 positions
Be careful with booleans and integers: True == 1 and False == 0. This can inflate counts if you mix types.
mixed = [1, True, 0, False, True]
print(mixed.count(1)) # 3 (counts 1 and True)
print(mixed.count(0)) # 2 (counts 0 and False)
Avoid calling count() repeatedly inside loops on large data; each call scans the entire sequence.
# Inefficient: O(n^2) for large lists
# freq = {x: numbers.count(x) for x in numbers}
# Efficient: one pass
from collections import Counter
freq = Counter(numbers)
multiple frequencies with Counter
When you need counts for many unique values, build a Counter once and query it as needed. It’s clear, fast, and designed for this use case.
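A minimal sketch, assuming a small list of transit line codes as illustrative data:
from collections import Counter
line_codes = ["A", "C", "A", "E", "A", "C"] # illustrative data
line_freq = Counter(line_codes) # built once, in a single pass
print(line_freq["A"]) # 3: cheap dictionary lookup
print(line_freq["Z"]) # 0: absent keys are simply zero
print(line_freq.most_common(1)) # [('A', 3)]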
tabular data with pandas
In pandas, DataFrame.count() and Series.count() compute non-missing counts. To get a frequency table of values, use Series.value_counts().
import pandas as pd
df = pd.DataFrame({
"line": ["A", "A", "B", None, "B", "B"],
"ridership": [100, None, 120, 130, None, 150],
})
print(df["line"].count()) # 5 (non-missing)
print(df["line"].value_counts()) # value frequencies
# For full row count, use len(df), not df.count()
print(len(df)) # 6
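If you also need missing values in the tally, or proportions instead of raw counts, value_counts() accepts dropna and normalize arguments:
print(df["line"].value_counts(dropna=False)) # include the None row in the tally
print(df["line"].value_counts(normalize=True)) # relative frequencies instead of counts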
Validate and clean data with Counter
Before analysis, I check categorical consistency, unexpected values, and duplicates. Counter makes these checks quick and repeatable.
from collections import Counter
# Example: validate allowed categories
allowed_types = {"Mobile Food Truck", "Food Cart", "Snack Bar", "Restaurant"}
type_counts = Counter(nyc_eatery_types)
unexpected = {t: c for t, c in type_counts.items() if t not in allowed_types}
if unexpected:
print("Unexpected categories found:", unexpected)
# Example: flag duplicates (e.g., station IDs appearing more than once)
station_ids = ["ST-001", "ST-002", "ST-003", "ST-002", "ST-004", "ST-002"]
dup_counts = Counter(station_ids)
duplicates = [sid for sid, c in dup_counts.items() if c > 1]
print("Duplicate station IDs:", duplicates)
Practical notes and common pitfalls
These specifics save time and prevent subtle bugs.
- Counter is dict-like: missing keys return 0; use del counter[key] to remove entries. Avoid storing zeros unless you need them temporarily.
- Do not rely on iteration order. Use most_common() for ranked output.
- str.count() counts non-overlapping matches. For overlapping substring counts, use re.finditer() with a lookahead (see the sketch after this list).
- Avoid repeated list.count() calls on the same large sequence. Build one Counter instead.
- In pandas, .count() is for non-missing counts; use .value_counts() for value frequencies.
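For the overlapping case mentioned above, a zero-width lookahead with re.finditer does the counting:
import re
text = "aaaa"
print(text.count("aa")) # 2: non-overlapping matches
print(sum(1 for _ in re.finditer(r"(?=aa)", text))) # 3: overlapping matches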
Minimal patterns I rely on
These are the snippets I use most often in production and teaching.
- One-pass frequency map: Counter(iterable)
- Top-N categories: Counter(data).most_common(n)
- Incremental updates: counter.update(batch) or counter.subtract(batch)
- Single-value check: sequence.count(x) (avoid in loops)
- Tabular frequencies: df["col"].value_counts() (not df.count())
Conclusion
collections.Counter is the right tool for counting multiple values quickly and clearly. Use built-in count() for one-off checks, Counter for frequency maps and top-N analysis, and Series.value_counts() when you’re working in pandas. Keep an eye on string-counting edge cases, boolean–integer equivalence, and the difference between non-missing counts and value frequencies. With these patterns, your counts will be both fast and correct.
