Tqdm Python: A Guide With Practical Examples

tqdm is a Python library that provides a fast, extensible progress bar for loops and iterables, making it easy to visualize the progress of your code.

Sep 12, 2024 · 11 min read

Have you ever run a long Python script and been left wondering whether anything was happening behind the scenes?

The uncertainty about the progress might lead you to cancel an almost complete run or wait endlessly for an already interrupted script run.

The tqdm Python library addresses this issue by providing progress indicators for your scripts.

What Is tqdm?

Tqdm is a Python library that provides fast, extensible progress bars for loops and iterables. It's a simple way to track the advancement of time-intensive tasks.

The library’s name means "progress" in Arabic (taqadum, تقدّم), and is an abbreviation for "I love you so much" in Spanish (te quiero demasiado).

Tqdm tracks progress and updates the progress bar display by counting iterations, calculating the time elapsed and the time remaining, and visualizing the overall progress in the bar fill.

It uses smart algorithms to predict remaining time, and skips unnecessary iteration displays to minimize overhead. Using tqdm offers several benefits, including:

Visual feedback: Progress bars enable users to see how much of a task is complete and estimate how long the remaining part might take.
Works everywhere: The tqdm library works on any platform (Linux, Windows, Mac, FreeBSD, NetBSD, SunOS), in any console or in a GUI.
Easy integration: Tqdm integrates seamlessly with Jupyter notebooks, common libraries like Pandas, and common Python constructs like loops.
Customization: It offers several options to tailor the appearance and behavior of progress bars, which we’ll get into later.
Performance: While similar packages like ProgressBar have an overhead of 800ns/iteration, tqdm's overhead of 60ns/iteration is much lower.
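To make the time estimate mentioned above more concrete, here is a minimal sketch of the arithmetic behind a "time remaining" figure. The function name estimate_remaining() is hypothetical and not part of tqdm. It assumes the average rate so far holds for the rest of the run; tqdm's own estimate additionally smooths the rate, so its numbers can differ.

```python
def estimate_remaining(n_done, total, elapsed_seconds):
    """Naive ETA: assume the average rate so far holds for the remaining items."""
    if n_done == 0 or elapsed_seconds <= 0:
        return None  # no rate information yet
    rate = n_done / elapsed_seconds  # iterations per second
    return (total - n_done) / rate   # seconds still needed at that rate


# Hypothetical snapshot: 250 of 1000 iterations done after 5 seconds
remaining = estimate_remaining(n_done=250, total=1000, elapsed_seconds=5.0)
print(f"remaining: {remaining:.1f} s")
```

In this snapshot the rate is 50 iterations per second, so the 750 outstanding iterations need about 15 seconds.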

How to Install Tqdm

As with most Python libraries, the easiest way to install tqdm is with the pip package manager.

pip install tqdm
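A quick way to confirm that the installation worked is to import the package and print its version. This is only a sanity check; the version string you see depends on your environment.

```python
import tqdm

# Prints the installed tqdm version string
print(tqdm.__version__)
```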

Tqdm: Simple Example

To create a progress bar, we wrap our iterable with the tqdm() function (which we import from the tqdm module). Let’s take a look at a simple example. The function time.sleep(0.01) serves as a placeholder for the code that is supposed to be run in each iteration.

import time
from tqdm import tqdm

for i in tqdm(range(1000)):
    time.sleep(0.01)
    # …

As a shortcut for tqdm(range(n)), you can also use trange(n). The following code produces exactly the same output as the previous one:

import time
from tqdm import trange

for i in trange(1000):
    time.sleep(0.01)
    # …

Both examples display the same progress bar.

Take a closer look at the progress bar to see what information it provides. Let’s break down the details presented by tqdm:

Progress Indicators:

Percentage of Iterations
Bar Fill
Fraction of Total Iterations

Metrics:

Time Elapsed
Estimated Time Remaining
Performance (Iterations per Second)
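To inspect these elements without running a loop, one option is tqdm's static format_meter() method, which renders a bar string for a given state. The snapshot values below (300 of 1000 iterations after 3 seconds) are made up for illustration, and ascii=True is used only so the bar renders in any terminal.

```python
from tqdm import tqdm

# Render the bar for a hypothetical snapshot: 300 of 1000 iterations after 3 seconds
line = tqdm.format_meter(n=300, total=1000, elapsed=3.0, ascii=True)
print(line)
```

The printed line contains the percentage, the bar fill, the fraction of iterations, the elapsed and remaining time, and the rate in iterations per second.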

Tqdm Customization: Making Progress Bars Your Own

Tqdm offers various customization options to tailor the appearance and behavior of progress bars. We will take a look at the most important customization parameters before examining more advanced use cases of tqdm.

Adding a description of the progress bar

One common customization is adding a descriptive label with the desc parameter. This makes the progress bar more informative and helps us keep track of which bar belongs to which iterable.

For example, adding the parameter desc="Processing large range" displays the title "Processing large range" to the left of the progress bar.

for _ in tqdm(range(20000000), desc="Processing large range"):
	continue

I encourage you to run the code above in your environment with and without the desc parameter and notice the differences.

Specifying the total number of iterations

The total parameter specifies the total number of iterations in the loop, allowing tqdm to provide more accurate estimates of remaining time and percentage completion.

In a basic example using tqdm(range()) or trange(), this is not necessary, since tqdm can read the number of iterations from the range itself. However, there are two main scenarios where adding the parameter total=n (with n being the total number) can be helpful:

1. Iterables without a len() method: For iterables without a len() method, such as generators with an unknown number of items, you need to provide the total value manually. Without this parameter, tqdm() will only show the count of completed iterations. Note that if the actual number of items exceeds the specified total, tqdm can no longer compute a percentage and falls back to showing only the iteration count.

from tqdm import tqdm
import time
import random
 
# Generator yielding random numbers between 1 and 100 indefinitely
def generate_random_numbers():
    while True:
        yield random.randint(1, 100)
 
# Without total: tqdm() only shows number of iterations, does not know total
# (the generator never ends, so stop this loop manually, e.g. with Ctrl+C)
for num in tqdm(generate_random_numbers(), desc='Random Iteration'):
    time.sleep(0.01)
    # …

This is what it looks like if the number of items is unknown and we do not specify the total.

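from itertools import islice

# With total (assuming you know the desired number of iterations): tqdm() shows progress
num_iterations = 1000
# islice() stops the endless generator after num_iterations items
numbers = islice(generate_random_numbers(), num_iterations)
for num in tqdm(numbers, total=num_iterations, desc="Random Iteration"):
    time.sleep(0.01)
    # …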

Much better, now the progress bar shows!

2. Manual updates: If we're updating the progress bar manually using the update() method, the total value needs to be specified to ensure correct progress tracking. The update() method lets us, for instance, respond to dynamically changing processes or implement custom progress tracking. We'll come to that later.

Tweaking visual appearance

If we want to tweak the placement order in which progress bar elements appear, we can set the bar_format parameter to the preferred format string. This way, we can control the placement of various elements like the percentage, time elapsed, and bar fill character. For more details, you can refer to the documentation.

We can also adjust the visual appearance with the colour parameter. It accepts either color names like 'green' or hexadecimal codes like '#00ff00'.

The leave parameter is about disappearance rather than appearance: it determines whether or not the progress bar remains visible after its completion. If set to True, the bar will persist after the loop finishes; if set to False, it will disappear.

Let's compare the outputs in another example. The following code creates three progress bars: one with default settings, one in which the element order and color are changed, and one that is set to disappear after completion.

from tqdm import tqdm
import time
 
# Progress bar 1: Default settings
for i in tqdm(range(300)):
	time.sleep(0.01)
 
# Progress bar 2: Customized bar format and color
for i in tqdm(range(300), bar_format='[{elapsed}<{remaining}] {n_fmt}/{total_fmt} | {l_bar}{bar} {rate_fmt}{postfix}', colour='yellow'):
	time.sleep(0.01)
 
# Progress bar 3: Customized bar format and color, leave=False
for i in tqdm(range(300), bar_format='[{elapsed}<{remaining}] {n_fmt}/{total_fmt} | {l_bar}{bar} {rate_fmt}{postfix}', colour='red', leave=False):
	time.sleep(0.01)

Advanced Usage: Handling More Complex Scenarios

Now that we know how to build a simple progress bar and how to customize it, we can move forward to a few more advanced cases.

Nested progress bars

Nested loops are loops that are contained within other loops. Accordingly, nested progress bars are a series of progress bars for each iteration of loops contained within other loops. To create them, wrap each loop with a tqdm() function and add descriptive labels for each iterable.

The following code has three nested loops and shows how nested progress bars appear:

from tqdm import trange
import time
 
for i in trange(3, desc='outer loop'):
    for j in trange(2, desc='middle loop'):
        for k in trange(6, desc='inner loop'):
            time.sleep(0.01)

In the output, we can recognize the pattern in which the different progress bars are updated. Tqdm always starts with the outermost loop and descends until it reaches the innermost loop, whose iterations are processed and whose progress bar is updated accordingly.

Now let’s say we have three nested loops, just like in the example. After the initial inner iteration, it will move up to the middle loop, update it as having one iteration complete, before going back to the inner iterable. This process is repeated until the middle loop is marked as complete, which will trigger the outer bar to appear with one iteration complete.

The same happens as well between the outer and middle loop. In the end, the final inner run completes the middle iterable, which in turn completes the last outer iteration.
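The update pattern described above can get noisy, because every completed inner bar stays on screen. A minimal sketch of one way to keep the output compact is to pass leave=False to the inner bars so that they are cleared once they finish. The steps counter is only there for illustration.

```python
from tqdm import trange

steps = 0  # counts how often the innermost body runs
for i in trange(3, desc="outer loop"):
    # leave=False removes the inner bars once they are complete
    for j in trange(2, desc="middle loop", leave=False):
        for k in trange(6, desc="inner loop", leave=False):
            steps += 1

print(steps)
```

Only the outer bar remains visible after the run, while the loop body still executes 3 × 2 × 6 times.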

Manual updates

Manually updating progress bars with tqdm can be useful in several scenarios:

Iterables with unknown length: When we’re working with iterables that don't have a defined length (e.g., generators, network streams), we can manually update the progress bar based on the amount of data processed or the number of completed operations.
Dynamically changing processes: If the number of iterations or the processing time per iteration can change during the execution, manual updates allow us to adjust the progress bar accordingly.
Custom progress tracking: For more granular control over the progress bar, we can manually update it based on specific criteria or events. For example, we might want to update the progress bar based on the completion of certain milestones or the progress of individual tasks within a larger process.
Integration with external systems: If we’re integrating tqdm with external systems or libraries that don't provide a natural way to track progress, manual updates can be used to synchronize the progress bar with the external process.

To manually update a tqdm progress bar, it's important to set the total parameter to an estimate of the maximum expected number of iterations. Then, as the code processes each new element, the progress bar needs to be updated using the update() method. The update value should represent the number of iterations processed since the last update.

Let's say we expect our iterable to contain up to 750 elements. In this example, the actual length is a random number between 100 and 1000, which is unknown to us. We create the progress_bar with total set to estimated_total, which is 750. Then we iterate through the data, updating the progress bar after each item is processed.

import random
import time
from tqdm import tqdm

def process_data(data):
    time.sleep(0.01)  # simulate processing data
    processed_data = data
    return processed_data

# Generate an iterable with random length between 100 and 1,000
random_length = random.randint(100, 1000)
data_list = list(range(random_length))
# Define estimated maximum number of iterations
estimated_total = 750
# Define the progress bar using the estimated_total
progress_bar = tqdm(total=estimated_total)
# Iterating through data list of unknown length
for data in data_list:
    processed_data = process_data(data)
    progress_bar.update(1)
# Close the bar so its final state is written out
progress_bar.close()

Whenever the actual length turns out to be larger than our estimate of 750, the output continues counting iterations after reaching 100% progress.
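Manual updates do not have to advance one item at a time. As a sketch of custom progress tracking, the example below advances the bar by the size of each processed chunk and uses the bar as a context manager so that it is closed automatically. The chunks list is made-up stand-in data; unit and unit_scale only affect how the numbers are displayed.

```python
from tqdm import tqdm

# Hypothetical data chunks of different sizes (stand-ins for file or network reads)
chunks = [b"a" * 1024, b"b" * 2048, b"c" * 512]
total_bytes = sum(len(chunk) for chunk in chunks)

with tqdm(total=total_bytes, unit="B", unit_scale=True, desc="Processing") as pbar:
    for chunk in chunks:
        # ... process the chunk here ...
        pbar.update(len(chunk))  # advance by the number of bytes handled
    processed = pbar.n  # running count maintained by the bar

print(processed)
```

Because the total is known exactly here, the bar ends at 100% instead of overshooting.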

Multiprocessing

Multiprocessing and threading are techniques used to execute tasks concurrently, improving performance and responsiveness. In these scenarios, it can be challenging to track the progress of individual tasks or the overall progress of the parallel execution. Tqdm can be a valuable tool for providing visual feedback and monitoring the progress of these concurrent operations.

The tqdm.contrib.concurrent module provides specialized functions for creating progress bars in multiprocessing or threading contexts: process_map() and thread_map(). These functions handle the synchronization and communication between the main process and the worker processes or threads, ensuring that the progress bar is updated correctly. They build on the concurrent.futures API, using the ProcessPoolExecutor and ThreadPoolExecutor classes, respectively.

Here's an example using process_map() from the tqdm.contrib.concurrent module:

import time
from tqdm import tqdm
from tqdm.contrib.concurrent import process_map
 
def process_data(data):
    for i in tqdm(range(100), desc=f"Processing {data['name']}"):
        # Process data
        time.sleep(0.01)
 
if __name__ == '__main__':
    # process_map() creates its own process pool, so no explicit executor is needed
    results = process_map(process_data, [
        {'name': 'dataset1'},
        {'name': 'dataset2'},
        # …
    ])

In this example, the process_data() function includes a tqdm progress bar to track its progress. The process_data() function will be executed concurrently for each data item in the list. This means that multiple progress bars will be displayed simultaneously, each representing the progress of a separate process. The desc parameter is set to dynamically create a description for each progress bar based on the name of the corresponding dataset, supporting us in distinguishing between different progress bars.
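For I/O-bound work, the same module offers thread_map(), which runs the function in a thread pool and shows a single bar for the overall progress. The sketch below uses a hypothetical slow_square() function as a stand-in for real work; max_workers=4 is an arbitrary choice.

```python
import time
from tqdm.contrib.concurrent import thread_map


def slow_square(x):
    time.sleep(0.01)  # placeholder for I/O-bound work
    return x * x


# Results come back as a list in the same order as the inputs
results = thread_map(slow_square, range(20), max_workers=4, desc="Squaring")
print(results[:5])
```

A single overall bar is often easier to read than one bar per worker, at the cost of not seeing the progress inside each task.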

Integration with pandas

Calling tqdm.pandas() provides a convenient way to add progress bars to pandas operations. This is particularly useful for time-consuming operations on big DataFrames, as it provides visual feedback on the progress of the task. Once tqdm.pandas() has been called, pandas objects gain progress_apply() and progress_map() methods that mirror apply() and map().

To get started, we define a random DataFrame with 100,000 rows and call tqdm.pandas(). If we want to customize the progress bar, now is the time to do it, since the progress_apply() and progress_map() functions do not take the tqdm() parameters. Here we want to give the following progress bars a name, so we also specify the desc parameter.

import pandas as pd
import numpy as np
from tqdm import tqdm
 
df = pd.DataFrame(np.random.randint(0, 10, (100000, 6)))
 
tqdm.pandas(desc='DataFrame Operation')

Now we can apply functions to rows, columns or the whole DataFrame. Instead of using one of the apply() or map() functions, call progress_apply() or progress_map(), and the progress bar will be displayed. Remember that apply() and progress_apply() can be applied to DataFrames, rows or columns, while map() and progress_map() can only be applied to Series or columns. For example:

# Halving each value in the DataFrame using progress_apply()
result_apply = df.progress_apply(lambda x: x / 2)
 
# Doubling each element of the first column using progress_map()
result_map = df[0].progress_map(lambda x: x * 2)
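progress_apply() also accepts the usual apply() arguments. As a small sketch, the example below applies a function row by row with axis=1 on a made-up three-row DataFrame; the column names a and b are arbitrary.

```python
import pandas as pd
from tqdm import tqdm

# Register the progress_* methods on pandas objects
tqdm.pandas(desc="Row sums")

df_small = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=1 passes each row to the function, and the bar counts rows
row_sums = df_small.progress_apply(lambda row: row["a"] + row["b"], axis=1)
print(row_sums.tolist())
```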

Tqdm: Common Issues and Fixes

Let’s discuss some common Tqdm issues and errors and learn how to fix them.

Progress bar not updating

One of the most common issues encountered when using tqdm is a non-updating progress bar. This often occurs due to buffering issues, particularly in environments like Jupyter Notebooks. When the output is buffered, the progress bar might not be displayed or updated immediately, leading to the perception of a frozen or unresponsive process.

Using the tqdm.notebook module can address buffering issues and ensure that the progress bar updates correctly in Jupyter Notebooks. This module provides a GUI-based progress bar that is specifically designed for Jupyter environments.

In addition, it offers user-friendly color hints (blue: normal, green: completed, red: error/interrupt).

If we interrupt the code from our nested progress bar example, it looks like this:

Nested progress bars in Python using tqdm.notebook, illustrating the color scheme of completed versus interrupted bars.
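If the same code should run both in a notebook and in a terminal, one option is to import tqdm from the tqdm.auto module, which picks the notebook version when it is available and the standard console bar otherwise. The loop below is only a placeholder workload.

```python
from tqdm.auto import tqdm

total = 0
# The same loop works in Jupyter and in a plain terminal
for i in tqdm(range(100), desc="Auto-selected bar"):
    total += i

print(total)
```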

Another effective way to troubleshoot non-updating progress bars is to explicitly flush the output stream. When data is written to the standard output stream (e.g., using print()), the data is typically buffered before being sent to the actual output device. Flushing the output stream forces the Python interpreter to immediately send any buffered data to the output device, ensuring that the data is displayed or written without delay.

To flush the output, use the flush() method of the standard output stream. For more responsive output, consider flushing the output stream more frequently, perhaps after every few iterations or after a certain amount of time. Keep in mind that there is a trade-off, as flushing the output stream can introduce additional overhead. Here's an example of how to incorporate the method into a simple tqdm process:

import sys
import time
from tqdm import tqdm
 
for i in tqdm(range(100)):
	time.sleep(0.1)
	sys.stdout.flush()  # Flush the output stream after each iteration

Compatibility issues

While tqdm is generally compatible with most Python environments and libraries, there might be occasional compatibility issues or unexpected behaviors. Some common scenarios to be aware of include:

Custom output streams: When using custom output streams or redirecting output to files, tqdm might not function as expected. We need to ensure that the output stream we’re using supports the necessary operations for displaying the progress bar.
Third-party libraries: In some cases, tqdm might interact unexpectedly with third-party libraries, especially those that handle output or progress tracking themselves. We can try disabling or modifying the relevant third-party library features to see if it resolves the problem.
Version Compatibility: It's always a good practice to use compatible versions of tqdm and other libraries. Check the library documentation for any known compatibility issues with specific versions of Python or other dependencies.
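For the custom output stream scenario in the list above, tqdm accepts a file parameter, and tqdm.write() prints messages without breaking the bar. The sketch below sends both to an in-memory io.StringIO buffer, which stands in for whatever stream or log file you actually use.

```python
import io
from tqdm import tqdm

buffer = io.StringIO()  # stand-in for a custom stream or log file

for i in tqdm(range(5), file=buffer, desc="To buffer"):
    if i == 2:
        # tqdm.write() emits a message without corrupting the bar output
        tqdm.write("reached item 2", file=buffer)

output = buffer.getvalue()
print("reached item 2" in output)
```

Inspecting the captured text this way can help to check whether a given stream supports what tqdm needs before wiring it into a larger application.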

When encountering compatibility problems, we can consider the following workarounds:

Downgrade or upgrade: We can try a different version of tqdm.
Modify code: If necessary, we can make adjustments to our code to work around any compatibility conflicts.
Seek community help: If all of that does not help, we can reach out to the tqdm community or online forums for assistance and potential solutions.

Having these potential compatibility issues and their workarounds in mind, we can effectively troubleshoot and resolve any problems encountered when using tqdm in our Python projects.

Conclusion

In conclusion, Tqdm is a Python library that provides us with progress bars and other helpful statistics, making it easier to monitor and manage code execution.

Whether you're iterating over large datasets, training machine learning models, or performing any other time-consuming operation, tqdm offers a simple yet powerful way to keep track of progress and stay informed about the status of your code.

For further exploration, feel free to check out the other DataCamp Python tutorials, the Tqdm documentation, or its source code and Readme on GitHub.