Have you ever found yourself in a situation in which a long-running Python script kept you wondering if anything was happening behind the screen?
The uncertainty about the progress might lead you to cancel an almost complete run or wait endlessly for an already interrupted script run.
The tqdm Python library addresses this issue by providing progress indicators for your scripts.
What Is tqdm?
Tqdm is a Python library that provides fast, extensible progress bars for loops and iterables. It's a simple way to track the advancement of time-intensive tasks.
The library’s name means "progress" in Arabic (taqadum, تقدّم), and is an abbreviation for "I love you so much" in Spanish (te quiero demasiado).
Tqdm tracks progress and updates the progress bar display by counting iterations, calculating the time elapsed and the time remaining, and visualizing the overall progress in the bar fill.
It uses smart algorithms to predict the remaining time and skips unnecessary iteration displays to minimize overhead. Using tqdm offers several benefits, including:
- Visual feedback: Progress bars enable users to see how much of a task is complete and estimate how long the remaining part might take.
- Works everywhere: The tqdm library works on any platform (Linux, Windows, Mac, FreeBSD, NetBSD, SunOS), in any console or in a GUI.
- Easy integration: Tqdm integrates seamlessly with Jupyter notebooks, common libraries like Pandas, and common Python constructs like loops.
- Customization: It offers several options to tailor the appearance and behavior of progress bars, which we’ll get into later.
- Performance: While similar packages like ProgressBar have an overhead of 800ns/iteration, tqdm’s overhead of about 60ns/iteration is much lower.
How to Install Tqdm
As with most Python libraries, the easiest way to install tqdm is using the pip package manager.
pip install tqdm
Tqdm: Simple Example
To create a progress bar, we wrap our iterable with the tqdm() function (which we import from the tqdm module). Let’s take a look at a simple example. The function time.sleep(0.01) serves as a placeholder for the code that is supposed to be run in each iteration.
import time
from tqdm import tqdm

for i in tqdm(range(1000)):
    time.sleep(0.01)
    # …
As a shortcut for tqdm(range(n)) you can also use trange(n). The following code will produce exactly the same output as the previous one:
import time
from tqdm import trange

for i in trange(1000):
    time.sleep(0.01)
    # …
Both examples will return a progress bar that looks like the one above.
Take a closer look at the progress bar to see what information it provides. Let’s break down the details presented by tqdm:
- Progress Indicators:
  - Percentage of Iterations
  - Bar Fill
  - Fraction of Total Iterations
- Metrics:
  - Time Elapsed
  - Estimated Time Remaining
  - Performance (Iterations per Second)
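To see these elements side by side without running a loop, the following sketch renders a single snapshot of a bar as text using the static method tqdm.format_meter(). The numbers (300 of 1,000 iterations after 3 seconds) are made up for illustration, and ascii=True is only used here to keep the output free of special block characters.

```python
from tqdm import tqdm

# Render one snapshot of a progress bar as a plain string.
# The values are illustrative, not a real measurement.
snapshot = tqdm.format_meter(
    n=300,        # iterations completed so far
    total=1000,   # total number of iterations
    elapsed=3.0,  # seconds elapsed
    ncols=80,     # overall width of the rendered line
    ascii=True,   # draw the bar fill with ASCII characters
)

# The line contains the percentage, bar fill, fraction,
# time elapsed, estimated time remaining, and rate.
print(snapshot)
```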
Tqdm Customization: Making Progress Bars Your Own
Tqdm offers various customization options to tailor the appearance and behavior of progress bars. We will take a look at the most important customization parameters before examining more advanced use cases of tqdm.
Adding a description of the progress bar
One common customization is adding a descriptive label using the desc parameter, which makes the progress bar more informative and helps keep track of different iterables.
For example, adding the parameter desc="Processing large range" would display the title “Processing large range” to the left of the progress bar.
from tqdm import tqdm

for _ in tqdm(range(20000000), desc="Processing large range"):
    continue
I encourage you to run the code above in your environment with and without the desc parameter and notice the differences.
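Labels become more useful once a script runs several loops in a row. The sketch below is one possible way to label consecutive stages; the stage names and sizes are hypothetical placeholders. The bars are written into a text buffer via the file parameter so the result can be inspected; leave the parameter out to see the bars in the console.

```python
import io
from tqdm import tqdm

# Hypothetical stages of a script; names and sizes are placeholders.
stages = {"Loading": 50, "Cleaning": 30, "Exporting": 20}

# Collect the rendered bars in a buffer instead of the console.
buffer = io.StringIO()

processed = 0
for label, size in stages.items():
    # Each stage gets its own bar, labelled via desc
    for _ in tqdm(range(size), desc=label, file=buffer):
        processed += 1

rendered = buffer.getvalue()
```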
Specifying the total number of iterations
The total parameter specifies the total number of iterations in the loop, allowing tqdm to provide more accurate estimates of remaining time and percentage completion.
In a basic example using tqdm(range()) or trange() this is not necessary, since the number of iterations is already included in the brackets. However, there are two main scenarios where adding the parameter total=n (with n being the total number) can be helpful:
1. Iterables without a len() method: For iterables without a len() method, such as generators with an unknown number of items, you need to provide the total value manually. Without this parameter, tqdm() will only show the count of completed iterations. Please note that total only affects the display and does not stop the loop: if the actual number of items surpasses the specified total, tqdm keeps counting but can no longer show a meaningful percentage.
from tqdm import tqdm
import time
import random

# Generator yielding random numbers between 1 and 100, without end
def generate_random_numbers():
    while True:
        yield random.randint(1, 100)

# Without total: tqdm() only shows number of iterations, does not know total
# (the generator never ends, so stop this loop manually, e.g. with Ctrl+C)
for num in tqdm(generate_random_numbers(), desc='Random Iteration'):
    time.sleep(0.01)
    # …
This is what it looks like if the number of items is unknown and we do not specify the total.
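from itertools import islice

# With total (assuming you know the desired number of iterations): tqdm() shows progress
num_iterations = 1000
# islice() ends the endless generator after num_iterations items; total alone would not stop the loop
for num in tqdm(islice(generate_random_numbers(), num_iterations), total=num_iterations, desc="Random Iteration"):
    time.sleep(0.01)
    # …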
Much better, now the progress bar shows!
2. Manual updates: If we’re updating the progress bar manually within a function using the update() method, the total value needs to be specified to ensure correct progress tracking. The update() method makes it possible, for instance, to respond to dynamically changing processes or to implement custom progress tracking. We’ll come to that later.
Tweaking visual appearance
If we want to change the order in which progress bar elements appear, we can set the bar_format parameter to the preferred format string. This way, we can control the placement of various elements like the percentage, time elapsed, and bar fill. For more details, you can refer to the documentation.
Another adjustment of the visual appearance can be made using the colour parameter. It accepts either color names like 'green' or hexadecimal color strings like '#00ff00'.
The leave parameter is about disappearance rather than appearance: it determines whether or not the progress bar remains visible after its completion. If set to True, the bar will persist after the loop finishes; if set to False, it will disappear.
Let’s take a look at the visual differences in outputs in another example. The following code creates three progress bars: one with default settings, one in which the element order and color are changed, and one that is set to disappear after completion. The outputs are visible in the GIF below.
from tqdm import tqdm
import time

# Progress bar 1: Default settings
for i in tqdm(range(300)):
    time.sleep(0.01)

# Progress bar 2: Customized bar format and color
for i in tqdm(range(300), bar_format='[{elapsed}<{remaining}] {n_fmt}/{total_fmt} | {l_bar}{bar} {rate_fmt}{postfix}', colour='yellow'):
    time.sleep(0.01)

# Progress bar 3: Customized bar format and color, leave=False
for i in tqdm(range(300), bar_format='[{elapsed}<{remaining}] {n_fmt}/{total_fmt} | {l_bar}{bar} {rate_fmt}{postfix}', colour='red', leave=False):
    time.sleep(0.01)
Advanced Usage: Handling More Complex Scenarios
Now that we know how to build a simple progress bar and how to customize it, we can move forward to a few more advanced cases.
Nested progress bars
Nested loops are loops that are contained within other loops. Accordingly, nested progress bars are a series of progress bars, one for each level of the nested loops. To create them, wrap each loop with a tqdm() function and add descriptive labels for each iterable.
The following code has three nested loops and shows how the nested progress bars appear:
from tqdm import trange
import time

for i in trange(3, desc='outer loop'):
    for j in trange(2, desc='middle loop'):
        for k in trange(6, desc='inner loop'):
            time.sleep(0.01)
In the final output above, we can recognize a pattern in which the different progress bars are updated. Tqdm will always start with the outermost loop until it reaches the innermost loop, whose iterations will be processed and the progress bar updated accordingly.
Now let’s say we have three nested loops, just like in the example. After the initial inner iteration, it will move up to the middle loop, update it as having one iteration complete, before going back to the inner iterable. This process is repeated until the middle loop is marked as complete, which will trigger the outer bar to appear with one iteration complete.
The same happens as well between the outer and middle loop. In the end, the final inner run completes the middle iterable, which in turn completes the last outer iteration.
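With many inner iterations, the finished inner bars can pile up on the screen. One possible way to keep the output compact is to combine nesting with the leave parameter described earlier. This is a minimal sketch with two loop levels and arbitrary loop sizes, not the only way to arrange nested bars.

```python
import time
from tqdm import trange

completed = 0
for i in trange(3, desc="outer loop"):
    # leave=False clears each inner bar once it finishes,
    # so only the outer bar remains visible at the end.
    for j in trange(4, desc="inner loop", leave=False):
        time.sleep(0.01)  # placeholder for real work
        completed += 1
```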
Manual updates
Manually updating progress bars with tqdm can be useful in several scenarios:
- Iterables with unknown length: When we’re working with iterables that don't have a defined length (e.g., generators, network streams), we can manually update the progress bar based on the amount of data processed or the number of completed operations.
- Dynamically changing processes: If the number of iterations or the processing time per iteration can change during the execution, manual updates allow us to adjust the progress bar accordingly.
- Custom progress tracking: For more granular control over the progress bar, we can manually update it based on specific criteria or events. For example, we might want to update the progress bar based on the completion of certain milestones or the progress of individual tasks within a larger process.
- Integration with external systems: If we’re integrating tqdm with external systems or libraries that don't provide a natural way to track progress, manual updates can be used to synchronize the progress bar with the external process.
To manually update a tqdm progress bar, it’s important that the total parameter is specified as an estimate of the maximum expected number of iterations. Then, as the code processes each new element, the progress bar needs to be updated using the update() method. The update value should represent the number of iterations processed since the last update.
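The update value does not have to be 1. As a minimal sketch of updating by the amount of data processed, the following code reads an in-memory byte stream in chunks and advances the bar by the size of each chunk. The stream, its size, and the chunk size are stand-ins chosen for illustration; unit and unit_scale only change how the counts are displayed.

```python
import io
from tqdm import tqdm

# Stand-in for a file or network stream: 1 MiB of zero bytes.
data = bytes(1024 * 1024)
stream = io.BytesIO(data)
total_bytes = len(data)
chunk_size = 64 * 1024  # arbitrary chunk size for this sketch

bytes_read = 0
with tqdm(total=total_bytes, unit="B", unit_scale=True, desc="Reading") as progress_bar:
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        bytes_read += len(chunk)
        # Advance by the amount processed since the last update
        progress_bar.update(len(chunk))
    final_count = progress_bar.n
```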
Let’s say we expect our iterable to contain up to 750 elements. In this example, the actual length is a random number between 100 and 1000, which is unknown to us. We initialize progress_bar, passing estimated_total (set to 750) as the total. Then we iterate through the data, updating the progress bar after each item is processed.
import random
import time
from tqdm import tqdm

def process_data(data):
    time.sleep(0.01)  # simulate processing data
    processed_data = data
    return processed_data

# Generate an iterable with random length between 100 and 1,000
random_length = random.randint(100, 1000)
data_list = [i for i in range(random_length)]

# Define estimated maximum number of iterations
estimated_total = 750

# Define the progress bar using the estimated_total
progress_bar = tqdm(total=estimated_total)

# Iterating through data list of unknown length
for data in data_list:
    processed_data = process_data(data)
    progress_bar.update(1)

# Release the bar once the loop is done
progress_bar.close()
In the run shown here, we underestimated the length of the iterable, causing the output to continue counting iterations after reaching 100% progress.
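If a wrong estimate is a concern, one possible workaround is to adjust the bar's total attribute while the loop is running and redraw it with refresh(). The sketch below reuses the idea of a hidden random length; the step of 250 by which the total is raised is an arbitrary assumption.

```python
import random
from tqdm import tqdm

# The real length is unknown to the loop, as in the example above
actual_length = random.randint(100, 1000)
estimated_total = 750

progress_bar = tqdm(total=estimated_total)
for item in range(actual_length):
    if progress_bar.n >= progress_bar.total:
        # The estimate was too low: raise the total (by an arbitrary 250)
        progress_bar.total += 250
        progress_bar.refresh()
    progress_bar.update(1)

# If the estimate was too high, shrink the total so the bar ends at 100%
progress_bar.total = progress_bar.n
progress_bar.refresh()
progress_bar.close()
```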
Multiprocessing
Multiprocessing and threading are techniques used to execute tasks concurrently, improving performance and responsiveness. In these scenarios, it can be challenging to track the progress of individual tasks or the overall progress of the parallel execution. Tqdm can be a valuable tool for providing visual feedback and monitoring the progress of these concurrent operations.
The tqdm.contrib.concurrent module provides specialized functions for creating progress bars in multiprocessing or threading contexts. These functions handle the synchronization and communication between the main process and the worker processes or threads, ensuring that the progress bar is updated correctly. They are designed to work seamlessly with the concurrent.futures API, building on its ProcessPoolExecutor and ThreadPoolExecutor classes.
Here's an example using the tqdm.contrib.concurrent module:
import time
from tqdm import tqdm
from tqdm.contrib.concurrent import process_map

def process_data(data):
    for i in tqdm(range(100), desc=f"Processing {data['name']}"):
        # Process data
        time.sleep(0.01)

if __name__ == '__main__':
    # process_map() creates its own process pool, so no separate executor is needed
    results = process_map(process_data, [
        {'name': 'dataset1'},
        {'name': 'dataset2'},
        # …
    ])
In this example, the process_data() function includes a tqdm progress bar to track its progress. The process_data() function will be executed concurrently for each data item in the list. This means that multiple progress bars will be displayed simultaneously, each representing the progress of a separate process. The desc parameter is set to dynamically create a description for each progress bar based on the name of the corresponding dataset, which helps us distinguish between the different progress bars.
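For I/O-bound work, the thread-based counterpart thread_map() from the same module can be used in a similar way. The following is a minimal sketch: square() is a hypothetical placeholder task, and max_workers=4 is an arbitrary choice. Here a single bar tracks the whole batch rather than each task.

```python
import time
from tqdm.contrib.concurrent import thread_map

def square(number):
    """Hypothetical task standing in for I/O-bound work."""
    time.sleep(0.01)
    return number * number

# thread_map() runs the function in a thread pool, shows one bar for the
# whole batch, and returns the results as a list in input order.
squares = thread_map(square, range(20), max_workers=4, desc="Squaring")
```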
Integration with pandas
The tqdm.pandas() method provides a convenient way to add progress bars to pandas operations. This is particularly useful for time-consuming operations on big DataFrames, as it provides visual feedback on the progress of the task. Calling tqdm.pandas() once registers progress-aware counterparts of common pandas methods, such as progress_apply() and progress_map(), on pandas objects.
To get started, we define a random DataFrame with 100,000 rows and call tqdm.pandas(). If we want to customize the progress bar, now is the time to do it, since the progress_apply() and progress_map() functions do not take the tqdm() parameters. Here we want to give the following progress bars a name, so we also specify the desc parameter.
import pandas as pd
import numpy as np
from tqdm import tqdm
df = pd.DataFrame(np.random.randint(0, 10, (100000, 6)))
tqdm.pandas(desc='DataFrame Operation')
Now we can apply functions to rows, columns or the whole DataFrame. Instead of using one of the apply() or map() functions, call progress_apply() or progress_map(), and the progress bar will be displayed. Remember that apply() and progress_apply() can be applied to DataFrames, rows or columns, while map() and progress_map() can only be applied to Series or columns. For example:
# Halving each value in the DataFrame using progress_apply()
result_apply = df.progress_apply(lambda x: x / 2)

# Doubling each element of the first column using progress_map()
result_map = df[0].progress_map(lambda x: x * 2)
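By default, progress_apply() on a DataFrame works column by column, just like apply(). As a small sketch of a row-wise operation, the code below passes axis=1 so that the bar counts rows instead. The DataFrame and its column names are made up for illustration.

```python
import pandas as pd
from tqdm import tqdm

# Register the progress_* methods on pandas objects
tqdm.pandas(desc="Row sums")

# Small illustrative DataFrame; the column names are placeholders
small_df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=1 passes each row to the function, so the bar advances once per row
row_sums = small_df.progress_apply(lambda row: row["a"] + row["b"], axis=1)
```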
Tqdm: Common Issues and Fixes
Let’s discuss some common Tqdm issues and errors and learn how to fix them.
Progress bar not updating
One of the most common issues encountered when using tqdm is a non-updating progress bar. This often occurs due to buffering issues, particularly in environments like Jupyter Notebooks. When the output is buffered, the progress bar might not be displayed or updated immediately, leading to the perception of a frozen or unresponsive process.
Using the tqdm.notebook module can address buffering issues and ensure that the progress bar updates correctly in Jupyter Notebooks. This module provides a GUI-based progress bar that is specifically designed for Jupyter environments.
In addition, it offers user-friendly color hints (blue: normal, green: completed, red: error/interrupt).
If we interrupt the code from our nested progress bar example, it looks like this:
Nested progress bars in Python using tqdm.notebook, illustrating the color scheme of completed versus interrupted bars.
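If the same code should run both inside and outside a notebook, importing from tqdm.auto is one option: it selects the notebook widget when running in Jupyter and falls back to the regular text bar elsewhere. A minimal sketch, with an arbitrary loop size:

```python
import time
from tqdm.auto import tqdm  # notebook widget in Jupyter, text bar elsewhere

total_steps = 0
for step in tqdm(range(50), desc="Auto-selected bar"):
    time.sleep(0.01)  # placeholder for real work
    total_steps += 1
```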
Another effective way to troubleshoot non-updating progress bars is to explicitly flush the output stream. When data is written to the standard output stream (e.g., using print()), the data is typically buffered before being sent to the actual output device. Flushing the output stream forces the Python interpreter to immediately send any buffered data to the output device, ensuring that the data is displayed or written without delay.
To flush the output, use the flush() method of the standard output stream. For more responsive output, consider flushing the output stream more frequently, perhaps after every few iterations or a certain amount of time. Keep in mind that there is a trade-off, as flushing the output stream can introduce additional overhead. Here’s an example of how to incorporate the method into a simple tqdm process:
import sys
import time
from tqdm import tqdm

for i in tqdm(range(100)):
    time.sleep(0.1)
    sys.stdout.flush()  # Flush the output stream after each iteration
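A related source of garbled output is mixing print() calls with an active bar. By default, tqdm draws the bar on the standard error stream, while print() writes to standard output. The sketch below shows one way to keep both tidy: the file parameter sends the bar to standard output, and tqdm.write() prints messages without breaking the bar. The loop size and message interval are arbitrary.

```python
import sys
import time
from tqdm import tqdm

messages_written = 0
# file=sys.stdout sends the bar to standard output instead of the default stderr
for i in tqdm(range(30), file=sys.stdout):
    time.sleep(0.01)  # placeholder for real work
    if i % 10 == 0:
        # tqdm.write() prints a message without breaking the bar display
        tqdm.write(f"Reached iteration {i}")
        messages_written += 1
```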
Compatibility issues
While tqdm is generally compatible with most Python environments and libraries, there might be occasional compatibility issues or unexpected behaviors. Some common scenarios to be aware of include:
- Custom output streams: When using custom output streams or redirecting output to files, tqdm might not function as expected. We need to ensure that the output stream we’re using supports the necessary operations for displaying the progress bar.
- Third-party libraries: In some cases, tqdm might interact unexpectedly with third-party libraries, especially those that handle output or progress tracking themselves. We can try disabling or modifying the relevant third-party library features to see if it resolves the problem.
- Version compatibility: It's always a good practice to use compatible versions of tqdm and other libraries. Check the library documentation for any known compatibility issues with specific versions of Python or other dependencies.
When encountering compatibility problems, we can consider the following workarounds:
- Downgrade or upgrade: We can try a different version of tqdm.
- Modify code: If necessary, we can make adjustments to our code to work around any compatibility conflicts.
- Seek community help: If all of that does not help, we can reach out to the tqdm community or online forums for assistance and potential solutions.
With these potential compatibility issues and their workarounds in mind, we can effectively troubleshoot and resolve problems encountered when using tqdm in our Python projects.
Conclusion
In conclusion, Tqdm is a Python library that provides us with progress bars and other helpful statistics, making it easier to monitor and manage code execution.
Whether you're iterating over large datasets, training machine learning models, or performing any other time-consuming operation, tqdm offers a simple yet powerful way to keep track of progress and stay informed about the status of your code.
For further exploration, feel free to check out the other DataCamp Python tutorials, the Tqdm documentation, or its source code and Readme on GitHub.
FAQs
How do I create a simple progress bar with tqdm?
Install and import the tqdm library, then wrap your iterable with the tqdm() function.
Can I use tqdm with pandas DataFrames or other libraries?
Yes, you can use tqdm with pandas DataFrames and other libraries. Calling tqdm.pandas() adds progress-aware methods such as progress_apply() to pandas objects.
Can I customize the appearance of the Tqdm progress bar?
Yes, you can customize the bar format, color, total number of iterations, and more using Tqdm's parameters.
The Tqdm progress bar is not updating correctly, what should I do?
Check for buffering issues, especially in Jupyter Notebooks. Try using tqdm.notebook or explicitly flushing the output. Also, ensure correct total parameter usage.
After building a solid base in economics, law, and accounting in my dual studies at the regional financial administration, I first got into contact with statistics in my social sciences studies and work as tutor. Performing quantitative empirical analyses, I discovered a passion that led me to continue my journey further into the beautiful field of data science and learn analytics tools such as R, SQL, and Python. Currently, I am enhancing my practical skills at Deutsche Telekom, where I am able to receive lots of hands-on experience in coding data paths to import, process, and analyze data using Python.