
A Comprehensive Guide to Using pathlib in Python For File System Manipulation

Discover advantages of Python pathlib over the os module by exploring path objects, path components, and common path operations for file system interaction.
May 22, 2024  · 9 min read

For a long time, file system manipulation in Python was awkward. Paths were plain strings, so developers had to build and split them by hand, which was error-prone. Code also broke frequently because path conventions differ across operating systems.

Luckily, in Python version 3.4, developers introduced the pathlib module to the standard library. pathlib provides an elegant solution to handling file system paths using a long-awaited object-oriented approach, and it also ensures platform-agnostic behavior.

This comprehensive tutorial will teach you the features of the pathlib module to support your daily interactions with your file system. Using pathlib, you will benefit from efficient workflows and easy data retrieval. pathlib has matured significantly over the years, and we’ve kept up-to-date so you don’t have to. Let's get started.

Python os Module vs. pathlib

Prior to Python 3.4, the more traditional way of handling file paths was by using the os module. While the os module was once highly effective, it has started to show its age.

We can show the unique value of pathlib by considering a common task in data science: how to find all png files inside a given directory.

If we were using the os module, we might write the following code:

import os

dir_path = "/home/user/documents"
files = [
    os.path.join(dir_path, f)
    for f in os.listdir(dir_path)
    if os.path.isfile(os.path.join(dir_path, f)) and f.endswith(".png")
]

Although this code solves the immediate task of finding our png files, it reveals several disadvantages of the os module. For one thing, the code is long and hard to read, which is a shame, considering this is a relatively simple operation. As a second point, our code assumes knowledge of list comprehensions, which shouldn’t be taken for granted. As a third point, the code builds and tests paths with string operations, which are error-prone.

If, instead, we were using the pathlib module, our code would be much simpler. As we have mentioned, pathlib provides an object-oriented approach to handling file system paths. Take a look:

from pathlib import Path

# Create a path object
dir_path = Path("/home/user/documents")

# Find all png files inside a directory
files = list(dir_path.glob("*.png"))

Object-oriented programming organizes code around objects and their interactions, leading to more modular, reusable, and maintainable code. If you are unfamiliar with object-oriented programming, it’s worth learning with our Object-Oriented Python Programming course.
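As a quick sanity check that the os version and the pathlib version find the same files, the sketch below runs both against a throwaway directory. The temporary directory and the file names (a.png, b.png, notes.txt) are assumptions made for illustration; they are not part of the original example.

```python
import os
import tempfile
from pathlib import Path

# Build a throwaway directory so the example does not depend on real files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.png", "b.png", "notes.txt"):
        Path(tmp, name).touch()

    # os-based version: join, list, and filter plain strings
    os_files = sorted(
        os.path.join(tmp, f)
        for f in os.listdir(tmp)
        if os.path.isfile(os.path.join(tmp, f)) and f.endswith(".png")
    )

    # pathlib version; converted to strings only for the comparison
    pathlib_files = sorted(str(p) for p in Path(tmp).glob("*.png"))

print(os_files == pathlib_files)  # True
print(len(pathlib_files))         # 2
```

Note that glob("*.png") matches any entry with that name pattern, including directories, so the os version with its isfile() check is slightly stricter.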

Working with Path Objects in Python

The pathlib library revolves around what are called Path objects, which represent file system paths in a structured and platform-independent way.

Earlier in this tutorial, we brought the Path class from the pathlib module into our current namespace using the following line of code:

from pathlib import Path

After importing the Path class from pathlib, we can create Path objects in several ways, including from strings, from other Path objects, from the current working directory, and from the home directory.

Let’s take a look at each one in turn.

Creating path objects from strings

We can create a Path object by passing a string representing a file system path to the Path constructor. This converts the string representation of the file path into a Path object.

file_path_str = "data/union_data.csv"

data_path = Path(file_path_str)

Creating path objects from other path objects

Existing Path objects can serve as building blocks for creating new paths. 

We do this by combining a base path, a data directory, and a file name into a single file path. Path objects overload the forward slash (/) as a joining operator, so we use it between the pieces to extend our Path objects.

base_path = Path("/home/user")
data_dir = Path("data")

# Combining multiple paths
file_path = base_path / data_dir / "prices.csv"
print(file_path)
/home/user/data/prices.csv
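Two details of joining are worth a small sketch: joinpath() is the method form of the / operator, and an absolute segment on the right-hand side replaces everything before it. PurePosixPath is used here (instead of Path) only so that the printed results are the same on every operating system; the paths themselves are made up for illustration.

```python
from pathlib import PurePosixPath

base_path = PurePosixPath("/home/user")

# joinpath() does the same job as chaining the / operator.
joined = base_path.joinpath("data", "prices.csv")
print(joined)  # /home/user/data/prices.csv

# An absolute segment on the right discards everything to its left.
reset = base_path / "/etc/hosts"
print(reset)  # /etc/hosts
```

The second behavior is easy to trip over when a segment comes from user input or a config file and happens to start with a slash.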

Creating path objects from the current working directory

Here we assign the current working directory to the cwd variable using the Path.cwd() method. This is the directory from which our script or notebook is being run, which is not necessarily the directory where the script file is stored.

cwd = Path.cwd()

print(cwd)
/home/bexgboost/articles/2024/4_april/8_pathlib

Creating path objects from the home directory

We can construct a path by combining our home directory with additional subdirectories. Here we are combining our home directory with the subdirectories "downloads" and "projects." 

home = Path.home()

home / "downloads" / "projects"
PosixPath('/home/bexgboost/downloads/projects')

An important note: Creating a Path object doesn't perform any file system operations such as validating the path or creating directories or files. The object only represents the path. To actually interact with the file system (checking existence, reading/writing files), we have to call the relevant methods of Path objects and, for some advanced cases, get help from the os module.

Working with Path Components in Python

File path attributes are various properties and components of a file path that help in identifying and managing files and directories within a file system. Just like a physical address has different parts, such as street number, city, country, and zip code, a file system path can be broken down into smaller components. pathlib allows us to access and manipulate these components using path attributes through dot notation.

Working with the root directory

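The root is the topmost directory in a file system. In Unix-like systems, it is represented by a forward slash (/). On Windows, pathlib separates the drive (a letter followed by a colon, like C:), available through the .drive attribute, from the root (a backslash), and the .anchor attribute combines the two (C:\).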

image_file = home / "downloads" / "midjourney.png"

image_file.root
'/'
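To see how the root differs between platforms without needing a Windows machine, here is a minimal sketch using the pure path classes, which only manipulate path strings and never touch the file system. The example paths are hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath

win = PureWindowsPath("C:/Users/bex/downloads/midjourney.png")
print(repr(win.drive))   # 'C:'
print(repr(win.root))    # '\\'
print(repr(win.anchor))  # 'C:\\'

posix = PurePosixPath("/home/bex/downloads/midjourney.png")
print(repr(posix.drive))   # ''
print(repr(posix.root))    # '/'
print(repr(posix.anchor))  # '/'
```

On POSIX paths the drive is always an empty string, so root and anchor are identical there.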

Working with the parent directory

The parent is the directory that contains the current file or directory. It is one level higher relative to the current directory or file.

image_file.parent
PosixPath('/home/bexgboost/downloads')

Working with the file name

This attribute returns the entire file name, including the extension, as a string.

image_file.name
'midjourney.png'

Working with the file suffix

The suffix attribute returns the file extension, including the dot, as a string (or an empty string if there’s no extension).

image_file.suffix
'.png'

Working with the file stem

The stem returns the file name without the extension. Working with the stem can be useful when converting files to different formats.

image_file.stem
'midjourney'
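Since the stem and suffix often come up when converting files between formats, the following sketch shows the companion methods with_suffix() and with_name(), plus the suffixes attribute for names with more than one extension. The file names are assumptions for illustration, and PurePosixPath is used so the output does not depend on the operating system.

```python
from pathlib import PurePosixPath

image_file = PurePosixPath("/home/user/downloads/midjourney.png")

# Same directory and stem, different extension
as_jpg = image_file.with_suffix(".jpg")
print(as_jpg)  # /home/user/downloads/midjourney.jpg

# Same directory, completely different file name
renamed = image_file.with_name("cover.png")
print(renamed)  # /home/user/downloads/cover.png

# With several extensions, suffix only reports the last one.
archive = PurePosixPath("backup.tar.gz")
print(archive.suffix)    # .gz
print(archive.suffixes)  # ['.tar', '.gz']
print(archive.stem)      # backup.tar
```

These methods return new path objects; nothing is renamed on disk.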

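Note: Whether file paths are case-sensitive depends on the file system. On most Linux file systems, /home/username/Documents and /home/username/documents are different paths, whereas the default file systems on macOS and Windows treat names that differ only by case as the same.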

The pathlib parts attribute

We can use the .parts attribute to split a Path object into its components.

image_file.parts
('/', 'home', 'bexgboost', 'downloads', 'midjourney.png')

The pathlib parents attribute

The parents attribute returns an immutable sequence of the path's ancestors as Path objects, from the immediate parent up to the root.

list(image_file.parents)
[PosixPath('/home/bexgboost/downloads'),
PosixPath('/home/bexgboost'),
PosixPath('/home'),
PosixPath('/')]
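Because parents behaves like a sequence, it can be indexed, measured, and searched without converting it to a list first. A minimal sketch, using PurePosixPath so the results are platform-independent:

```python
from pathlib import PurePosixPath

image_file = PurePosixPath("/home/bexgboost/downloads/midjourney.png")

first = image_file.parents[0]   # same as image_file.parent
second = image_file.parents[1]  # one level further up
print(first)                    # /home/bexgboost/downloads
print(second)                   # /home/bexgboost
print(len(image_file.parents))  # 4

# Membership test: is /home one of the ancestors?
is_under_home = PurePosixPath("/home") in image_file.parents
print(is_under_home)  # True
```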

Common Path Operations Using pathlib

Path objects have many methods that allow you to interact efficiently with directories and their contents. Let's take a look at how to perform some of the most common operations.

Listing directories

The iterdir() method allows you to iterate over all the files and subdirectories in a folder. It is particularly useful for processing all files in a directory or performing operations on each entry.

cwd = Path.cwd()

for entry in cwd.iterdir():
    # Process the entry here
    ...
    # print(entry)

Since iterdir() returns an iterator, entries are retrieved on-demand as you go through the loop.

The is_dir() method

The is_dir() method returns True if the path points to a directory, False otherwise.

for entry in cwd.iterdir():
   if entry.is_dir():
      print(entry.name)
.ipynb_checkpoints
data
images

The is_file() method

The .is_file() method returns True if the path points to a file, False otherwise.

for entry in cwd.iterdir():
   if entry.is_file():
      print(entry.suffix)
.ipynb
.txt
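The three methods above are often combined to split a directory listing into folders and files. The sketch below does this in a temporary directory so it can run anywhere; the entry names (data, images, notes.txt) are assumptions for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data").mkdir()
    (root / "images").mkdir()
    (root / "notes.txt").touch()

    # iterdir() yields entries in arbitrary order, so sort for stable output.
    dirs = sorted(entry.name for entry in root.iterdir() if entry.is_dir())
    files = sorted(entry.name for entry in root.iterdir() if entry.is_file())

print(dirs)   # ['data', 'images']
print(files)  # ['notes.txt']
```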

The exists() method

Since Path objects only represent paths, they can refer to files and directories that may or may not actually be present in the file system. The .exists() method checks whether a path exists:

image_file.exists()
False
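As a small sketch of how these checks behave before and after a file is created, the example below uses a temporary directory and a hypothetical file name. For a path that does not exist, is_file() and is_dir() simply return False rather than raising an error.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    candidate = Path(tmp) / "midjourney.png"

    # Nothing has been created yet.
    before = (candidate.exists(), candidate.is_file(), candidate.is_dir())

    candidate.touch()  # create an empty file at that path
    after = (candidate.exists(), candidate.is_file(), candidate.is_dir())

print(before)  # (False, False, False)
print(after)   # (True, True, False)
```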

Creating and deleting paths

pathlib also offers functionalities for creating and deleting files and directories. Let's see how.

The mkdir() method

The mkdir() method creates a new directory at the specified path. A relative path, like the one below, is interpreted relative to the current working directory.

from pathlib import Path

data_dir = Path("new_data_dir")

# Create the directory 'new_data_dir' in the current working directory
data_dir.mkdir()

The mkdir(parents=True) method

The mkdir(parents=True) method is particularly useful when you want to create a directory structure where some parent directories might not exist. Setting parents=True ensures that all necessary parent directories are created along the way.

sub_dir = Path("data/nested/subdirectory")

# Create 'data/nested/subdirectory', even if 'data' or 'nested' don't exist
sub_dir.mkdir(parents=True)

Keep in mind that mkdir() raises a FileExistsError if the path already exists, unless you pass exist_ok=True.

Path('data').mkdir()
FileExistsError: [Errno 17] File exists: 'data'
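If the goal is simply "make sure this directory exists", the exist_ok=True argument avoids the error. The following is a minimal sketch run inside a temporary directory; the nested directory names mirror the earlier example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    sub_dir = Path(tmp) / "data" / "nested" / "subdirectory"

    sub_dir.mkdir(parents=True, exist_ok=True)  # creates all three levels
    sub_dir.mkdir(parents=True, exist_ok=True)  # already there: no error
    created = sub_dir.is_dir()

    # Without exist_ok=True, the same call raises FileExistsError.
    try:
        sub_dir.mkdir()
        raised = False
    except FileExistsError:
        raised = True

print(created, raised)  # True True
```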

The unlink() method

The unlink() method permanently deletes a file represented by the Path object. It is recommended to check if a file exists before running this method in order to avoid receiving an error.

to_delete = Path("data/prices.csv")

if to_delete.exists():
   to_delete.unlink()
   print(f"Successfully deleted {to_delete.name}")
Successfully deleted prices.csv
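On Python 3.8 and later, unlink() also accepts missing_ok=True, which makes the existence check unnecessary. A small sketch, using a temporary directory and a hypothetical file name:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    to_delete = Path(tmp) / "prices.csv"
    to_delete.touch()

    to_delete.unlink()                 # removes the file
    to_delete.unlink(missing_ok=True)  # already gone: silently does nothing
    still_there = to_delete.exists()

    # Without missing_ok=True, deleting a missing file raises an error.
    try:
        to_delete.unlink()
        raised = False
    except FileNotFoundError:
        raised = True

print(still_there, raised)  # False True
```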

The rmdir() method

The rmdir() method removes a directory, but only if it is empty. The easiest way to delete a non-empty directory is to use the shutil library (shutil.rmtree()) or the terminal.

empty_dir = Path("new_data_dir")

empty_dir.rmdir()

Note: Please be cautious when using unlink() or rmdir() because their results are permanent.
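For the non-empty case, here is a minimal sketch of the shutil route. It runs in a temporary directory with made-up names, first showing that rmdir() refuses to remove a directory that still has contents and then removing it with shutil.rmtree(), which deletes the directory and everything inside it.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    full_dir = Path(tmp) / "new_data_dir"
    full_dir.mkdir()
    (full_dir / "prices.csv").touch()

    # rmdir() fails because the directory still contains a file.
    try:
        full_dir.rmdir()
        rmdir_failed = False
    except OSError:
        rmdir_failed = True

    # rmtree() removes the directory together with its contents.
    shutil.rmtree(full_dir)
    gone = not full_dir.exists()

print(rmdir_failed, gone)  # True True
```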

Advanced Path Manipulation

Let’s move on to some advanced path manipulation concepts and how to apply them in practice using pathlib.

Relative vs. absolute paths

We will start by understanding the differences between absolute and relative paths, as they come up often.

Relative paths

Relative paths specify the location of a file or directory relative to the current directory, hence the word relative. They are short and flexible within your project but can be confusing if you change the working directory.

For example, I have an images folder in my current working directory, which has the midjourney.png file.

image = Path("images/midjourney.png")

image
PosixPath('images/midjourney.png')

The above code works now, but if I move the notebook I am using to a different location, the snippet will break because the images folder didn't move with the notebook.

Absolute paths

Absolute paths specify the full location of a file or a directory from the root of the file system. They are independent of the current directory and offer a clear reference point for any user anywhere on the system.

image_absolute = Path("/home/bexgboost/articles/2024/4_april/8_pathlib/images/midjourney.png")

image_absolute
PosixPath('/home/bexgboost/articles/2024/4_april/8_pathlib/images/midjourney.png')

As you can see, absolute paths can be pretty long, especially in complex projects with nested tree structures. For this reason, relative paths, which are shorter, are often preferred inside a project.

Resolve method

pathlib can convert a relative path to an absolute one with the resolve() method, which also resolves symlinks and collapses ".." components.

relative_image = Path("images/midjourney.png")

absolute_image = relative_image.resolve()

absolute_image
PosixPath('/home/bexgboost/articles/2024/4_april/8_pathlib/images/midjourney.png')

We can also go the other way: If we have an absolute path, we can convert it to a relative path based on a reference directory.

relative_path = Path.cwd()

absolute_image.relative_to(relative_path)
PosixPath('images/midjourney.png')
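One caveat with relative_to() is that it raises a ValueError when the path is not located under the reference directory. The sketch below illustrates this with pure paths, so no real files are needed; the project directory is a hypothetical example.

```python
from pathlib import PurePosixPath

project = PurePosixPath("/home/user/project")
image = project / "images" / "midjourney.png"

print(image.is_absolute())  # True

relative = image.relative_to(project)
print(relative)                # images/midjourney.png
print(relative.is_absolute())  # False

# /etc is not an ancestor of the image, so this fails.
try:
    image.relative_to("/etc")
    outside = False
except ValueError:
    outside = True
print(outside)  # True
```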

Globbing

In order to illustrate globbing, we can turn back to the example we introduced at the beginning of the article, where we wrote code to find all the png files in a given directory.

files = list(dir_path.glob("*.png"))

Path objects have a built-in .glob() method to efficiently search for files matching a specific pattern in any directory. This method is very useful when processing files with similar names or extensions.

The glob method accepts a pattern string and returns a generator object that yields matching Path objects on demand. The pattern can contain the following wildcards:

  • *: Matches zero or more characters.

  • ?: Matches any single character.

  • []: Matches a range of characters enclosed within brackets (e.g., [a-z] matches any lowercase letter).

To illustrate, let’s try to find all Jupyter notebooks in my articles directory.

articles_dir = Path.home() / "articles"

# Find all notebooks
notebooks = articles_dir.glob("*.ipynb")

# Print how many found
print(len(list(notebooks)))
0

The .glob() method didn't find any notebooks, which at first glance seems surprising because I have written over 150 articles. The reason is that .glob() only searches inside the given directory, not its subdirectories.

We can solve this by doing a recursive search, for which we need to use the rglob() method, which has similar syntax:

notebooks = articles_dir.rglob("*.ipynb")

print(len(list(notebooks)))
357

This time, our code found all 357 files.
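To make the difference between glob() and rglob() and the individual wildcards concrete, here is a small self-contained sketch. The directory layout and file names are invented for the example and created inside a temporary directory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "2024" / "april").mkdir(parents=True)
    for rel in (
        "intro.ipynb",
        "2024/draft1.ipynb",
        "2024/draft2.ipynb",
        "2024/notes.txt",
        "2024/april/final.ipynb",
    ):
        (root / rel).touch()

    # glob() only looks at the given directory.
    top_level = sorted(p.name for p in root.glob("*.ipynb"))

    # rglob() descends into subdirectories as well.
    recursive = sorted(p.name for p in root.rglob("*.ipynb"))

    # A leading "**/" in a glob() pattern gives the same recursive search.
    double_star = sorted(p.name for p in root.glob("**/*.ipynb"))

    # ? matches exactly one character; [1] matches one character from the set.
    question = sorted(p.name for p in root.glob("2024/draft?.ipynb"))
    bracket = sorted(p.name for p in root.glob("2024/draft[1].ipynb"))

print(top_level)  # ['intro.ipynb']
print(recursive)  # ['draft1.ipynb', 'draft2.ipynb', 'final.ipynb', 'intro.ipynb']
print(question)   # ['draft1.ipynb', 'draft2.ipynb']
print(bracket)    # ['draft1.ipynb']
```

The results are sorted because glob() and rglob() do not guarantee any particular order.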

Working with files

As we have seen, Path objects only represent files but don't perform operations on them. However, they do have certain methods for common file operations. We will see how to use them in this section.

Reading files

Reading file contents is a fundamental operation in many Python applications. pathlib provides convenient shorthand methods for reading files as either text or raw bytes.

The read_text() method allows us to read the contents of a text file and close the file.

file = Path("file.txt")

print(file.read_text())
This is sample text.

For binary files, we can use the read_bytes() method instead.

image = Path("images/midjourney.png")

image.read_bytes()[:10]
b'\x89PNG\r\n\x1a\n\x00\x00'

Remember, when using a read_* method, error handling is important:

nonexistent_file = Path("gibberish.txt")

try:
   contents = nonexistent_file.read_text()
except FileNotFoundError:
   print("No such thing.")
No such thing.
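Another common source of errors when reading text is the encoding. Both read_text() and write_text() accept an encoding argument, and passing it explicitly avoids depending on the platform default. A minimal sketch, using a temporary file and a sample string chosen for illustration:

```python
import tempfile
from pathlib import Path

# "naïve café", written with escapes so the example does not depend on
# the encoding of this source file
sample = "na\u00efve caf\u00e9"

with tempfile.TemporaryDirectory() as tmp:
    file = Path(tmp) / "file.txt"
    file.write_text(sample, encoding="utf-8")

    text = file.read_text(encoding="utf-8")
    raw = file.read_bytes()

print(text == sample)       # True
print(len(text), len(raw))  # 10 12
```

The byte count is larger than the character count because each accented letter takes two bytes in UTF-8.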

Writing files

Writing to files is as easy as reading files. To write files, we have the write_text() method.

file = Path("file.txt")

file.write_text("This is new text.")
17
file.read_text()
'This is new text.'

As we can see, the write_text() method overwrites text. Although there is no append mode for write_text(), we can use read_text() and write_text() together to append text to the end of the file.

old_text = file.read_text() + "\n"
final_text = "This is the final text."

# Combine old and new texts and write them back
file.write_text(old_text + final_text)

print(file.read_text())
This is new text.
This is the final text.
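An alternative that avoids reading the whole file back in is to open the path in append mode with the open() method of Path objects, which works like the built-in open(). This is a sketch of that approach, not the method used above; it runs in a temporary directory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    file = Path(tmp) / "file.txt"
    file.write_text("This is new text.\n")

    # Mode "a" appends to the end instead of overwriting.
    with file.open("a") as handle:
        handle.write("This is the final text.\n")

    lines = file.read_text().splitlines()

print(lines)  # ['This is new text.', 'This is the final text.']
```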

write_bytes() works in a similar way. To illustrate, let's first duplicate the midjourney.png image with a new name.

original_image = Path("images/midjourney.png")

new_image = original_image.with_stem("duplicated_midjourney")

new_image
PosixPath('images/duplicated_midjourney.png')

The with_stem() method (available since Python 3.9) returns a file path with a different file name, although the suffix stays the same. This lets us read an original image and write its contents to a new image.

new_image.write_bytes(original_image.read_bytes())
1979612

File renaming and moving

The with_stem() method only builds a new Path object; it doesn't change anything on disk. To actually rename a file, pathlib offers the rename() method.

file = Path("file.txt")

target_path = Path("new_file.txt")

file.rename(target_path)
PosixPath('new_file.txt')

rename() accepts a target path, which can be a string or another path object.

To move files, you can use the replace() method, which also accepts a destination path and silently overwrites an existing file at that location:

# Define the file to be moved
source_file = Path("new_file.txt")

# Define the location to put the file
destination = Path("data/new/location")

# Create the directories if they don't exist
destination.mkdir(parents=True, exist_ok=True)

# Move the file
source_file.replace(destination / source_file.name)
PosixPath('data/new/location/new_file.txt')
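Because replace() overwrites an existing file at the destination, it can be worth guarding against accidental data loss. The sketch below is one possible approach under stated assumptions: it works in a temporary directory, and the ".bak" backup naming is a hypothetical convention, not something pathlib provides.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    source_file = root / "new_file.txt"
    source_file.write_text("new contents")

    destination = root / "data" / "new" / "location"
    destination.mkdir(parents=True, exist_ok=True)

    target = destination / source_file.name
    target.write_text("old contents")  # a file is already in the way

    # Keep a backup of the existing file before moving the new one in.
    backup = target.with_name(target.name + ".bak")
    if target.exists():
        target.replace(backup)

    source_file.replace(target)

    source_gone = not source_file.exists()
    moved_text = target.read_text()
    backup_text = backup.read_text()

print(source_gone)  # True
print(moved_text)   # new contents
print(backup_text)  # old contents
```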

Creating blank files

pathlib allows us to create blank files using the touch method:

# Define new file path
new_dataset = Path("data/new.csv")

new_dataset.exists()
False
new_dataset.touch()

new_dataset.exists()
True

The touch method is originally meant for updating a file's modification time, so it can be used on existing files as well.

original_image.touch()

When you need to reserve a file name for later use but don’t have any content to write to it at the moment, you can use touch() to create a blank file. The method was inspired by the Unix touch terminal command.
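If touch() should only ever create a new file and never silently update an existing one, the exist_ok=False argument makes it raise an error instead. A minimal sketch in a temporary directory, with a hypothetical file name:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    new_dataset = Path(tmp) / "new.csv"

    new_dataset.touch(exist_ok=False)  # file does not exist yet: created
    size = new_dataset.stat().st_size  # a blank file has zero bytes

    # The file now exists, so a second strict touch() fails.
    try:
        new_dataset.touch(exist_ok=False)
        raised = False
    except FileExistsError:
        raised = True

print(size, raised)  # 0 True
```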

Permissions and file system information

As a final item, we will learn how to access file characteristics using the .stat() method. If you are familiar with os, you will notice that this new method has the same output as os.stat().

image_stats = original_image.stat()

image_stats
os.stat_result(st_mode=33188, st_ino=1950175, st_dev=2080, st_nlink=1, st_uid=1000, st_gid=1000, st_size=1979612, st_atime=1714664562, st_mtime=1714664562, st_ctime=1714664562)

We can also retrieve the file size using dot notation.

image_size = image_stats.st_size

# File size in megabytes
image_size / (1024**2)
1.8879051208496094
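The other fields of the stat result can be decoded with the standard library as well. As a sketch, the example below takes the st_mode and st_mtime values from the output shown earlier and turns them into a permission string and a readable timestamp. The values are copied in as plain numbers so the snippet runs without the original image file; converting to UTC is a choice made here for reproducible output.

```python
import stat
from datetime import datetime, timezone

# Values copied from the os.stat_result output above
st_mode = 33188
st_mtime = 1714664562

# Permission bits in the familiar "ls -l" notation
permissions = stat.filemode(st_mode)
print(permissions)  # -rw-r--r--

# The same permission bits as an octal number
octal_mode = oct(stat.S_IMODE(st_mode))
print(octal_mode)  # 0o644

# Modification time as a timezone-aware datetime in UTC
modified = datetime.fromtimestamp(st_mtime, tz=timezone.utc)
print(modified.isoformat())  # 2024-05-02T15:42:42+00:00
```

With a real file, the same values come from Path.stat(), for example original_image.stat().st_mode.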

Conclusion

The introduction of the pathlib module in Python 3.4 has significantly simplified file system manipulation for developers. Its object-oriented approach gives us a structured and straightforward way to represent file system paths. pathlib also offers platform independence: it handles path separators consistently across different operating systems, so our code doesn't break on a new machine. Finally, pathlib offers a large set of concise and expressive methods for common file system operations, as we have seen.

Keep in mind, pathlib represents one of many powerful built-in libraries in Python. By taking our Data Scientist With Python Career Track, Python Programming Skill Track, and Intro to Python for Data Science courses, you will master a wide set of built-in libraries to become a strong Python programmer.

Thank you for reading!



Author
Bex Tuychiev

I am a data science content creator with over 2 years of experience and one of the largest followings on Medium. I like to write detailed articles on AI and ML with a bit of a sarcastic style because you've got to do something to make them a bit less dull. I have produced over 130 articles and a DataCamp course to boot, with another one in the making. My content has been seen by over 5 million pairs of eyes, 20k of whom became followers on both Medium and LinkedIn.
