Until recently, file system manipulation in Python was notoriously difficult. Developers often struggled with incorrect file paths, which were prone to errors because they required long strings as inputs. Also, developers reported that their code broke frequently due to inconsistencies across different operating systems.
Luckily, in Python version 3.4, developers introduced the pathlib
module to the standard library. pathlib
provides an elegant solution to handling file system paths using a long-awaited object-oriented approach, and it also ensures platform-agnostic behavior.
This comprehensive tutorial will teach you the features of the pathlib
module to support your daily interactions with your file system. Using pathlib
, you will benefit from efficient workflows and easy data retrieval. pathlib
has matured significantly over the years, and we’ve kept up-to-date so you don’t have to. Let's get started.
Python os Module vs. pathlib
Prior to Python 3.4, the more traditional way of handling file paths was by using the os
module. While the os
module was once highly effective, it has started to show its age.
We can show the unique value of pathlib by considering a common task in data science: how to find all png files inside a given directory.
If we were using the os
module, we might write the following code:
import os
dir_path = "/home/user/documents"
files = [
os.path.join(dir_path, f)
for f in os.listdir(dir_path)
if os.path.isfile(os.path.join(dir_path, f)) and f.endswith(".png")
]
Although this code solves the immediate task of finding our png files, it reveals several disadvantages of the os module. For one thing, the code is long and hard to read, which is a shame, considering this is a relatively simple operation. As a second point, our code assumes knowledge of list comprehensions, which shouldn’t be taken for granted. As a third point, the code builds and inspects paths through string operations, which are error-prone.
If, instead, we were using the pathlib module, our code would be much simpler. As we have mentioned, pathlib
provides an object-oriented approach to handling file system paths. Take a look:
from pathlib import Path
# Create a path object
dir_path = Path("/home/user/documents")
# Find all png files inside the directory
files = list(dir_path.glob("*.png"))
Object-oriented programming organizes code around objects and their interactions, which tends to produce more modular, reusable, and maintainable code. If you are unfamiliar with object-oriented programming, it’s worth learning with our Object-Oriented Python Programming course.
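To check that the two approaches agree, here is a minimal sketch that builds a throwaway directory with the standard tempfile module and runs both versions against it. The file names are made up for illustration and are not part of the original example.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample files, created only for this demonstration
    for name in ("chart.png", "logo.png", "notes.txt"):
        Path(tmp, name).touch()

    # os version: build and filter path strings by hand
    os_files = sorted(
        os.path.join(tmp, f)
        for f in os.listdir(tmp)
        if os.path.isfile(os.path.join(tmp, f)) and f.endswith(".png")
    )

    # pathlib version: let glob() do the filtering
    pathlib_files = sorted(Path(tmp).glob("*.png"))

print([p.name for p in pathlib_files])
print(os_files == [str(p) for p in pathlib_files])
```

Both versions find the same two png files; the pathlib version simply returns Path objects instead of strings.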
Working with Path Objects in Python
The pathlib
library revolves around what are called Path
objects, which represent file system paths in a structured and platform-independent way.
Earlier in this tutorial, we brought the Path
class from the pathlib
module into our current namespace using the following line of code:
from pathlib import Path
After importing the Path class from pathlib, we can create Path objects in several ways, including from strings, from other Path objects, from the current working directory, and from the home directory.
Let’s take a look at each one in turn.
Creating path objects from strings
We can create a Path object by passing a string representing a file system path to the Path constructor. This converts the string representation of the file path into a Path object.
file_path_str = "data/union_data.csv"
data_path = Path(file_path_str)
Creating path objects from other path objects
Existing Path
objects can serve as building blocks for creating new paths.
We do this by combining a base path, a data directory, and a file name into a single file path. The / operator joins Path objects and strings, and pathlib inserts the correct separator for the operating system.
base_path = Path("/home/user")
data_dir = Path("data")
# Combining multiple paths
file_path = base_path / data_dir / "prices.csv"
print(file_path)
/home/user/data/prices.csv
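The / operator has a method equivalent, joinpath(). The sketch below uses PurePosixPath (a pathlib class that applies POSIX rules without touching the file system) so that the output is the same on any operating system; that choice is only for the sake of a reproducible example.

```python
from pathlib import PurePosixPath

# PurePosixPath always follows POSIX rules, regardless of the host OS
base_path = PurePosixPath("/home/user")

with_operator = base_path / "data" / "prices.csv"
with_method = base_path.joinpath("data", "prices.csv")

print(with_operator)                 # /home/user/data/prices.csv
print(with_operator == with_method)  # True

# An absolute segment on the right discards everything to its left
print(base_path / "/etc/hosts")      # /etc/hosts
```

The last line is a common surprise: joining with an absolute path replaces the base rather than extending it.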
Creating path objects from the current working directory
Here we assign the current working directory to the cwd
variable using the Path.cwd()
method. We can then retrieve the path of the current working directory where our script is running.
cwd = Path.cwd()
print(cwd)
/home/bexgboost/articles/2024/4_april/8_pathlib
Creating path objects from the home directory
We can construct a path by combining our home directory with additional subdirectories. Here we are combining our home directory with the subdirectories "downloads" and "projects."
home = Path.home()
home / "downloads" / "projects"
PosixPath('/home/bexgboost/downloads/projects')
An important note: Creating a Path object doesn't touch the file system, so it doesn't validate the path or create any directories or files. The object only represents a path and lets us manipulate it. To actually interact with the file system (checking existence, reading/writing files), we have to call the relevant methods of Path objects and, for some advanced cases, get help from the os module.
Working with Path Components in Python
File path attributes are various properties and components of a file path that help in identifying and managing files and directories within a file system. Just like a physical address has different parts, such as street number, city, country, and zip code, a file system path can be broken down into smaller components. pathlib
allows us to access and manipulate these components using path attributes through dot notation.
Working with the root directory
The root is the topmost directory in a file system. In Unix-like systems, it is represented by a forward slash (/). On Windows, the root attribute is a backslash (\), while the drive letter and colon, like C:, are stored in the separate drive attribute.
image_file = home / "downloads" / "midjourney.png"
image_file.root
'/'
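On Windows, pathlib separates the drive from the root, and the anchor attribute combines the two. A small sketch using the pure path classes, which apply each platform's rules on any machine (the example paths are hypothetical):

```python
from pathlib import PurePosixPath, PureWindowsPath

posix_file = PurePosixPath("/home/user/downloads/midjourney.png")
windows_file = PureWindowsPath("C:/Users/user/Downloads/midjourney.png")

# POSIX paths have no drive, so root and anchor are the same
print(repr(posix_file.drive), repr(posix_file.root), repr(posix_file.anchor))

# Windows paths keep the drive letter apart from the root
print(repr(windows_file.drive), repr(windows_file.root), repr(windows_file.anchor))
```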
Working with the parent directory
The parent attribute returns the directory that contains the current file or directory. It is one level higher relative to the current directory or file.
image_file.parent
PosixPath('/home/bexgboost/downloads')
Working with the file name
This attribute returns the entire file name, including the extension, as a string.
image_file.name
'midjourney.png'
Working with the file suffix
The suffix attribute returns the file extension, including the dot, as a string (or an empty string if there’s no extension).
image_file.suffix
'.png'
Working with the file stem
The stem returns the file name without the extension. Working with the stem can be useful when converting files to different formats.
image_file.stem
'midjourney'
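For format conversion, the with_suffix() method builds the new path directly. The sketch below also shows how suffix and stem behave with a double extension; PurePosixPath and the file names are used only to keep the example reproducible.

```python
from pathlib import PurePosixPath

image_file = PurePosixPath("/home/user/downloads/midjourney.png")

# Same directory and stem, different extension
jpg_file = image_file.with_suffix(".jpg")
print(jpg_file)          # /home/user/downloads/midjourney.jpg

# Only the last extension counts as the suffix
archive = PurePosixPath("backup.tar.gz")
print(archive.suffix)    # .gz
print(archive.stem)      # backup.tar
print(archive.suffixes)  # ['.tar', '.gz']
```

Note that with_suffix() only computes a new path. It does not convert or rename anything on disk.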
Note: On Linux, file paths are case-sensitive, so /Users/username/Documents and /users/username/documents would be different. The default file systems on macOS and Windows are usually case-insensitive.
The pathlib parts attribute
We can use the .parts
attribute to split a Path
object into its components.
image_file.parts
('/', 'home', 'bexgboost', 'downloads', 'midjourney.png')
The pathlib parents attribute
The parents attribute returns a sequence of the path's ancestor directories as Path objects, starting with the immediate parent.
list(image_file.parents)
[PosixPath('/home/bexgboost/downloads'),
PosixPath('/home/bexgboost'),
PosixPath('/home'),
PosixPath('/')]
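Individual ancestors can also be picked out by index. A minimal sketch, using PurePosixPath and a made-up path so the result does not depend on the machine:

```python
from pathlib import PurePosixPath

image_file = PurePosixPath("/home/user/downloads/midjourney.png")

print(image_file.parents[0])    # /home/user/downloads (same as .parent)
print(image_file.parents[1])    # /home/user
print(len(image_file.parents))  # 4
```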
Common Path Operations Using pathlib
Path
objects have many methods that allow you to interact efficiently with directories and their contents. Let's take a look at how to perform some of the most common operations.
Listing directories
The iterdir()
method allows you to iterate over all the files and subdirectories in a folder. It is particularly useful for processing all files in a directory or performing operations on each entry.
cwd = Path.cwd()
for entry in cwd.iterdir():
# Process the entry here
...
# print(entry)
Since iterdir()
returns an iterator, entries are retrieved on-demand as you go through the loop.
The is_dir() method
The is_dir()
method returns True
if the path points to a directory, False
otherwise.
for entry in cwd.iterdir():
if entry.is_dir():
print(entry.name)
.ipynb_checkpoints
data
images
The is_file() method
The .is_file()
method returns True
if the path points to a file, False
otherwise.
for entry in cwd.iterdir():
if entry.is_file():
print(entry.suffix)
.ipynb
.txt
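The three methods above are often combined to split a directory listing into folders and files. Here is a self-contained sketch that runs in a temporary directory; the entry names are hypothetical and chosen to mirror the outputs above.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data").mkdir()
    (root / "images").mkdir()
    (root / "analysis.ipynb").touch()
    (root / "notes.txt").touch()

    # iterdir() yields entries in arbitrary order, so sort for stable output
    entries = sorted(root.iterdir())
    dir_names = [e.name for e in entries if e.is_dir()]
    file_suffixes = [e.suffix for e in entries if e.is_file()]

print(dir_names)      # ['data', 'images']
print(file_suffixes)  # ['.ipynb', '.txt']
```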
The exists() method
Since Path objects only represent paths, they can point to files and directories that may or may not actually be present in the file system. The .exists() method checks whether a path really exists:
image_file.exists()
False
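The following sketch shows the result of exists() changing once the file is actually created. It works inside a temporary directory, and report.csv is a hypothetical file name.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "report.csv"

    before = report.exists()  # the path object exists, the file does not
    report.touch()            # create an empty file on disk
    after = report.exists()

print(before, after)  # False True
```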
Creating and deleting paths
pathlib
also offers functionalities for creating and deleting files and directories. Let's see how.
The mkdir() method
The mkdir() method creates a new directory at the specified path. If the path is relative, as in the example below, the directory is created relative to the current working directory.
from pathlib import Path
data_dir = Path("new_data_dir")
# Create the directory 'new_data_dir' in the current working directory
data_dir.mkdir()
The mkdir(parents=True) method
The mkdir(parents=True)
method is particularly useful when you want to create a directory structure where some parent directories might not exist. Setting parents=True
ensures that all necessary parent directories are created along the way.
sub_dir = Path("data/nested/subdirectory")
# Create 'data/nested/subdirectory', even if 'data' or 'nested' don't exist
sub_dir.mkdir(parents=True)
Keep in mind that mkdir()
raises an exception if a directory with the same name already exists.
Path('data').mkdir()
FileExistsError: [Errno 17] File exists: 'data'
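If an existing directory should not count as an error, mkdir() accepts an exist_ok=True argument. A minimal sketch in a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    sub_dir = Path(tmp) / "data" / "nested"
    sub_dir.mkdir(parents=True)

    # With exist_ok=True, a second call is a no-op instead of an error
    sub_dir.mkdir(parents=True, exist_ok=True)
    created = sub_dir.is_dir()

    # Without it, the call raises FileExistsError
    try:
        sub_dir.mkdir()
        raised = False
    except FileExistsError:
        raised = True

print(created, raised)  # True True
```

Combining parents=True with exist_ok=True gives a call that can be repeated safely, which is handy at the top of scripts that write output files.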
The unlink() method
The unlink()
method permanently deletes a file represented by the Path
object. It is recommended to check if a file exists before running this method in order to avoid receiving an error.
to_delete = Path("data/prices.csv")
if to_delete.exists():
to_delete.unlink()
print(f"Successfully deleted {to_delete.name}")
Successfully deleted prices.csv
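As an alternative to the explicit existence check, unlink() accepts a missing_ok=True argument (available since Python 3.8) that suppresses the error when the file is already gone. A short sketch in a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    to_delete = Path(tmp) / "prices.csv"
    to_delete.touch()

    to_delete.unlink()                 # deletes the file
    to_delete.unlink(missing_ok=True)  # file is gone, but no error is raised

    still_there = to_delete.exists()

print(still_there)  # False
```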
The rmdir() method
The rmdir() method removes a directory, but only if it is empty. The easiest way to delete a non-empty directory is to use the shutil library or the terminal.
empty_dir = Path("new_data_dir")
empty_dir.rmdir()
Note: Please be cautious when using unlink()
or rmdir()
because their results are permanent.
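For non-empty directories, the standard library's shutil.rmtree() function accepts a Path object and removes the whole tree. The sketch below runs in a temporary directory with a made-up folder layout, and first shows rmdir() refusing to delete it.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "old_project"
    (project / "data").mkdir(parents=True)
    (project / "data" / "prices.csv").write_text("a,b\n1,2\n")

    # rmdir() refuses to remove a directory that still has contents
    try:
        project.rmdir()
        rmdir_failed = False
    except OSError:
        rmdir_failed = True

    # rmtree() deletes the directory and everything inside it
    shutil.rmtree(project)
    removed = not project.exists()

print(rmdir_failed, removed)  # True True
```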
Advanced Path Manipulation
Let’s move on to some advanced path manipulation concepts and how to apply them in practice using pathlib
.
Relative vs. absolute paths
We will start by understanding the differences between absolute and relative paths, as they come up often.
Relative paths
Relative paths specify the location of a file or directory relative to the current directory, hence the word relative. They are short and flexible within your project but can be confusing if you change the working directory.
For example, I have an images folder in my current working directory, which has the midjourney.png
file.
image = Path("images/midjourney.png")
image
PosixPath('images/midjourney.png')
The above code works now, but if I move the notebook I am using to a different location, the snippet will break because the images folder didn't move with the notebook.
Absolute paths
Absolute paths specify the full location of a file or a directory from the root of the file system. They are independent of the current directory and offer a clear reference point for any user anywhere on the system.
image_absolute = Path("/home/bexgboost/articles/2024/4_april/8_pathlib/images/midjourney.png")
image_absolute
PosixPath('/home/bexgboost/articles/2024/4_april/8_pathlib/images/midjourney.png')
As you can see, absolute paths can be pretty long, especially in complex projects with nested tree structures. For this reason, most people prefer relative paths which are shorter.
Resolve method
pathlib can convert a relative path to an absolute one with the resolve() method.
relative_image = Path("images/midjourney.png")
absolute_image = relative_image.resolve()
absolute_image
PosixPath('/home/bexgboost/articles/2024/4_april/8_pathlib/images/midjourney.png')
We can also go the other way: If we have an absolute path, we can convert it to a relative path based on a reference directory.
reference_dir = Path.cwd()
absolute_image.relative_to(reference_dir)
PosixPath('images/midjourney.png')
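One caveat: relative_to() raises a ValueError when the path does not lie inside the reference directory. A minimal sketch with hypothetical paths, using PurePosixPath so no real files are needed:

```python
from pathlib import PurePosixPath

project = PurePosixPath("/home/user/project")
image = project / "images" / "midjourney.png"

print(image.relative_to(project))  # images/midjourney.png

# The image is not located under /var/log, so this fails
try:
    image.relative_to("/var/log")
    is_inside = True
except ValueError:
    is_inside = False

print(is_inside)  # False
```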
Globbing
In order to illustrate globbing, we can turn back to the example we introduced at the beginning of the article, where we wrote code to find all the png
files in a given directory.
files = list(dir_path.glob("*.png"))
The .glob() method efficiently searches for files matching a specific pattern in a directory. It is very useful when processing files with similar names or extensions.
The method accepts a pattern string containing wildcards and returns a generator object that yields matching Path objects on demand. The most common wildcards are:
- *: Matches zero or more characters.
- ?: Matches any single character.
- []: Matches any one of the characters enclosed within the brackets (e.g., [a-z] matches any lowercase letter).
To illustrate, let’s try to find all Jupyter notebooks in my articles directory.
articles_dir = Path.home() / "articles"
# Find all notebooks
notebooks = articles_dir.glob("*.ipynb")
# Print how many found
print(len(list(notebooks)))
0
The .glob()
method didn't find any notebooks, which at first glance seems surprising because I have written over 150 articles. The reason is that .glob()
only searches inside the given directory, not its subdirectories.
We can solve this by doing a recursive search, for which we need to use the rglob()
method, which has similar syntax:
notebooks = articles_dir.rglob("*.ipynb")
print(len(list(notebooks)))
357
This time, our code found all 357 files.
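The difference between glob() and rglob(), and the effect of the ? and [] wildcards, can be reproduced on a small scale. The sketch below builds a hypothetical directory tree in a temporary folder; the file names are invented for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    articles_dir = Path(tmp)
    (articles_dir / "2024" / "april").mkdir(parents=True)
    for rel in (
        "index.ipynb",
        "2024/draft1.ipynb",
        "2024/draft2.ipynb",
        "2024/april/final.ipynb",
        "2024/april/notes.txt",
    ):
        (articles_dir / rel).touch()

    # glob() only looks in the directory itself
    top_level = sorted(p.name for p in articles_dir.glob("*.ipynb"))
    # rglob() also descends into every subdirectory
    recursive = sorted(p.name for p in articles_dir.rglob("*.ipynb"))
    # [0-9] matches exactly one digit
    drafts = sorted(p.name for p in articles_dir.rglob("draft[0-9].ipynb"))
    # ? matches exactly one character; patterns may include subdirectories
    single = sorted(p.name for p in articles_dir.glob("2024/draft?.ipynb"))

print(top_level)  # ['index.ipynb']
print(recursive)  # ['draft1.ipynb', 'draft2.ipynb', 'final.ipynb', 'index.ipynb']
print(drafts)     # ['draft1.ipynb', 'draft2.ipynb']
print(single)     # ['draft1.ipynb', 'draft2.ipynb']
```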
Working with files
As we have seen, creating a Path object doesn't do anything to the file it points to. However, Path objects do have methods for common file operations. We will see how to use them in this section.
Reading files
Reading file contents is a fundamental operation in many Python applications. pathlib
provides convenient shorthand methods for reading files as either text or raw bytes.
The read_text()
method allows us to read the contents of a text file and close the file.
file = Path("file.txt")
print(file.read_text())
This is sample text.
For binary files, we can use the read_bytes()
method instead.
image = Path("images/midjourney.png")
image.read_bytes()[:10]
b'\x89PNG\r\n\x1a\n\x00\x00'
Remember, when using a read_*
method, error handling is important:
nonexistent_file = Path("gibberish.txt")
try:
contents = nonexistent_file.read_text()
except FileNotFoundError:
print("No such thing.")
No such thing.
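Both read_text() and write_text() accept an encoding argument. Passing it explicitly avoids depending on the platform's default encoding, which matters as soon as a file contains non-ASCII characters. A small sketch in a temporary directory, with made-up file contents:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    file = Path(tmp) / "file.txt"
    file.write_text("café\nnaïve\n", encoding="utf-8")

    # Read the text back with the same encoding and split it into lines
    lines = file.read_text(encoding="utf-8").splitlines()

print(lines)  # ['café', 'naïve']
```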
Writing files
Writing to files is as easy as reading files. To write files, we have the write_text()
method.
file = Path("file.txt")
file.write_text("This is new text.")
17
file.read_text()
'This is new text.'
As we can see, the write_text()
method overwrites text. Although there is no append mode for write_text()
, we can use read_text()
and write_text()
together to append text to the end of the file.
old_text = file.read_text() + "\n"
final_text = "This is the final text."
# Combine old and new texts and write them back
file.write_text(old_text + final_text)
print(file.read_text())
This is new text.
This is the final text.
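Reading the whole file just to append a line becomes wasteful for large files. An alternative sketch uses the open() method of Path objects, which takes the same modes as the built-in open(), including "a" for append. It runs in a temporary directory so it does not touch the file.txt used above.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    file = Path(tmp) / "file.txt"
    file.write_text("This is new text.\n")

    # Append mode adds to the end without reading the existing contents
    with file.open("a") as f:
        f.write("This is the final text.\n")

    contents = file.read_text()

print(contents)
```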
write_bytes()
works in a similar way. To illustrate, let's first duplicate the midjourney.png
image with a new name.
original_image = Path("images/midjourney.png")
new_image = original_image.with_stem("duplicated_midjourney")
new_image
PosixPath('images/duplicated_midjourney.png')
The with_stem() method (available since Python 3.9) returns a file path with a different filename (although the suffix stays the same). This lets us read an original image and write its contents to a new image.
new_image.write_bytes(original_image.read_bytes())
1979612
File renaming and moving
The with_stem() method only builds a new path and doesn't rename anything on disk. To rename an actual file, pathlib offers the rename() method.
file = Path("file.txt")
target_path = Path("new_file.txt")
file.rename(target_path)
PosixPath('new_file.txt')
rename()
accepts a target path, which can be a string or another path object.
To move files, you can use the replace() method, which also accepts a destination path:
# Define the file to be moved
source_file = Path("new_file.txt")
# Define the location to put the file
destination = Path("data/new/location")
# Create the directories if they don't exist
destination.mkdir(parents=True, exist_ok=True)
# Move the file
source_file.replace(destination / source_file.name)
PosixPath('data/new/location/new_file.txt')
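Keep in mind that replace() overwrites an existing file at the destination without asking. The sketch below demonstrates this in a temporary directory; the file contents are hypothetical, and the explicit exists() check is one possible safeguard rather than something pathlib requires.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    source_file = root / "new_file.txt"
    source_file.write_text("fresh contents")

    destination = root / "data" / "new" / "location"
    destination.mkdir(parents=True, exist_ok=True)
    target = destination / source_file.name
    target.write_text("stale contents")  # a file is already in the way

    # Warn before the existing file is silently replaced
    if target.exists():
        print(f"{target.name} will be overwritten")
    source_file.replace(target)

    source_exists = source_file.exists()
    moved_text = target.read_text()

print(source_exists)  # False
print(moved_text)     # fresh contents
```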
Creating blank files
pathlib
allows us to create blank files using the touch
method:
# Define new file path
new_dataset = Path("data/new.csv")
new_dataset.exists()
False
new_dataset.touch()
new_dataset.exists()
True
The touch
method is originally meant for updating a file's modification time, so it can be used on existing files as well.
original_image.touch()
When we need to reserve a filename for later use but don’t have any content to write to it at the moment, we can use touch to create a blank file. The method was inspired by the Unix touch terminal command.
Permissions and file system information
As a final item, we will learn how to access file characteristics using the .stat() method. If you are familiar with os, you will notice that this method returns the same kind of result object as os.stat().
image_stats = original_image.stat()
image_stats
os.stat_result(st_mode=33188, st_ino=1950175, st_dev=2080, st_nlink=1, st_uid=1000, st_gid=1000, st_size=1979612, st_atime=1714664562, st_mtime=1714664562, st_ctime=1714664562)
We can also retrieve the file size using dot notation.
image_size = image_stats.st_size
# File size in megabytes
image_size / (1024**2)
1.8879051208496094
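The other fields of the stat result are just as accessible. As an illustration, the sketch below converts the modification timestamp to a datetime and renders the permission bits with the standard stat module. It creates its own 2048-byte sample file in a temporary directory, so the values differ from the image above.

```python
import stat
import tempfile
from datetime import datetime
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"\x00" * 2048)

    info = sample.stat()
    size_kib = info.st_size / 1024                   # size in kibibytes
    modified = datetime.fromtimestamp(info.st_mtime)  # last modification time
    mode_string = stat.filemode(info.st_mode)         # permissions, ls-style

print(size_kib)     # 2.0
print(modified)     # depends on when the code runs
print(mode_string)  # depends on the platform and umask
```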
Conclusion
The introduction of the pathlib
module in Python 3.4 has significantly simplified file system manipulation for developers. By providing an object-oriented approach to handling file paths, pathlib
provides a structured and straightforward way to represent file system paths. pathlib
also offers platform independence, meaning pathlib
handles path separators consistently across different operating systems so our code doesn't break on a new machine. Finally, pathlib
offers a vast set of concise and expressive methods for common file system operations, as we have seen.
Keep in mind, pathlib
represents one of many powerful built-in libraries in Python. By taking our Data Scientist With Python Career Track, Python Programming Skill Track, and Intro to Python for Data Science courses, you will master a wide set of built-in libraries to become a strong Python programmer.
Thank you for reading!
I am a data science content creator with over 2 years of experience and one of the largest followings on Medium. I like to write detailed articles on AI and ML with a bit of a sarcastic style because you've got to do something to make them a bit less dull. I have produced over 130 articles and a DataCamp course to boot, with another one in the making. My content has been seen by over 5 million pairs of eyes, 20k of whom became followers on both Medium and LinkedIn.