Git's protective approach to data deletion prevents accidental loss of important commits or data.
However, this means that outdated data, such as commits that belonged to deleted branches, can linger in the repository. Over time, Git repositories accumulate unreferenced objects, which consume disk space and can cause confusion.
The git prune command is a housekeeping utility within Git, designed primarily to clean up unreachable objects in the repository.
An unreachable object is an object that isn't accessible by any branch, tag, remote-tracking branch, or other reference. These objects can consume space in the repository, cluttering it with unnecessary data over time.
While git prune is a powerful tool for keeping repositories lean, most users may not need to use it directly due to Git's automatic garbage collection (git gc). However, understanding its role and function can be helpful for advanced Git users or in specific scenarios where manual repository maintenance is required or preferred.
What Is git prune?
The git prune command is used to remove objects that are no longer needed in the local repository. These objects could be commits, trees (directory snapshots), blobs (files), and tags that are no longer accessible from any branch or tag in the repository.
Simply put, git prune helps clean up unnecessary files and data in the repository, saving space and reducing clutter.
How can objects become unreachable?
Objects can become unreachable in several ways, for instance, by deleting branches or rewriting commits. When a branch is deleted, any commits unique to that branch that aren't part of any other branch or tag become unreachable.
Commit rewriting, using commands like git rebase, generates new commits and discards the old ones, leading them to become unreachable as well.
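The branch-deletion case can be reproduced in a throwaway repository. The following is a minimal sketch: the temporary directory, the file name file.txt, the branch names, and the commit identity are all made up for illustration. It uses git fsck --unreachable --no-reflogs to list the objects that no branch or tag can reach anymore.

```shell
set -e
# Throwaway repository in a temporary directory (all names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "base commit"

# Create a commit that exists only on the feature branch.
git checkout -q -b feature
echo "change" >> file.txt
git commit -q -a -m "feature commit"
feature_commit=$(git rev-parse HEAD)

# Delete the branch: no branch or tag points at the commit anymore.
git checkout -q main
git branch -q -D feature

# List objects that no reference can reach, ignoring reflogs.
unreachable=$(git fsck --unreachable --no-reflogs 2>/dev/null)
echo "$unreachable"
```

The listing should include the commit created on the deleted branch, together with the tree and blob that only that commit used.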
Grace period and reflogs
Git maintains a log of updates to branch tips and other references called reflogs (reference logs). We can view it using the git reflog command.
Even if an object is unreachable, if it is still in the reflog, then it won’t be deleted by git prune.
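By default, reflog entries expire after 90 days, while entries that point to commits no longer reachable from the current branch tip expire after 30 days (the gc.reflogExpire and gc.reflogExpireUnreachable settings). This provides a grace period during which unreachable objects are kept rather than pruned immediately.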
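To see the grace period at work, we can rewrite a commit and check that the old version survives a prune. This is a sketch under assumed names (a temporary repository, a file called file.txt, a made-up identity), not a recommended workflow.

```shell
set -e
# Throwaway repository in a temporary directory (all names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "first version"
old_commit=$(git rev-parse HEAD)

# Rewriting the commit leaves the old one without a branch pointing at it.
echo "v2" > file.txt
git commit -q -a --amend -m "rewritten version"

# The reflog still records the old commit...
git reflog

# ...so git prune keeps it, and it can still be inspected or restored.
git prune
git cat-file -t "$old_commit"
```

Because the old commit is still listed in the reflog, it could be restored with a command such as git branch recovered "$old_commit" (the branch name is arbitrary).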
When to Use git prune
Generally, we don’t need to use git prune directly. Git has a garbage collection mechanism that runs automatically after some commands to clean up unnecessary files and optimize the local repository’s efficiency by compressing some files.
Nevertheless, we may want to clean our repository manually, for example:
- After we perform an operation that we know will create unreachable objects, such as a branch deletion.
- When we want to free disk space immediately.
- When we want to keep the repository tidy and clean at all times.
Garbage collection with git gc
Instead of direct pruning, it is usually recommended to rely on the garbage collection mechanism, which not only performs a git prune but also optimizes space by compressing objects.
As mentioned above, garbage collection will be executed automatically after some commands. We can invoke the garbage collection manually with the command:
git gc
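One way to observe the effect of garbage collection is to compare the output of git count-objects -v before and after running it. The sketch below assumes a throwaway repository with three small commits; the file name and identity are invented for the example.

```shell
set -e
# Throwaway repository in a temporary directory (all names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

# A few commits produce loose objects (commits, trees, and blobs).
for i in 1 2 3; do
  echo "version $i" > file.txt
  git add file.txt
  git commit -q -m "commit $i"
done

loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')

# Run garbage collection manually.
git gc -q

loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packed_after=$(git count-objects -v | awk '/^in-pack:/ {print $2}')

echo "loose objects before gc: $loose_before"
echo "loose objects after gc:  $loose_after"
echo "packed objects after gc: $packed_after"
```

In this small repository every object is reachable, so git gc moves all of them into a pack and no loose objects remain.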
How to Use git prune
Because git prune will delete data, it’s recommended to execute it using the --dry-run option first.
git prune --dry-run
This option will list the objects that would be pruned without actually pruning them. The output would look something like:
0d7dff8258654c03a058987b3e63c86feca9200d commit
ea1380f52f0bfa0142e46767adfd56593681091a blob
fa91af78a1ab453c1d7632192b3ca8bf217ec711 commit
The output indicates that two commits and one blob are unreachable and would be deleted. After making sure that no important data is listed, we can proceed with the cleanup:
git prune
In some situations, we want to clean the repository just after we perform an action that we know will lead to unreachable objects, such as a branch deletion. We may run git prune --dry-run, but the output comes out empty. The reason for this is that the deleted commits are still referenced in the reflog.
If we don’t want to wait for them to expire from the reflog, we can manually expire them using the command:
git reflog expire --expire-unreachable=now --all
Let’s break down the options we used:
- The --expire-unreachable=now option sets the expiration time of reflog entries that point to unreachable commits to now, effectively expiring them immediately.
- The --all option targets the reflogs of all references in the repository. Without this option, we would have to specify a particular ref (like a branch or tag) on which we want to operate.
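The whole sequence can be tried end to end in a disposable repository. The following sketch assumes a temporary directory, a file named file.txt, a branch called feature, and a made-up identity. It shows the dry run coming back empty, the reflog being expired, and the objects finally being pruned.

```shell
set -e
# Throwaway repository in a temporary directory (all names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "base commit"
git checkout -q -b feature
echo "change" >> file.txt
git commit -q -a -m "feature commit"
feature_commit=$(git rev-parse HEAD)
git checkout -q main
git branch -q -D feature

# The reflog still references the deleted commit, so nothing is listed yet.
before=$(git prune --dry-run)

# Expire reflog entries that point to unreachable commits, then preview again.
git reflog expire --expire-unreachable=now --all
after=$(git prune --dry-run)
echo "$after"

# Delete the objects for real and confirm that the commit is gone.
git prune
if git cat-file -e "$feature_commit" 2>/dev/null; then
  echo "commit still present"
else
  echo "commit pruned"
fi
```

Note that expiring the reflog removes the safety net described earlier, so this is best reserved for cases where we are sure the objects are no longer needed.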
git prune: Advanced Usage
Let’s take a look at some advanced techniques, like specifying an expiration time or pruning packed files.
Specifying an expiration time
We might want to clean up only older unreachable objects, for instance, those that are at least two weeks old.
The --expire <time> option allows us to specify a cutoff: git prune removes only unreachable objects older than the given time and leaves newer ones in place. The <time> parameter accepts several formats, such as "2 weeks ago", "3 days", or "yesterday", which provides flexibility in specifying the expiration period.
Example command:
git prune --expire=2.weeks.ago
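The effect of the cutoff can be checked with a freshly created unreachable object. In this sketch, git hash-object -w writes a blob that nothing references; the repository location, file name, and identity are assumptions made for the example.

```shell
set -e
# Throwaway repository in a temporary directory (all names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "base commit"

# Write a blob straight into the object database; nothing references it.
orphan=$(echo "orphaned content" | git hash-object -w --stdin)

# The blob was created just now, so a two-week cutoff protects it.
with_cutoff=$(git prune --dry-run --expire=2.weeks.ago)

# Without a cutoff, the same blob is reported as prunable.
without_cutoff=$(git prune --dry-run)

echo "with cutoff:    '$with_cutoff'"
echo "without cutoff: '$without_cutoff'"
```

Both calls use --dry-run, so nothing is deleted; only the second one lists the new blob.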
Pruning packed files
Git stores objects in two main ways: as individual files (loose objects) and in packed files. Packing is a mechanism to save space and improve performance by storing multiple objects in a single file and removing redundancy.
Occasionally, an object can exist in both loose and packed form. While this redundancy isn't harmful (beyond taking up a little extra disk space), we might want to clean up these loose objects that are also packed to reclaim space.
The git prune-packed command removes loose objects that are already included in a pack. Unlike git prune, git prune-packed has no --expire <time> option; it only accepts --dry-run and --quiet. It simply cleans up loose objects that have already been packed.
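A duplicate of this kind can be produced on purpose by packing without deleting the loose copies. The sketch below relies on git repack -a (without -d) to do that; the repository location, file name, and identity are illustrative assumptions.

```shell
set -e
# Throwaway repository in a temporary directory (all names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "base commit"

# Pack all objects but keep the loose copies (no -d), creating duplicates.
git repack -a -q
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')

# Preview which loose files would be removed, then remove them.
git prune-packed --dry-run
git prune-packed
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')

echo "loose objects before: $loose_before"
echo "loose objects after:  $loose_after"
```

The objects themselves remain available from the pack; only the redundant loose files are deleted.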
Conclusion
As we use a repository, some objects will become unreachable, causing unnecessary disk usage. Git manages this using a garbage collection system that deletes unreachable objects after some time of inactivity.
The garbage collection mechanism is executed automatically by Git after some commands. But we can also run it manually using the git gc command. The garbage collection process will not only remove unreachable objects using git prune but also optimize disk space by compressing some objects.
Advanced Git users may want to manually maintain a clean repository. The git prune command allows manual deletion of unreachable objects. We should always execute git prune --dry-run before executing git prune to get an overview of the objects to be deleted during the pruning process.

