
Git Prune: What Is Git Pruning and How to Use Git Prune

Git prune is a Git command that removes objects from the repository that are no longer reachable from any branch, tag, or other reference, helping to free up disk space.
Aug 2024  · 5 min read

Git's protective approach to data deletion prevents accidental loss of important commits or data.

However, this also means that outdated data, such as commits that belonged to deleted branches, stays in the repository. Over time, Git repositories can accumulate unreferenced objects, which consume unnecessary disk space and can cause confusion.

The git prune command is a housekeeping utility within Git, designed primarily to clean up unreachable objects in the repository.

An unreachable object is an object that isn't accessible by any branch, tag, remote-tracking branch, or other reference. These objects can consume space in the repository, cluttering it with unnecessary data over time.

While git prune is a powerful tool for keeping repositories lean, most users may not need to use it directly due to Git's automatic garbage collection (git gc). However, understanding its role and function can be helpful for advanced Git users or in specific scenarios where manual repository maintenance is required or preferred.


What Is git prune?

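The git prune command is used to remove objects that are no longer needed in the local repository. These objects can be commits, trees (directory snapshots), blobs (file contents), or annotated tags that are no longer reachable from any branch, tag, or other reference in the repository. Note that git prune only deletes loose (unpacked) objects; unreachable objects stored inside pack files are dropped when the repository is repacked, for example by git gc.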

Simply put, git prune helps clean up unnecessary files and data in the repository, saving space and reducing clutter.

How can objects become unreachable?

Objects can become unreachable in several ways, for instance by deleting branches or rewriting commits. When a branch is deleted, any commits unique to that branch (that is, not part of any other branch or tag) become unreachable.

Commit rewriting, using commands like git rebase, generates new commits and discards the old ones, leading them to become unreachable as well.
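To make this concrete, here is a minimal sketch that builds a throwaway repository, deletes a branch, and checks that the commit unique to that branch is still stored but no longer reachable from any branch or tag. The temporary directory, the branch name feature, the user identity, and the commit messages are all arbitrary choices made for this illustration.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository (arbitrary location)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # use "main" regardless of init.defaultBranch
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"

git commit -q --allow-empty -m "base"
git checkout -q -b feature
git commit -q --allow-empty -m "only on feature"
orphan=$(git rev-parse HEAD)             # remember the commit unique to "feature"

git checkout -q main
git branch -q -D feature                 # no branch or tag points at $orphan anymore

git cat-file -t "$orphan"                # prints: commit (the object is still stored)

# --no-reflogs makes fsck ignore reflog entries when computing reachability
unreachable=$(git fsck --unreachable --no-reflogs 2>/dev/null | grep -c "$orphan")
echo "times listed as unreachable: $unreachable"
```

The --no-reflogs flag matters here: without it, the commit would still count as reachable through the reflog.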

Grace period and reflogs

Git maintains a log of updates to branch tips and other references called reflogs (reference logs). We can view it using the git reflog command.

Even if an object is unreachable, if it is still in the reflog, then it won’t be deleted by git prune.

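By default, reflog entries expire after 90 days, and entries that are no longer reachable from the current tip of their branch expire after 30 days (the gc.reflogExpire and gc.reflogExpireUnreachable settings). This provides a grace period during which unreachable objects are kept and not immediately pruned.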
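The protection offered by the reflog can be observed with a small experiment. The sketch below, which uses a hypothetical throwaway repository and made-up branch and commit names, deletes a branch and then checks that the lost commit is still listed by git reflog and that git prune --dry-run reports nothing to delete.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository (arbitrary location)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # use "main" regardless of init.defaultBranch
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"

git commit -q --allow-empty -m "base"
git checkout -q -b feature
git commit -q --allow-empty -m "only on feature"
git checkout -q main
git branch -q -D feature                 # the commit is now unreachable from any branch

# HEAD's reflog still records the commit made while "feature" was checked out
git reflog | grep "only on feature"

# Because of that reflog entry, git prune has nothing to delete yet
pending=$(git prune --dry-run)
echo "objects git prune would delete: ${pending:-none}"
```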

When to Use git prune

Generally, we don’t need to use git prune directly. Git has a garbage collection mechanism that runs automatically after some commands to clean up unnecessary files and optimize the local repository’s efficiency by compressing some files.

Nevertheless, we may want to clean our repository manually, for example:

  • After an operation that we know creates unreachable objects, such as a branch deletion.
  • When we want to reclaim disk space immediately.
  • When we want to keep the repository tidy at all times.

Garbage collection with git gc

Instead of direct pruning, it is usually recommended to rely on the garbage collection mechanism, which not only performs a git prune (by default limited to unreachable loose objects older than two weeks) but also optimizes space by compressing objects into pack files.

As mentioned above, garbage collection will be executed automatically after some commands. We can invoke the garbage collection manually with the command:

git gc
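One way to see the effect of git gc is to compare the output of git count-objects -v before and after running it. The following is a small sketch on a hypothetical throwaway repository; the file name notes.txt and the commit messages are invented for the example.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository (arbitrary location)
cd "$repo"
git init -q
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"

# Create a few commits so that several loose objects exist
for i in 1 2 3; do
  echo "version $i" > notes.txt
  git add notes.txt
  git commit -q -m "revision $i"
done

# "count" is the number of loose objects, "in-pack" the number of packed ones
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')

git gc -q                                # pack reachable objects, drop redundant loose copies

loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packed_after=$(git count-objects -v | awk '/^in-pack:/ {print $2}')
echo "loose objects: $loose_before -> $loose_after, packed: $packed_after"
```

All objects in this example are reachable, so nothing is deleted; they are simply moved from individual files into a pack.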

How to Use git prune

Because git prune will delete data, it’s recommended to execute it using the --dry-run option first.

git prune --dry-run

This option will list the objects that would be pruned without actually pruning them. The output would look something like:

0d7dff8258654c03a058987b3e63c86feca9200d commit
ea1380f52f0bfa0142e46767adfd56593681091a blob
fa91af78a1ab453c1d7632192b3ca8bf217ec711 commit

The output indicates that two commits and one blob are unreachable and would be deleted. After making sure that no important data is listed, we can proceed with the cleanup:

git prune

In some situations, we want to clean the repository just after we perform an action that we know will lead to unreachable objects, such as a branch deletion. We may run git prune --dry-run, but the output comes out empty. The reason for this is that the deleted commits are still referenced in the reflog.

If we don’t want to wait for them to expire from the reflog, we can manually expire them using the command:

git reflog expire --expire-unreachable=now --all

Let’s break down the options we used:

  • The --expire-unreachable=now option immediately expires every reflog entry that is not reachable from the current tip of its reference. Once those entries are gone, the objects they pointed to are no longer protected by the reflog.
  • The --all option processes the reflogs of all references in the repository. Without this option, we would have to specify a particular ref (like a branch) on which we want to operate.
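Putting the pieces together, the sequence could look like the sketch below. It runs in a hypothetical throwaway repository (the file name, branch name, and commit messages are made up), because expiring the reflog and pruning permanently removes the ability to recover the deleted commits.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository (arbitrary location)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # use "main" regardless of init.defaultBranch
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "base"
git checkout -q -b feature
echo "experiment" > file.txt
git commit -q -am "experiment"
orphan=$(git rev-parse HEAD)             # commit that only exists on "feature"
git checkout -q main
git branch -q -D feature

# Step 1: drop the reflog entries that still protect the deleted commit
git reflog expire --expire-unreachable=now --all

# Step 2: preview; the commit, its tree, and its blob should now be listed
would_delete=$(git prune --dry-run)
echo "$would_delete"

# Step 3: actually delete the unreachable loose objects
git prune

# cat-file -e exits non-zero when the object does not exist
if git cat-file -e "$orphan" 2>/dev/null; then status="still present"; else status="pruned"; fi
echo "$orphan is $status"
```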

git prune: Advanced Usage

Let’s take a look at some advanced techniques, like specifying an expiration time or pruning packed files.

Specifying an expiration time

We might want only to clean older unreachable objects, for instance, pruning only objects that are at least two weeks old.

The --expire <time> option allows us to specify a cutoff time. Git will only remove unreachable loose objects that are older than the specified time. The <time> parameter accepts different formats, such as "2 weeks ago", "3 days ago", or "yesterday", to provide flexibility in specifying the expiration period.

Example command:

git prune --expire=2.weeks.ago
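As a quick illustration of the cutoff, the sketch below writes a fresh unreachable blob with git hash-object -w and compares two dry runs. The repository is a hypothetical throwaway one and the blob content is arbitrary. Since the object was created just now, it should appear in the unrestricted dry run but not in the one limited to objects older than two weeks.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository (arbitrary location)
cd "$repo"
git init -q
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"
git commit -q --allow-empty -m "base"

# Write a blob that nothing references: it is unreachable from the start
blob=$(echo "scratch data" | git hash-object -w --stdin)

# Without a cutoff, the new blob is a pruning candidate
all=$(git prune --dry-run)

# With a two-week cutoff, the blob is too recent to be pruned
old_only=$(git prune --dry-run --expire=2.weeks.ago)

echo "without cutoff: ${all:-none}"
echo "with cutoff:    ${old_only:-none}"
```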

Pruning packed files

Git stores objects in two main ways: as individual files (loose objects) and in packed files. Packing is a mechanism to save space and improve performance by storing multiple objects in a single file and removing redundancy.

Occasionally, an object can exist in both loose and packed form. While this redundancy isn't harmful (beyond taking up a little extra disk space), we might want to clean up these loose objects that are also packed to reclaim space.

The git prune-packed command removes loose objects if they are already included in a pack. Unlike git prune, git prune-packed has no --expire <time> option; it only accepts --dry-run (-n) and --quiet (-q). It simply cleans up loose objects that have already been packed.
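The situation described above can be reproduced by packing a repository without removing the loose copies. In the sketch below, which again uses a hypothetical throwaway repository and invented file names, git repack -a is run without -d so that every object ends up both loose and packed, and git prune-packed then removes the loose duplicates.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository (arbitrary location)
cd "$repo"
git init -q
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"

for i in 1 2; do
  echo "version $i" > notes.txt
  git add notes.txt
  git commit -q -m "revision $i"
done

# Pack everything, but keep the loose copies (no -d), creating duplicates
git repack -a -q

# Each line of the dry run corresponds to one loose object that is also packed
duplicates=$(git prune-packed --dry-run | wc -l | tr -d ' ')

git prune-packed                         # remove the redundant loose copies

loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
echo "duplicates removed: $duplicates, loose objects left: $loose_after"
```

No data is lost in this process, since every removed loose object remains available in the pack.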

Conclusion

As we use a repository, some objects will become unreachable, causing unnecessary disk usage. Git manages this using a garbage collection system that deletes unreachable objects once their grace period has passed.

The garbage collection mechanism is executed automatically by Git after some commands. But we can also run it manually using the git gc command. The garbage collection process will not only remove unreachable objects using git prune but also optimize disk space by compressing some objects.

Advanced Git users may want to manually maintain a clean repository. The git prune command allows manual deletion of unreachable objects. We should always execute git prune --dry-run before executing git prune to get an overview of the objects to be deleted during the pruning process.


Author
François Aubry
Teaching has always been my passion. From my early days as a student, I eagerly sought out opportunities to tutor and assist other students. This passion led me to pursue a PhD, where I also served as a teaching assistant to support my academic endeavors. During those years, I found immense fulfillment in the traditional classroom setting, fostering connections and facilitating learning. However, with the advent of online learning platforms, I recognized the transformative potential of digital education. In fact, I was actively involved in the development of one such platform at our university. I am deeply committed to integrating traditional teaching principles with innovative digital methodologies. My passion is to create courses that are not only engaging and informative but also accessible to learners in this digital age.