
How to Use Git Rebase: A Tutorial for Beginners

Discover what Git Rebase is and how to use it in your data science workflows.
May 2023  · 8 min read

As data science, data engineering, and machine learning become part of the daily operations of nearly every business, data professionals can no longer work in a vacuum. Instead, data practitioners must work closely with software developers to make data science projects fully operational and scalable. That means data professionals need to add software development tools to their skill set, including Git.

Git is an open-source version control system. It allows developers and data professionals to efficiently track changes in project files, such as Python scripts and datasets, so that project members can have a record of all the changes, synchronize their tasks, and work collaboratively without losing information. A great place to get started with Git is our Introduction to Git Course.

One of the key features of Git is its support for branches. Branches enable developers to keep multiple versions of a project and track each version systematically. At some point, you may need to integrate the progress made in one branch into another. There are various strategies to combine branches, including the git merge and git rebase commands.

The aim of this article is to introduce git rebase, a command that helps integrate branches by moving a sequence of commits from a source branch on top of a target branch, normally the main branch.

Understanding Git Rebase

Imagine you are working on a data science project together with other data professionals, and you are in charge of developing a dashboard with PowerBI based on data collected in previous commits.

Since you will be the main person responsible for the dashboard, it makes sense to create a dedicated branch. In the meantime, the rest of the team can continue updating the main branch with new commits. The situation looks like this:

Figure 1. Creating a branch for a new feature.

Now imagine that the latest commits in the main branch are relevant to your branch. For example, they may have added new datasets that can enrich your dashboard. In this case, you need to incorporate the new commits into your branch. There are two ways of integrating these changes: merging or rebasing.

If you combine the branches using the git merge command, you will create a new merge commit in your branch that blends together the latest changes made in the main branch and the branch you created.

Figure 2. Merging branches.

git merge is a non-destructive operation, meaning that the existing commits on both branches are left untouched. This is good for traceability, but it also adds complexity to your branch, because every time you bring in changes from the main branch you add an extra merge commit, and an active main branch means many of them. It can also result in merge conflicts, as explained in this blog on resolving merge conflicts.
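To make the merge commit visible, here is a minimal sketch that builds a throwaway repository and merges main into a feature branch. The file names, the demo identity, and the temporary directory are hypothetical and only there for illustration; the branch name follows the example used in this article.

```shell
#!/bin/sh
set -e

# Throwaway repository in a temporary directory (hypothetical demo data).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch "main"
git config user.name "Demo User"
git config user.email "demo@example.com"

# One commit on main, then a feature branch with its own commit.
echo "raw data" > data.csv
git add data.csv
git commit -q -m "main: add data"
git checkout -q -b powerbi_dashboard
echo "plots" > dashboard.txt
git add dashboard.txt
git commit -q -m "powerbi_dashboard/add_plots"

# Meanwhile, main moves forward.
git checkout -q main
echo "new dataset" > new_data.csv
git add new_data.csv
git commit -q -m "main: add new dataset"

# Merge main into the feature branch; the branches diverged, so Git
# records a merge commit.
git checkout -q powerbi_dashboard
git merge -q --no-edit main

# A merge commit has two parents: count the hashes in its parent list.
parents=$(git log -1 --format=%P | wc -w | tr -d ' ')
echo "parents of the newest commit: $parents"
git log --oneline --graph
```

The graph printed at the end should show the two lines of history joining in the merge commit, which matches the picture in Figure 2.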

On the other hand, if you use git rebase, you will move your whole branch on top of the main branch. This operation rewrites the project history by creating a new commit for each commit you made in your branch. The resulting branch looks as follows:

Figure 3. Rebasing branches.

git rebase is a great option to keep the project history tidy. Moving the whole feature branch to the tip of main rewrites the history of that branch and yields a linear history that is easier to navigate. git rebase also eliminates the need for the extra merge commits required with git merge.
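The same scenario can be replayed with git rebase to see both effects: the history stays linear, and the rebased commit gets a new identifier. This is a sketch under the same assumptions as before (hypothetical file names, demo identity, and a temporary directory).

```shell
#!/bin/sh
set -e

# Throwaway repository in a temporary directory (hypothetical demo data).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch "main"
git config user.name "Demo User"
git config user.email "demo@example.com"

# One commit on main, then a feature branch with its own commit.
echo "raw data" > data.csv
git add data.csv
git commit -q -m "main: add data"
git checkout -q -b powerbi_dashboard
echo "plots" > dashboard.txt
git add dashboard.txt
git commit -q -m "powerbi_dashboard/add_plots"

# Meanwhile, main moves forward.
git checkout -q main
echo "new dataset" > new_data.csv
git add new_data.csv
git commit -q -m "main: add new dataset"

# Rebase the feature branch onto main and compare commit ids.
git checkout -q powerbi_dashboard
before=$(git rev-parse HEAD)
git rebase -q main
after=$(git rev-parse HEAD)

merges=$(git rev-list --merges --count HEAD)
echo "feature commit before rebase: $before"
echo "feature commit after rebase:  $after"
echo "merge commits in history:     $merges"
git log --oneline --graph
```

The two identifiers differ because the rebased commit has a different parent, which is exactly what "rewriting history" means here.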

However, git rebase also comes with potential conflicts and pitfalls. Since it changes the commit history and creates no merge commits, it can be difficult to identify when upstream changes were incorporated, which compromises traceability. Even more important, if git rebase rewrites commits that already exist in a shared branch such as main, whether accidentally or unknowingly, and you then push the result to the remote repository, the rewritten history will conflict with the work of other developers. Hence, you should use git rebase carefully and only under certain circumstances.

Let’s now see how git rebase works in practice!

Performing Git Rebase

To illustrate how git rebase works, we will replicate the example described in the previous section. If you work on a collaborative project, the first thing you will have to do is download the content from the remote repository where the project is hosted (normally, in the cloud or on a remote server) and update your local repository with the latest changes, so you’re on the same page as the rest of the team. This can be done with the git pull command, which you run in your terminal:

git pull

Now you’re ready to create a new branch where you will work on the PowerBI dashboard. This can be done with the following command:

git checkout -b powerbi_dashboard

Once the new branch is created, you start working and register your progress with several commits. For example, you make a new commit after putting together the plots that will show up in the dashboard.

git add .  
git commit -m "powerbi_dashboard/add_plots"

In the meantime, your colleagues make new progress in the main branch that is relevant to your dashboard project. Again, the first thing to do is get those updates from the remote repository into your local repository: switch to the main branch and run git pull. This results in the situation described in Figure 1.

From here, you could choose either git merge or git rebase. If you choose the latter, switch back to your branch and run git rebase to move it on top of the main branch.

git checkout powerbi_dashboard
git rebase main
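The whole workflow of this section can be rehearsed safely on your own machine. The sketch below assumes a local bare repository standing in for the remote server and a second clone playing the role of a colleague; the directory names (remote.git, me, colleague), file names, and demo identity are hypothetical.

```shell
#!/bin/sh
set -e

# Demo identity for every repository created below (hypothetical).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"

# A bare repository plays the role of the remote server.
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/main

# Your local repository, with a first commit pushed to the remote.
git init -q me
cd me
git symbolic-ref HEAD refs/heads/main
git remote add origin "$work/remote.git"
echo "raw data" > data.csv
git add data.csv
git commit -q -m "main: add data"
git push -q -u origin main

# Create the feature branch and commit some work.
git checkout -q -b powerbi_dashboard
echo "plots" > dashboard.txt
git add dashboard.txt
git commit -q -m "powerbi_dashboard/add_plots"

# A colleague clones the project and pushes a new commit to main.
cd "$work"
git clone -q remote.git colleague
cd colleague
echo "new dataset" > new_data.csv
git add new_data.csv
git commit -q -m "main: add new dataset"
git push -q origin main

# Back in your repository: update local main, then rebase your branch.
cd "$work/me"
git checkout -q main
git pull -q
git checkout -q powerbi_dashboard
git rebase -q main

git log --oneline --graph
```

After the last command, your dashboard commit should sit directly on top of the colleague's commit, with no merge commit in between.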

Interactive Rebasing

If no additional options are provided, git rebase will replay all the commits in your branch on top of the tip of the main branch. However, you can use the interactive mode if you want more control over how the commits are moved. This allows you to clean up your commit history before rebasing your branch onto main, thereby increasing clarity.

To initiate interactive rebasing, add the -i option to the git rebase command:

git checkout powerbi_dashboard
git rebase -i main

This will open an editor where you can tell Git what to do with each commit you want to rebase. Some of the available options are changing commit messages, squashing commits together, and removing commits. The complete list can be found in the Git documentation. Commits are listed from oldest to newest. Below you can find what the editor looks like:

pick 46a9451 powerbi_dashboard/add_plots # keep commit
reword 94561f6 powerbi_dashboard/set_data_connectors # edit commit message
drop 1f094b8 powerbi_dashboard/minor_changes # remove commit
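If you want to try the instruction list without typing in an editor, one option is to let a command edit it for you through the GIT_SEQUENCE_EDITOR environment variable, which Git uses to open the instruction list. The sketch below is only an illustration under assumptions: it uses a throwaway repository with hypothetical file names, and a sed command that changes the third instruction from pick to drop.

```shell
#!/bin/sh
set -e

# Throwaway repository in a temporary directory (hypothetical demo data).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch "main"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "raw data" > data.csv
git add data.csv
git commit -q -m "main: add data"

# Three commits on the feature branch, each touching its own file.
git checkout -q -b powerbi_dashboard
for name in add_plots set_data_connectors minor_changes; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "powerbi_dashboard/$name"
done

# Instead of opening an interactive editor, run sed on the instruction
# list: line 3 is the newest commit, and "pick" becomes "drop".
GIT_SEQUENCE_EDITOR="sed -i.bak -e '3s/^pick/drop/'" git rebase -i main

# Only the first two feature commits remain.
git log --format=%s main..HEAD
```

In day-to-day work you would normally edit the list by hand; scripting it like this is mostly useful for experimenting or for repeatable demos.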

Git Rebase Best Practices

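git rebase is a useful command, but it can lead to confusing situations if used in the wrong scenarios. As a rule of thumb, you should use git rebase only on commits that exist solely in your local repository, that is, commits you have not yet pushed to a branch shared with others.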
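One way to check which commits are still local-only before rebasing is to list the commits reachable from your branch but from no remote-tracking branch. The sketch below assumes a local bare repository standing in for the remote; directory names, file names, and the demo identity are hypothetical.

```shell
#!/bin/sh
set -e

# Demo identity for the repository created below (hypothetical).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"

# A bare repository plays the role of the remote server.
git init -q --bare remote.git

# Local repository with one commit pushed to the remote.
git init -q local
cd local
git symbolic-ref HEAD refs/heads/main
git remote add origin "$work/remote.git"
echo "raw data" > data.csv
git add data.csv
git commit -q -m "main: add data"
git push -q -u origin main

# Two commits on a feature branch that has never been pushed.
git checkout -q -b powerbi_dashboard
echo "plots" > dashboard.txt
git add dashboard.txt
git commit -q -m "powerbi_dashboard/add_plots"
echo "connectors" > connectors.txt
git add connectors.txt
git commit -q -m "powerbi_dashboard/set_data_connectors"

# Commits reachable from HEAD but from no remote-tracking branch
# have not been pushed anywhere yet.
unpushed=$(git rev-list --count HEAD --not --remotes)
echo "local-only commits: $unpushed"
git log --oneline HEAD --not --remotes
```

Commits that appear in this list can be rebased without affecting anyone else; commits that do not appear are already on the remote and are better left alone.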

In particular, git rebase comes in handy when working on a dedicated branch to develop a specific feature. In this context, git rebase can be used to keep your commit history clean and linear, making it easier for other developers to follow your progress once you push your commits to the remote repository.

However, you may encounter merge conflicts during your rebase workflow, especially if you haven’t incorporated the commits by your colleagues in the main branch for a while.

You can reduce the risk of your branch conflicting with the latest commits in the main branch by rebasing your branch against main frequently. That way, each rebase has fewer changes to reconcile, and you can be sure you’re working from the same base as the rest of the team.
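When a conflict does happen, the rebase pauses so you can fix the affected files and then resume. The following sketch provokes a conflict on purpose in a throwaway repository and resolves it; the file name notes.txt, its contents, and the demo identity are hypothetical.

```shell
#!/bin/sh
set -e

# Throwaway repository in a temporary directory (hypothetical demo data).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch "main"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "title: dashboard" > notes.txt
git add notes.txt
git commit -q -m "main: add notes"

# The feature branch and main both change the same line.
git checkout -q -b powerbi_dashboard
echo "title: sales dashboard" > notes.txt
git commit -q -am "powerbi_dashboard/rename_title"
git checkout -q main
echo "title: dashboard (2023)" > notes.txt
git commit -q -am "main: add year to title"

# The rebase stops at the conflicting commit and exits with an error.
git checkout -q powerbi_dashboard
if ! git rebase main >/dev/null 2>&1; then
    echo "conflicting files:"
    git diff --name-only --diff-filter=U

    # Resolve the conflict (normally done by hand in an editor),
    # stage the result, and let the rebase carry on.
    echo "title: sales dashboard (2023)" > notes.txt
    git add notes.txt
    GIT_EDITOR=true git rebase --continue   # keep the commit message as is
fi

git log --oneline --graph
```

If a conflict turns out to be harder than expected, git rebase --abort returns the branch to the state it was in before the rebase started.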

Finally, prevention is better than cure. So, before using git rebase, check with the rest of the team so they can give you guidelines on how and where to use it.

Conclusion

You made it to the end of the tutorial. Congratulations! git rebase is a command worth including in your Git workflows. But there’s much more to learn about Git, a must-have tool for both developers and data practitioners.

Below you can find some DataCamp resources to help you master Git. We hope you enjoy them!


Author
Javier Canales Luna