Understanding GitHub: What is GitHub and How to Use It
GitHub logo. Source: GitHub Logos and Usage
Imagine you are working on a data science project, and you’ve made significant progress. Suddenly, a bug appears. You wish you could go back to the last working version, but you can’t remember all the changes you have made. Or maybe you are collaborating with others, and it’s a nightmare to combine everyone’s input. If these scenarios sound familiar, you’re not alone.
These common issues can be solved with GitHub, a widely used version control and collaboration platform. In this article, we will look at how GitHub can improve the way you manage your data projects. We will also explore collaboration techniques and strategies for increasing productivity.
Let’s get started by understanding the fundamentals of version control.
What is Version Control?
Version control is a system that tracks changes to files over time. It allows many people to collaborate on a project. It also maintains a history of modifications. Without version control, managing code changes can become chaotic and error-prone, especially in team-based projects where different contributors might be working on various parts of the code simultaneously.
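To make the idea concrete, here is a minimal sketch in Python of what tracking changes over time means. The `TinyHistory` class and its method names are hypothetical and exist only for illustration; real version control systems such as Git store history far more efficiently and handle many files at once.

```python
class TinyHistory:
    """Toy version history for a single text file (illustration only)."""

    def __init__(self):
        self._versions = []  # list of (message, content) snapshots

    def commit(self, content, message):
        """Save a snapshot and return its version number (starting at 0)."""
        self._versions.append((message, content))
        return len(self._versions) - 1

    def checkout(self, version):
        """Return the file content as it was at the given version."""
        return self._versions[version][1]

    def log(self):
        """Return the commit messages, oldest first."""
        return [message for message, _ in self._versions]


history = TinyHistory()
working = history.commit("rows = load('data.csv')", "Load the data")
history.commit("rows = load('data.csv').dropna(", "Clean the data (has a bug)")

# Go back to the last working version instead of guessing what changed.
restored = history.checkout(working)
```

Because every snapshot is kept together with a message, returning to a known good state is a lookup rather than a guessing game.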
What is GitHub Used For?
As you might have guessed, GitHub is a versatile platform that excels in version control. But GitHub has numerous applications beyond version control, including:
- Creating Project Portfolios: GitHub allows you to create a public GitHub profile to show your data skills and projects to employers or colleagues.
- Collaboration: GitHub facilitates working together on projects with teammates. This involves sharing code snippets and reviewing each other’s work.
- Open-Source Contribution: With GitHub, you can explore and contribute to existing open-source data science projects. This accelerates learning and innovation.
How Does GitHub Work?
To fully understand the benefits of GitHub, it’s essential to understand its key components and how they function together.
- Repositories: Repositories are folders that store your project files and their version history. You can think of them as digital filing cabinets for your data projects. Each repository has a unique URL and contains files, branches, and commits.
- Forks: A fork is a personal copy of another user’s repository. You can make changes independently. Later, you can propose them back to the original repository.
- Pull Requests: PRs are a formal way to propose your changes to the original project owner for review and merging. They ease code review and collaboration.
- Issues: These are used to track tasks, bugs, or enhancements.
- Branches: A branch is a parallel version of a repository. You can create branches to work on specific features or fixes. Merge them back into the main branch when ready. Learn more about this in this tutorial on Git Clone Branch.
- Merging: Merging combines the changes from one branch with another. This keeps everything organized and up-to-date. A good example is merging a feature branch into the main branch.
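The relationship between commits, branches, and merging can be sketched with a toy model. In Git, a branch is essentially a named pointer to a commit, and each commit remembers its parent. The `TinyRepo` class below is hypothetical and models only the simplest kind of merge, a fast-forward, where the target branch has no new commits of its own. Real merges also handle branches that have diverged.

```python
class TinyRepo:
    """Toy repository: commits form a chain, branches are named pointers."""

    def __init__(self):
        self.parents = {}              # commit id -> parent commit id (or None)
        self.branches = {"main": None}  # branch name -> latest commit id
        self._next_id = 1

    def commit(self, branch):
        """Add a commit on top of the branch and move the branch to it."""
        commit_id = self._next_id
        self._next_id += 1
        self.parents[commit_id] = self.branches[branch]
        self.branches[branch] = commit_id
        return commit_id

    def create_branch(self, name, start="main"):
        """A new branch starts at the same commit as the branch it came from."""
        self.branches[name] = self.branches[start]

    def history(self, branch):
        """Return the commit ids reachable from the branch, newest first."""
        ids, current = [], self.branches[branch]
        while current is not None:
            ids.append(current)
            current = self.parents[current]
        return ids

    def fast_forward(self, target, source):
        """Move target to source when target has no commits of its own."""
        tip = self.branches[target]
        if tip is not None and tip not in self.history(source):
            raise ValueError("branches have diverged; a real merge is needed")
        self.branches[target] = self.branches[source]


repo = TinyRepo()
repo.commit("main")               # first commit on main
repo.create_branch("feature")     # feature starts where main is
repo.commit("feature")
repo.commit("feature")

main_before = repo.history("main")
repo.fast_forward("main", "feature")
main_after = repo.history("main")
```

After the fast-forward, `main` points at the same commit as `feature`, so the work done on the feature branch is now part of the main history.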
Git vs. GitHub
You might be wondering how Git relates to GitHub. These two terms are often used interchangeably, but there’s a key distinction.
Git is a distributed version control system (DVCS) that helps developers manage their code. It keeps track of changes and allows for the creation of different versions, or branches, of the code. This makes it easy for developers to work together. Git also supports features like staging areas and commit histories, which provide a detailed record of code modifications.
GitHub, on the other hand, provides additional features like access control, bug tracking, task management, and wikis, making it easier for developers to collaborate on projects. By using GitHub, you can manage your code, track changes, review contributions, and discuss issues - all in one place. It also integrates with various tools and services, which enhances the development workflow.
| Category | Git | GitHub |
| --- | --- | --- |
| Definition | Distributed version control system | Web-based platform built on top of Git |
| Purpose | Helps you manage code, track changes, and create branches | Hosts Git repositories and provides additional collaboration tools |
| Features | Staging areas, commit histories, branching, and merging | Access control, bug tracking, task management, wikis, and integrations |
| Benefit | Enables collaborative work and detailed tracking of code changes | Enhances collaboration, project management, and code review processes |
How to Use GitHub
So far, we have looked at what GitHub is and what version control is. We have also explored the comparison between Git and GitHub. Now, let’s get hands-on with GitHub.
First, we will learn how to sign up for a GitHub account, personalize our experience, and choose a plan. Next, we'll walk through creating a repository, including setting it up, adding descriptions, and managing visibility. After that, we'll cover creating branches to work on different versions of our project and then move on to making commits, where we’ll learn to edit files and document changes.
Signing up
Here are the steps to sign up for a GitHub account and get started:
- Visit GitHub and click on the Sign up button.
- Follow the prompts to create an account. You will need to provide your email address, create a username, and set a password.
- You can personalize your experience by choosing a plan that suits you. You can also customize your preferences during setup. The free plan is enough for beginners and junior data practitioners.
Signing Up for GitHub. Image by Author
Creating a repository
After signing up, the next step is to create a repository. Here are the steps to create your first repository:
- Click on the + icon in the top right corner and select New repository.
- You can add a name and a description, and you can choose if you want the repository to be public or private. Public repositories are visible to everyone. Private repositories are only accessible to you and the collaborators you invite.
- Optionally, you can add a README file, a .gitignore file, and a license. These can also be added later.
- Click Create repository.
Creating a repository. Image by Author
After you complete these steps, GitHub shows a quick setup page for the new repository (this page appears while the repository is still empty). From there, you can either create a new file or upload existing files to the repository.
New repository setup. Image by Author
Uploading files. Image by Author
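The same repository settings can also be sent to GitHub's REST API instead of being filled in by hand. The sketch below only builds the request and does not send it. The repository name, description, and `YOUR_TOKEN` are placeholders, and the helper name `build_create_repo_request` is hypothetical. Sending the request would require a personal access token with permission to create repositories.

```python
import json
import urllib.request


def build_create_repo_request(name, description, private, token):
    """Build (but do not send) a request that creates a repository."""
    payload = {
        "name": name,
        "description": description,
        "private": private,   # False makes the repository public
        "auto_init": True,    # create the repository with a README
    }
    return urllib.request.Request(
        "https://api.github.com/user/repos",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# "my-first-repo" and "YOUR_TOKEN" are placeholders, not real values.
request = build_create_repo_request(
    "my-first-repo", "Practice repository", private=True, token="YOUR_TOKEN"
)
# To actually create the repository, send it with a real token:
# urllib.request.urlopen(request)
```

Each field in the payload corresponds to a choice on the New repository form: the name, the description, the visibility, and whether to start with a README.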
Creating branches
Once your repository is set up, the next step is to create branches. Here are the steps to create branches in your repository:
- In your repository, click on Branch:main. You can find this near the top of the page.
- You can then click the New branch button at the top right corner.
- Enter a new branch name and click Create new branch.
- You can switch between branches by clicking the branch dropdown and selecting the branch you want to work on.
Creating branches. Image by Author
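If you work in a local clone rather than in the browser, the branch steps above map onto a few Git commands. The sketch below assumes Git 2.23 or newer (for `git switch`; older versions use `git checkout -b`), a remote named `origin`, and a base branch named `main`. The branch name and the helper names `branch_commands` and `run_all` are hypothetical.

```python
import subprocess


def branch_commands(new_branch, base="main"):
    """Return the Git commands that create a branch from base and publish it."""
    return [
        ["git", "switch", base],                      # start from the base branch
        ["git", "switch", "-c", new_branch],          # create the branch and switch to it
        ["git", "push", "-u", "origin", new_branch],  # publish the branch to GitHub
    ]


def run_all(commands, repo_dir):
    """Run each command inside repo_dir and stop at the first failure."""
    for command in commands:
        subprocess.run(command, cwd=repo_dir, check=True)


commands = branch_commands("feature/clean-data")
# run_all(commands, "path/to/your/repository")  # uncomment inside a real clone
```

Keeping the commands as data makes it easy to review them before anything is executed against your repository.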
Making commits
After creating branches, the next step is to make commits. Here are the steps to commit changes:
- Navigate to the file you want to edit in your repository.
- Click on the pencil icon to edit the file, and make your changes in the text editor.
- Go to the Commit changes section and add a commit message. The message is important because it describes the changes you made.
- Click Commit changes.
Making commits. Image by Author
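Editing a file in the browser and committing it has a counterpart in GitHub's REST API, which accepts the new file content together with a commit message. The sketch below only prepares the URL and payload for that call, which would be sent with the PUT method and an access token. The owner, repository, file path, and `CURRENT_BLOB_SHA` are placeholders, and `build_file_commit` is a hypothetical helper name.

```python
import base64


def build_file_commit(owner, repo, path, new_text, message, branch, current_sha):
    """Prepare the URL and payload for committing one edited file."""
    url = f"https://api.github.com/repos/{owner}/{repo}/contents/{path}"
    payload = {
        "message": message,  # the commit message that describes the change
        # The API expects the file content to be Base64 encoded.
        "content": base64.b64encode(new_text.encode("utf-8")).decode("ascii"),
        "branch": branch,    # the branch that receives the commit
        "sha": current_sha,  # blob SHA of the file version being replaced
    }
    return url, payload


# All values below are placeholders for illustration.
url, payload = build_file_commit(
    "your-username", "my-first-repo", "README.md",
    "# Cleaned data\n", "Update README title",
    "feature/clean-data", "CURRENT_BLOB_SHA",
)
```

As in the web editor, the commit message travels with the change, so the history explains why each edit was made.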
Creating a pull request
Once you have made commits, the next step is to create a pull request. Here are the steps to create a pull request:
- Go to the Pull requests tab in your repository.
- Click New pull request. GitHub will automatically compare changes between branches.
- Review the changes to ensure everything is correct by comparing the changes between your branch and the main branch.
- Create a pull request by clicking on Create pull request. You can then add a title and description for your pull request.
- Add reviewers if needed, and submit. This is optional, but it is valuable when you want feedback on your changes before they are merged.
Creating a pull request. Image by Author
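A pull request boils down to four pieces of information: a title, a description, the branch that holds your commits, and the branch you want to merge into. The sketch below shows how those pieces could be packaged for GitHub's REST API. It builds the request without sending it. The owner, repository, branch names, and `YOUR_TOKEN` are placeholders, and `build_pull_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_pull_request(owner, repo, title, head, base, body, token):
    """Build (but do not send) a request that opens a pull request."""
    payload = {
        "title": title,
        "head": head,  # branch that contains your commits
        "base": base,  # branch you want to merge into
        "body": body,  # description shown to reviewers
    }
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/pulls",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# All values below are placeholders for illustration.
pr_request = build_pull_request(
    "your-username", "my-first-repo",
    title="Clean the raw data",
    head="feature/clean-data",
    base="main",
    body="Drops empty rows before analysis.",
    token="YOUR_TOKEN",
)
```

The `head` and `base` fields play the same role as the two branch selectors on the compare page.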
Merging branches
After your pull request has been reviewed and approved, the final step is to merge branches. Here are the steps to merge branches:
- After the pull request has been reviewed and approved, navigate to the pull request in the Pull requests tab.
- Click the Merge pull request button.
- Confirm the merge by clicking Confirm merge.
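GitHub offers more than one way to merge a pull request: a regular merge commit, a squash merge, or a rebase merge. The sketch below prepares the REST API call for merging and checks the chosen method first. The owner, repository, and pull request number are placeholders, and `build_merge_call` is a hypothetical helper name. Sending the call would also require an access token.

```python
ALLOWED_METHODS = {"merge", "squash", "rebase"}


def build_merge_call(owner, repo, pull_number, merge_method="merge"):
    """Prepare the HTTP method, URL, and payload for merging a pull request."""
    if merge_method not in ALLOWED_METHODS:
        raise ValueError(f"unknown merge method: {merge_method}")
    url = f"https://api.github.com/repos/{owner}/{repo}/pulls/{pull_number}/merge"
    return "PUT", url, {"merge_method": merge_method}


# Placeholder values: pull request number 1 in a practice repository.
http_method, url, payload = build_merge_call(
    "your-username", "my-first-repo", 1, "squash"
)
```

A squash merge combines all commits from the pull request into a single commit on the base branch, which some teams prefer for a tidier history.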
Besides using the GitHub GUI, you can use Git to do most tasks. For example, you can create a Git repository locally and merge changes from a remote repository with the git pull command. Becoming comfortable in this kind of environment is important as you move up in your career, and DataCamp's Introduction to Git course is a great resource for your journey. You can also try our GitHub and Git Tutorial for Beginners to get started.
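As a rough sketch of that command-line workflow, the snippet below assembles the commands for creating a local repository, connecting it to a remote, and pulling changes, then prints them as a script you could run in a terminal. The repository URL is a placeholder, the remote is assumed to be named `origin` with a `main` branch, and `local_sync_commands` is a hypothetical helper name.

```python
import shlex


def local_sync_commands(remote_url, branch="main"):
    """Return the Git commands for a local repository that pulls from a remote."""
    return [
        ["git", "init"],                                 # create a local repository
        ["git", "remote", "add", "origin", remote_url],  # connect it to GitHub
        ["git", "pull", "origin", branch],               # fetch and merge remote changes
    ]


# The URL below is a placeholder for your own repository.
commands = local_sync_commands("https://github.com/your-username/my-first-repo.git")
script = "\n".join(shlex.join(command) for command in commands)
print(script)
```

By default, `git pull` fetches the remote branch and then merges it into your current branch, which is why a single command is enough to bring your local copy up to date.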
GitHub Alternatives
GitHub is the most popular platform for version control and collaboration. However, there are several alternatives with their own offerings and benefits. Aspiring data practitioners should be aware of these platforms, since one of them may better fit their specific needs.
Let's take a look at some of the GitHub alternatives.
GitLab
This is a DevOps platform that offers Git repository management, issue tracking, and CI/CD pipeline features. GitLab is popular for its all-in-one approach to DevOps by integrating a wide range of tools and services.
GitLab logo
Let’s look at some key features:
- Integrated CI/CD: GitLab has built-in pipelines for continuous integration and deployment. These pipelines automate code testing and deployment.
- Issue Tracking: With GitLab’s issue tracking system, you can manage tasks, bugs, and feature requests. This system uses boards, milestones, and labels.
- Auto DevOps: This feature configures pipelines based on the project structure, which saves time during setup.
- Container Registry: GitLab provides a built-in container registry for managing Docker images.
- Security Features: Integrated security testing tools allow for continuous monitoring. These tools also scan for vulnerabilities in your codebase.
Bitbucket
Bitbucket is an Atlassian product for hosting and managing Git repositories. It previously also supported Mercurial repositories, but that support was removed in 2020. It is designed to work easily with other Atlassian products, such as Jira and Trello.
Bitbucket Logo
Here are some key features:
- Branch Permissions: Bitbucket allows fine-grained control over who can access and change branches. This enhances security.
- CI/CD Integration: Integrates with Bitbucket Pipelines for continuous integration and delivery. This enables automated testing and deployment.
- Jira Integration: Direct integration with Jira allows for comprehensive project management. It also enables tracking of development progress.
- Pull Requests: In-line commenting and pull request reviews facilitate effective code review processes.
- Deployment: Supports deployments through Bitbucket Pipelines, allowing for streamlined deployment processes.
SourceForge
SourceForge is one of the original platforms for hosting and managing open-source projects. It offers a wide range of tools for developers. These include code repositories, bug tracking, and project management features.
Key features include:
- Project Hosting: It provides free hosting for open-source projects. This includes unlimited bandwidth and storage.
- Download Stats: Detailed statistics on project downloads help track the popularity of projects. They also help track the progress of projects.
- Discussion Forums: Built-in forums facilitate community engagement and support.
- File Release System: An advanced file release system manages project downloads and versions.
- SVN and Git Support: Supports both Subversion (SVN) and Git version control systems.
AWS CodeCommit
AWS CodeCommit is a fully managed source control service hosted by Amazon Web Services. It allows you to securely store and manage your code in the cloud, integrating seamlessly with other AWS services.
Key features include:
- Scalability: AWS CodeCommit scales to meet your repository needs without requiring infrastructure management.
- Security: Integration with AWS Identity and Access Management (IAM) provides fine-grained access control.
- Integration: Works well with other AWS services like CodePipeline, CodeBuild, and CodeDeploy. This facilitates a complete CI/CD workflow.
- High Availability: Built on AWS infrastructure. This ensures high availability and durability of your repositories.
- Encryption: Data is encrypted both in transit and at rest, enhancing security.
Conclusion
In this article, we explored the fundamentals of GitHub, why it matters, and how it works. We started with the concept of version control and its role in managing changes to code and data. We then compared Git with GitHub, walked through a practical guide to using GitHub (signing up, creating repositories, making commits, opening pull requests, and merging branches), and looked at several alternatives.
Continuous learning is a gateway to success in the data space. To broaden your knowledge of GitHub and its applications in data science, you can listen to our DataFramed podcast episode on The Future of Programming with Kyle Daigle, in which GitHub’s COO shares insights on the evolving landscape of programming. You can also explore GitHub Concepts, a comprehensive course on all things GitHub, or read our guide on GitHub certifications to learn how certification can benefit your career.
Experienced data professional and writer who is passionate about empowering aspiring experts in the data space.
Frequently Asked Questions
Is GitHub hard to learn?
While Git, the underlying technology, has a learning curve, GitHub provides a user-friendly interface to manage Git functionalities.
How is Git different from GitHub?
Git is a version control system that tracks changes to files, while GitHub is a platform that hosts Git repositories online.
Is GitHub safe for storing sensitive data?
Public repositories on GitHub are visible to everyone, so never store sensitive data, like passwords or API keys, in them. Private repositories limit who can see your code, but it is still best practice to keep secrets out of version control altogether.
Do I need to pay to use GitHub?
GitHub offers a free plan with unlimited public and private repositories, subject to limits on storage and some features. Paid plans offer additional features, like increased storage and advanced collaboration.
What are some alternatives to GitHub?
Popular alternatives include GitLab, Bitbucket, SourceForge, and AWS CodeCommit.
How can I demonstrate my knowledge of GitHub during interviews?
To effectively showcase your GitHub expertise during interviews, consider obtaining a GitHub Certification which validates your skills, enhances your resume, boosts your confidence, and shows commitment to learning.