
Understanding GitHub: What is GitHub and How to Use It

Discover the uses of GitHub, a tool for version control and collaboration in data science. Learn to manage repositories, branches, and collaborate effectively.
Jun 14, 2024  · 9 min read

What is GitHub?

GitHub logo. Source: GitHub Logos and Usage 

Imagine you are working on a data science project, and you’ve made significant progress. Suddenly, a bug appears. You wish you could go back to the last working version, but you can’t remember all the changes you have made. Or maybe you are collaborating with others, and it’s a nightmare to combine everyone’s input. If these scenarios sound familiar, you’re not alone.

These common issues can be solved with GitHub, one of the most widely used version control and collaboration platforms. In this article, we will look at how GitHub can improve the way you manage your data projects. We will also explore collaboration techniques and strategies for increasing productivity.

Let’s get started by understanding the fundamentals of version control.

What is Version Control?

Version control is a system that tracks changes to files over time. It allows many people to collaborate on a project. It also maintains a history of modifications. Without version control, managing code changes can become chaotic and error-prone, especially in team-based projects where different contributors might be working on various parts of the code simultaneously.
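
To make the idea concrete, here is a minimal sketch of a version history in Python. It is a toy model only: the `History` and `Snapshot` classes and the file names are hypothetical, and real systems such as Git store data quite differently. It simply shows how saving snapshots with messages lets you return to the last working version.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """One saved version of the project: its files plus a short message."""
    message: str
    files: dict[str, str]


@dataclass
class History:
    """A toy, linear version history (hypothetical; not how Git stores data)."""
    snapshots: list[Snapshot] = field(default_factory=list)

    def commit(self, files: dict[str, str], message: str) -> int:
        """Save a copy of the files and return the new version number."""
        self.snapshots.append(Snapshot(message, dict(files)))
        return len(self.snapshots) - 1

    def checkout(self, version: int) -> dict[str, str]:
        """Return a copy of the files as they were at the given version."""
        return dict(self.snapshots[version].files)

    def log(self) -> list[str]:
        """List every version with its message, oldest first."""
        return [f"{i}: {s.message}" for i, s in enumerate(self.snapshots)]


history = History()
files = {"analysis.py": "print('clean data')"}
v0 = history.commit(files, "Add working analysis script")

files["analysis.py"] = "print('clean data'"  # a bug sneaks in (missing parenthesis)
v1 = history.commit(files, "Refactor analysis")

restored = history.checkout(v0)  # go back to the last working version
print(history.log())
print(restored["analysis.py"])
```

Because every snapshot carries a message, the log doubles as a record of what changed and why, which is the part that gets lost when you track versions by hand.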

What is GitHub Used For?

As you might have guessed, GitHub is a versatile platform that excels in version control. But GitHub also has numerous applications beyond version control, including:

  • Creating Project Portfolios: GitHub allows you to create a public GitHub profile to show your data skills and projects to employers or colleagues.

  • Collaboration: GitHub facilitates working together on projects with teammates. This involves sharing code snippets and reviewing each other’s work.

  • Open-Source Contribution: With GitHub, you can explore and contribute to existing open-source data science projects. This accelerates learning and innovation.

How Does GitHub Work?

To fully understand the benefits of GitHub, it’s essential to understand its key components and how they function together. 

  • Repositories: Repositories are folders that store your project files and their version history. You can think of them as digital filing cabinets for your data projects. Each repository has a unique URL and contains files, branches, and commits.
  • Forks: A fork is a personal copy of another user’s repository. You can make changes independently. Later, you can propose them back to the original repository.
  • Pull Requests: PRs are a formal way to propose your changes to the original project owner for review and merging. They ease code review and collaboration.
  • Issues: These are used to track tasks, bugs, or enhancements. 
  • Branches: A branch is a parallel version of a repository. You can create branches to work on specific features or fixes. Merge them back into the main branch when ready. Learn more about this in this tutorial on Git Clone Branch.
  • Merging: Merging integrates the changes from one branch into another, so finished work becomes part of the shared project. This keeps everything organized and up-to-date. A common example is merging a feature branch into the main branch.
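
As a rough illustration of how branches and merging fit together, the sketch below merges two branches file by file, using their common starting point to decide which side changed what. This is a simplified, hypothetical model (the function name and file contents are made up, and real Git merges work at the line level), but the decision logic mirrors the idea of a three-way merge.

```python
def merge_branches(base, main, feature):
    """Merge two branches file by file, given their common starting point.

    Each argument maps a file name to its content.
    Returns (merged_files, conflicting_file_names).
    """
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(main) | set(feature)):
        b, m, f = base.get(name), main.get(name), feature.get(name)
        if m == f:        # both sides agree (or neither touched the file)
            result = m
        elif m == b:      # only the feature branch changed it
            result = f
        elif f == b:      # only the main branch changed it
            result = m
        else:             # both changed it in different ways
            conflicts.append(name)
            result = m    # keep main's version until someone resolves it
        if result is not None:  # None means the file was deleted
            merged[name] = result
    return merged, conflicts


base = {"clean.py": "v1", "README.md": "intro"}
main = {"clean.py": "v1", "README.md": "intro + install notes"}
feature = {"clean.py": "v2", "README.md": "intro", "plot.py": "new chart"}

merged, conflicts = merge_branches(base, main, feature)
print(merged)
print(conflicts)
```

When both branches edit the same file differently, the function reports a conflict instead of guessing, which is also when GitHub asks you to resolve conflicts before a merge can complete.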

Git vs. GitHub

You might be wondering how Git relates to GitHub. These two terms are often used interchangeably, but there’s a key distinction.

Git is a distributed version control system (DVCS) that helps developers manage their code. It keeps track of changes and allows for the creation of different versions, or branches, of the code. This makes it easy for developers to work together. Git also supports features like staging areas and commit histories, which provide a detailed record of code modifications.

GitHub, on the other hand, provides additional features like access control, bug tracking, task management, and wikis, making it easier for developers to collaborate on projects. By using GitHub, you can manage your code, track changes, review contributions, and discuss issues - all in one place. It also integrates with various tools and services, which enhances the development workflow.

| Category | Git | GitHub |
| --- | --- | --- |
| Definition | Distributed version control system | Web-based platform built on top of Git |
| Purpose | Helps you manage code, track changes, and create branches | Hosts Git repositories and provides additional collaboration tools |
| Features | Staging areas, commit histories, branching, and merging | Access control, bug tracking, task management, wikis, and integrations |
| Benefit | Enables collaborative work and detailed tracking of code changes | Enhances collaboration, project management, and code review processes |

How to Use GitHub

So far, we have looked at what GitHub is and what version control is. We have also explored the comparison between Git and GitHub. Now, let’s get hands-on with GitHub.

First, we will learn how to sign up for a GitHub account, personalize our experience, and choose a plan. Next, we'll walk through creating a repository, including setting it up, adding a description, and managing visibility. After that, we'll create branches to work on different versions of our project and make commits, where we’ll learn to edit files and document changes. Finally, we'll open a pull request and merge our branch.

Signing up

Here are the steps to sign up for a GitHub account and get started:

  1. Visit GitHub and click on the Sign up button.
  2. Follow the prompts to create an account. You will need to provide your email address, create a username, and set a password.
  3. You can personalize your experience by choosing a plan that suits you. You can also customize your preferences during setup. The free plan is enough for beginners and junior data practitioners.

Signing up for GitHub. Image by Author

Creating a repository

After signing up, the next step is to create a repository. Here are the steps to create your first repository:

  1. Click on the + icon in the top right corner and select New repository.

  2. You can add a name and a description, and you can choose if you want the repository to be public or private. Public repositories are visible to everyone. Private repositories are only accessible to you and the collaborators you invite.

  3. Optionally, you can add a README file, a .gitignore file, and a license. These can also be added later.

  4. Click Create repository.

Creating a repository. Image by Author

After following the steps above, you will see a quick setup page for the new repository. From here, you can either create a new file or upload an existing file to the repository.

New repository setup. Image by Author

Uploading files. Image by Author
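
The repository options described above (name, description, visibility, README, .gitignore, and license) can also be set from a script. The sketch below builds, but does not send, a request to GitHub's REST API endpoint for creating a repository under your own account. Treat it as an illustration under assumptions: `YOUR_TOKEN` is a placeholder for a personal access token, the repository name and description are made up, and the template values are examples.

```python
import json
import urllib.request

API_URL = "https://api.github.com/user/repos"


def build_create_repo_request(token, name, description="", private=True):
    """Build (but do not send) a request that creates a repository."""
    payload = {
        "name": name,
        "description": description,
        "private": private,              # False makes the repository public
        "auto_init": True,               # start with a README
        "gitignore_template": "Python",  # add a .gitignore for Python projects
        "license_template": "mit",       # add an MIT license
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


request = build_create_repo_request(
    "YOUR_TOKEN", "churn-analysis", "Customer churn models"
)
print(request.get_method(), request.full_url)
print(json.loads(request.data))
# To actually create the repository, you would send it:
# urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect exactly what would be created before anything changes on your account.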

Creating branches

Once your repository is set up, the next step is to create branches. Here are the steps to create branches in your repository:

  1. In your repository, click on Branch:main. You can find this near the top of the page.

  2. You can then click the New branch button at the top right corner.

  3. Enter a new branch name and click Create new branch.

  4. You can switch between branches by clicking the branch dropdown and selecting the branch you want to work on.

Creating branches. Image by Author

Making commits

After creating branches, the next step is to make commits. Here are the steps to commit changes:

  1. Navigate to the file you want to edit in your repository.

  2. You can make changes by clicking on the pencil icon to edit the file and making your changes in the text editor.

  3. In the Commit changes section, add a commit message that describes the changes you made, then click Commit changes. A clear message matters because it becomes part of the project's history.

Making commits. Image by Author

Creating a pull request

Once you have made commits, the next step is to create a pull request. Here are the steps to create a pull request:

  1. Go to the Pull requests tab in your repository.

  2. Click New pull request. GitHub will automatically compare changes between branches.

  3. Review the changes to ensure everything is correct by comparing the changes between your branch and the main branch.

  4. Create a pull request by clicking on Create pull request. You can then add a title and description for your pull request.

  5. Add reviewers if needed, and submit. This step is optional, but it is valuable when you want feedback on your changes from teammates or project leads before merging.

Creating a pull request. Image by Author
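
The same pull request can be described in code. The sketch below prepares, without sending, a request to GitHub's REST API endpoint for opening a pull request, where `head` is the branch with your commits and `base` is the branch you want to merge into. The token, username, repository, and branch names are all placeholders chosen for illustration.

```python
import json
import urllib.request


def build_pull_request(token, owner, repo, head, base, title, body=""):
    """Build (but do not send) a request that opens a pull request."""
    payload = {
        "title": title,
        "head": head,  # the branch that contains your commits
        "base": base,  # the branch you want to merge into
        "body": body,  # the description reviewers will read
    }
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/pulls",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


pr_request = build_pull_request(
    token="YOUR_TOKEN",
    owner="your-username",
    repo="churn-analysis",
    head="fix-missing-values",
    base="main",
    title="Handle missing ages",
    body="Fills missing ages with the median.",
)
print(pr_request.get_method(), pr_request.full_url)
print(json.loads(pr_request.data))
```

The title and body play the same role as the fields in the web form: they tell reviewers what changed and why before they look at the diff.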

Merging branches

After your pull request has been reviewed and approved, the final step is to merge branches. Here are the steps to merge branches:

  1. After the pull request has been reviewed and approved, navigate to the pull request in the Pull requests tab.
  2. Click the Merge pull request button.
  3. Confirm the merge by clicking Confirm merge.

Besides using the GitHub GUI, you can use Git on the command line to do most tasks. For example, you can create a Git repository locally and merge changes from a remote repository with the git pull command. Becoming comfortable in this kind of environment is important as you move up in your career, and DataCamp's Introduction to Git course is a great resource for your journey. You can also try our GitHub and Git Tutorial for Beginners to get started.
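
For a feel of what the command-line workflow looks like, here is a small sketch that lists the Git commands for a branch, commit, and pull cycle, and can optionally run them from Python. The branch name, file name, and commit message are hypothetical, and the sketch assumes a remote named `origin`, a main branch called `main`, and a Git version recent enough to include `git switch`.

```python
import shlex
import shutil
import subprocess


def feature_workflow(branch, paths, message, remote="origin", main="main"):
    """Return the Git commands for a small branch, commit, and sync cycle."""
    return [
        ["git", "switch", "-c", branch],   # create and switch to a new branch
        ["git", "add", *paths],            # stage the edited files
        ["git", "commit", "-m", message],  # record the change with a message
        ["git", "switch", main],           # go back to the main branch
        ["git", "pull", remote, main],     # fetch and merge remote changes
    ]


def run(commands, cwd="."):
    """Run each command in an existing repository, stopping on the first failure."""
    if shutil.which("git") is None:
        raise RuntimeError("Git is not installed or not on PATH")
    for command in commands:
        subprocess.run(command, cwd=cwd, check=True)


commands = feature_workflow(
    "fix-missing-values", ["clean.py"], "Handle missing ages"
)
for command in commands:
    print(shlex.join(command))
# Inside a cloned repository, run(commands) would execute them in order.
```

Printing the commands first is a low-risk way to learn them: you can read each one, type it yourself in a terminal, and only automate the sequence once it feels familiar.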

GitHub Alternatives

GitHub is the most popular platform for version control and collaboration. However, several alternatives offer their own features and benefits. Aspiring data practitioners should be aware of these platforms, since one of them may suit their specific needs better.

Let's take a look at some of the GitHub alternatives.

GitLab

This is a DevOps platform that offers Git repository management, issue tracking, and CI/CD pipeline features. GitLab is popular for its all-in-one approach to DevOps by integrating a wide range of tools and services.

GitLab logo

Let’s look at some key features:

  • Integrated CI/CD: GitLab has built-in pipelines for continuous integration and deployment. These pipelines automate code testing and deployment.
  • Issue Tracking: With GitLab’s issue tracking system, you can manage tasks, bugs, and feature requests. This system uses boards, milestones, and labels.
  • Auto DevOps: This feature configures pipelines based on the project structure, which saves time during setup.
  • Container Registry: GitLab provides a built-in container registry for managing Docker images.
  • Security Features: Integrated security testing tools allow for continuous monitoring. These tools also scan for vulnerabilities in your codebase.

Bitbucket

Bitbucket is Atlassian's product for hosting and managing Git repositories. It previously also hosted Mercurial repositories, but that support was removed in 2020. It is designed to work easily with other Atlassian products, such as Jira and Trello.

Bitbucket Logo

Here are some key features:

  • Branch Permissions: Bitbucket allows fine-grained control over who can access and change branches. This enhances security.
  • CI/CD Integration: Integrates with Bitbucket Pipelines for continuous integration and delivery. This enables automated testing and deployment.
  • Jira Integration: Direct integration with Jira allows for comprehensive project management. It also enables tracking of development progress.
  • Pull Requests: In-line commenting and pull request reviews facilitate effective code review processes.
  • Deployment: Supports deployments through Bitbucket Pipelines, allowing for streamlined deployment processes.

SourceForge

SourceForge is one of the original platforms for hosting and managing open-source projects. It offers a wide range of tools for developers. These include code repositories, bug tracking, and project management features.

Key features include:

  • Project Hosting: It provides free hosting for open-source projects. This includes unlimited bandwidth and storage.
  • Download Stats: Detailed statistics on project downloads help track the popularity of projects. They also help track the progress of projects.
  • Discussion Forums: Built-in forums facilitate community engagement and support.
  • File Release System: An advanced file release system manages project downloads and versions.
  • SVN and Git Support: Supports both Subversion (SVN) and Git version control systems.

AWS CodeCommit

AWS CodeCommit is a fully managed source control service hosted by Amazon Web Services. It allows you to securely store and manage your code in the cloud, integrating seamlessly with other AWS services.

Key features include:

  • Scalability: AWS CodeCommit scales to meet your repository needs without requiring infrastructure management.
  • Security: Integration with AWS Identity and Access Management (IAM) provides fine-grained access control.
  • Integration: Works well with other AWS services like CodePipeline, CodeBuild, and CodeDeploy. This facilitates a complete CI/CD workflow.
  • High Availability: Built on AWS infrastructure. This ensures high availability and durability of your repositories.
  • Encryption: Data is encrypted both in transit and at rest, enhancing security.

Conclusion

In this article, we explored the fundamentals of GitHub, why it matters, and how it works. We started with the concept of version control and its role in managing changes to code and data. We then looked at GitHub's key components and how Git differs from GitHub. After that, we walked through a practical guide to using GitHub: signing up, creating repositories and branches, committing, opening pull requests, and merging branches. Finally, we reviewed some alternatives to GitHub.

Continuous learning is a gateway to success in the data space. To broaden your horizons on GitHub and its applications in data science, you can listen to our DataFramed podcast episode on The Future of Programming with Kyle Daigle, in which the COO of GitHub shares insights on the evolving landscape of programming. You can also explore GitHub Concepts, a comprehensive course on all things GitHub, or read our guide on GitHub certifications to learn how they can benefit your career.


Author
Samuel Shaibu

Experienced data professional and writer who is passionate about empowering aspiring experts in the data space.

Frequently Asked Questions

Is GitHub hard to learn?

While Git, the underlying technology, has a learning curve, GitHub provides a user-friendly interface to manage Git functionalities.

How is Git different from GitHub?

Git is a version control system that tracks changes to files, while GitHub is a platform that hosts Git repositories online.

Is GitHub safe for storing sensitive data?

Public repositories on GitHub are visible to everyone, so avoid storing sensitive data, like passwords or API keys, in them. Private repositories limit who can see your code, but it is still safer to keep secrets out of version control altogether.
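
One common way to keep secrets out of a repository is to read them from environment variables at run time. The snippet below is a minimal sketch of that pattern; the variable name `MY_SERVICE_API_KEY` and the helper function are hypothetical.

```python
import os


def load_api_key(name="MY_SERVICE_API_KEY"):
    """Read a secret from the environment instead of hard-coding it in the repo."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running")
    return key


# For demonstration only: in practice, set the variable in your shell or CI settings.
os.environ.setdefault("MY_SERVICE_API_KEY", "example-value")
api_key = load_api_key()
print(f"Loaded a key of length {len(api_key)}")
```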

Do I need to pay to use GitHub?

GitHub offers a free plan with unlimited public and private repositories, subject to limits on storage and some features. Paid plans offer additional features, like increased storage and advanced collaboration tools.

What are some alternatives to GitHub?

Popular alternatives include GitLab, Bitbucket, SourceForge, and AWS CodeCommit.

How can I demonstrate my knowledge of GitHub during interviews?

To showcase your GitHub expertise during interviews, consider obtaining a GitHub certification, which validates your skills, enhances your resume, boosts your confidence, and shows commitment to learning.
