How to Clone a Specific Branch In Git

Learn how to clone only a single branch from a Git repository to save disk space and reduce cloning time.
Jun 2024  · 6 min read

Have you ever tried to clone the official PyTorch GitHub repository? Well, I have, and its size is well over 1 GB. The reason it is so humongous has a lot to do with how many branches it has (spoiler: over 4,000).

So, what do you do if you are in a similar situation? 

One method you could try is cloning only a targeted chunk of the repository instead of downloading the whole thing. In this article, we will cover how to do exactly that: clone a specific branch of a GitHub repository.

Git Branch Refresher

[Figure: a graph of a simple Git branch operation]

Before we start all the “cloning” talk, let’s quickly refresh our knowledge of what branches are in Git.

Branches are the bread and butter of Git, as you will be working inside a branch 99.9% of the time. When you initialize Git in a repository, the default branch is named either master or main, depending on your Git version and configuration. This branch usually holds your main code base.
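If you want to check which name your own Git setup uses, one way is to create a throwaway repository and ask Git which branch HEAD points to. This is a minimal sketch. The temporary directory is only for illustration, and the script assumes git is on your PATH.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Create an empty throwaway repository in a temporary directory.
repo=$(mktemp -d)
git init -q "$repo"

# Even before the first commit, HEAD names the default branch.
# Expect "master" or "main", depending on your Git version and
# the init.defaultBranch setting.
default_branch=$(git -C "$repo" symbolic-ref --short HEAD)
echo "Default branch: $default_branch"
```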

When you want to introduce new features, fix a bug, or just try out some fresh ideas without the fear of royally messing up your main code base, you can create a new branch. So, in a way, git branches are alternate realities of your code base.

Git branches are very cheap. When you run git branch new_branch_name, Git does not copy any files. It simply creates a new pointer to the commit you are currently on. For this reason, Git repositories may end up with dozens, if not hundreds, of branches.
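To see how little a branch actually is, the sketch below builds a throwaway repository with one commit, creates a branch, and compares the commit hashes both names resolve to. The file name main.py and the committer identity are placeholders chosen for this example.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Placeholder identity so the demo commit works without global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway repository with a single commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "print('hello')" > main.py
git add main.py
git commit -q -m "Initial commit"

# Creating a branch copies no files; it only records a commit hash.
git branch new_branch_name

head_commit=$(git rev-parse HEAD)
branch_commit=$(git rev-parse new_branch_name)
echo "HEAD points to:            $head_commit"
echo "new_branch_name points to: $branch_commit"
```

Both lines print the same hash, because the new branch is just another name for the commit you were already on.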

For example, a common practice in the GitHub repositories of many popular frameworks is to create a branch for each release. If you visit the scikit-learn repository, you will see a branch for each sklearn release series:

[Screenshot: the scikit-learn GitHub repository]

So, if you clone the 1.2.X branch, you will see the repository as it is for the 1.2.X release series.
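You can also see which branches a repository has before cloning anything. The command git ls-remote --heads lists the branch names on a remote, and pointing it at https://github.com/scikit-learn/scikit-learn.git shows the release branches. The sketch below runs the same command against a small local stand-in repository so it works offline. That repository, its 1.1.X/1.2.X/1.3.X branches, and the committer identity are all made up for this example.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Placeholder identity so the demo commit works without global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical local project with one branch per release series.
remote=$(mktemp -d)
git init -q "$remote"
git -C "$remote" commit -q --allow-empty -m "Initial commit"
for version in 1.1.X 1.2.X 1.3.X; do
  git -C "$remote" branch "$version"
done

# List the remote's branches without cloning it.
branches=$(git ls-remote --heads "file://$remote")
echo "$branches"
```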

The Basics of Git Clone

Cloning is a fundamental operation in the world of Git version control. It essentially creates a copy of an existing Git repository. This might sound like downloading a zip file, but there is a key difference.

When you clone a repository, you aren't just getting its files. You are getting a copy of the complete history of the repository. That includes all the files, all the historical versions of those files, and all of its branches, which appear as remote-tracking branches such as origin/main. This makes cloning a powerful tool for collaborating on projects and tracking changes over time.
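The sketch below shows this on a small scale. It builds a hypothetical local repository with two commits on main and one more on a feature branch, clones it, and counts what arrived. The repository, branch names, and identity are invented for the example. The file:// URL makes Git use its normal transfer mechanism rather than copying the directory.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Placeholder identity so the demo commits work without global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical remote: two commits on main, one more on a feature branch.
remote=$(mktemp -d)
git init -q "$remote"
git -C "$remote" symbolic-ref HEAD refs/heads/main
git -C "$remote" commit -q --allow-empty -m "First commit"
git -C "$remote" commit -q --allow-empty -m "Second commit"
git -C "$remote" checkout -q -b feature
git -C "$remote" commit -q --allow-empty -m "Feature commit"
git -C "$remote" checkout -q main

# A regular, full clone.
clone=$(mktemp -d)/clone
git clone -q "file://$remote" "$clone"

# Every remote branch arrives as a remote-tracking branch (origin/...).
git -C "$clone" branch -r
# The whole history is local: 2 commits on main, 3 across all branches.
git -C "$clone" rev-list --count HEAD
git -C "$clone" rev-list --count --all
```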

Cloning a Specific Branch in Git

As mentioned in the introduction, cloning only a specific Git branch is beneficial when you are tight on disk space or if you don’t want to wait a long time for cloning to finish. 

Both of these scenarios are rare in practice, but they can happen. 

One likely scenario is that you are commuting with your travel laptop connected to a metered cellular data plan. Downloading the entire repository on such a plan could eat into your data allowance quickly. Cloning just the specific branch you need allows you to work on the project efficiently while minimizing data usage.

First, let’s use the general approach of cloning the entire repository and checking out a branch:

$ git clone https://github.com/scikit-learn/scikit-learn.git
$ cd scikit-learn
$ git checkout 1.2.X

This downloads all files, commits, and branches and then checks out the 1.2.X branch. The directory size reflects that:

$ cd ..  # Return to the parent directory first
$ du -sh scikit-learn/
187M    scikit-learn/

The second method passes the branch name to git clone with the -b option:

$ rm -rf scikit-learn  # Remove the old version
$ git clone -b 1.2.X https://github.com/scikit-learn/scikit-learn.git

This one might look like it clones only the 1.2.X branch of scikit-learn, but that's only half true. The command still fetches all the branches; it just checks out 1.2.X immediately. So, it is a shorthand version of the first method, and the directory size confirms that:

$ du -sh scikit-learn/
187M    scikit-learn/
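You can confirm that -b alone still fetches every branch by inspecting the remote-tracking branches after the clone. The sketch below does this offline against a hypothetical local stand-in. Its 1.1.X/1.2.X/1.3.X branches only mimic the scikit-learn naming, and the identity is a placeholder.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Placeholder identity so the demo commit works without global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical local stand-in with one branch per release series.
remote=$(mktemp -d)
git init -q "$remote"
git -C "$remote" commit -q --allow-empty -m "Initial commit"
for version in 1.1.X 1.2.X 1.3.X; do
  git -C "$remote" branch "$version"
done

clone=$(mktemp -d)/with-b
git clone -q -b 1.2.X "file://$remote" "$clone"

# The working tree is on 1.2.X ...
git -C "$clone" rev-parse --abbrev-ref HEAD
# ... but every branch was still fetched as origin/<name>.
git -C "$clone" branch -r
```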

The last method is actually what we want:

# Delete the older version
$ rm -rf scikit-learn
# Clone using the --single-branch flag
$ git clone --single-branch -b 1.2.X https://github.com/scikit-learn/scikit-learn.git

By adding the --single-branch flag, we fetch and check out only a single branch:

$ du -sh scikit-learn/
163M    scikit-learn/

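As you can see, this time the directory is smaller: 163M instead of 187M. The saving is modest, most likely because the branches share much of their history.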
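To see what --single-branch changes in the clone itself, the sketch below clones a hypothetical local stand-in repository with --single-branch -b 1.2.X. It then prints the remote-tracking branches and the fetch refspec Git records in the clone's config. The repository, its branch names, and the identity are invented for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Placeholder identity so the demo commit works without global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical local stand-in with one branch per release series.
remote=$(mktemp -d)
git init -q "$remote"
git -C "$remote" commit -q --allow-empty -m "Initial commit"
for version in 1.1.X 1.2.X 1.3.X; do
  git -C "$remote" branch "$version"
done

clone=$(mktemp -d)/single
git clone -q --single-branch -b 1.2.X "file://$remote" "$clone"

# Only origin/1.2.X exists as a remote-tracking branch ...
git -C "$clone" branch -r
# ... because the fetch refspec names just that one branch.
fetch_spec=$(git -C "$clone" config --get-all remote.origin.fetch)
echo "$fetch_spec"
```

The narrowed refspec is also why later fetches in such a clone keep ignoring the other branches.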

The Disadvantages of Cloning a Single Branch

Compared to a full clone, cloning a specific branch is recommended in very few practical cases. For one, a single-branch clone doesn't include the complete history of the repository. It contains only the commits reachable from the branch you cloned.

This can be problematic if you need to reference commits that live only on other branches. It is also a problem if you need to compare your work against another branch, as is common when preparing or reviewing pull requests.
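A small experiment makes the missing history concrete. The sketch below adds a commit that exists only on a 1.3.X branch of a hypothetical local repository. It then checks whether that commit object is present in a single-branch clone of 1.2.X and in a full clone. Repository, branch names, and identity are assumptions made for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Placeholder identity so the demo commits work without global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical remote: a shared commit, plus one commit only on 1.3.X.
remote=$(mktemp -d)
git init -q "$remote"
git -C "$remote" symbolic-ref HEAD refs/heads/main
git -C "$remote" commit -q --allow-empty -m "Initial commit"
git -C "$remote" branch 1.2.X
git -C "$remote" checkout -q -b 1.3.X
git -C "$remote" commit -q --allow-empty -m "Change only on 1.3.X"
only_on_1_3=$(git -C "$remote" rev-parse HEAD)
git -C "$remote" checkout -q main

work=$(mktemp -d)
git clone -q --single-branch -b 1.2.X "file://$remote" "$work/single"
git clone -q "file://$remote" "$work/full"

# git cat-file -e exits non-zero when the object is not in the repository.
if git -C "$work/single" cat-file -e "$only_on_1_3" 2>/dev/null; then
  single_status=present
else
  single_status=missing
fi
if git -C "$work/full" cat-file -e "$only_on_1_3" 2>/dev/null; then
  full_status=present
else
  full_status=missing
fi
echo "1.3.X-only commit in single-branch clone: $single_status"
echo "1.3.X-only commit in full clone:          $full_status"
```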

Also, later git fetch and git pull commands update only that one branch. To work with any other branch, you first have to tell Git to track it, for example with git remote set-branches. In a full clone, a plain git fetch updates every branch.
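If you later need another branch in a single-branch clone, one option is git remote set-branches --add followed by a fetch. The sketch below does this against a hypothetical local stand-in repository. The branch names and identity are invented for the example, and other approaches (such as fetching a branch explicitly) also work.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Placeholder identity so the demo commit works without global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical local stand-in with two release branches.
remote=$(mktemp -d)
git init -q "$remote"
git -C "$remote" commit -q --allow-empty -m "Initial commit"
git -C "$remote" branch 1.2.X
git -C "$remote" branch 1.3.X

clone=$(mktemp -d)/single
git clone -q --single-branch -b 1.2.X "file://$remote" "$clone"
cd "$clone"

# Add 1.3.X to the branches that git fetch should update ...
git remote set-branches --add origin 1.3.X
# ... fetch it, then check it out as a local tracking branch.
git fetch -q origin
git checkout -q 1.3.X
git rev-parse --abbrev-ref HEAD
```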

Overall, unless you’re facing severe disk space limitations or have a very specific workflow reason, it’s generally recommended to clone the entire repository. This gives you a complete picture of the project’s history and makes collaboration and version control smoother.

Conclusion

In this article, we’ve learned some methods of cloning a single branch from a Git repository. We’ve observed that cloning only one branch saves disk space and reduces cloning time when a repository is very large. If you want to learn more about Git, check out the following resources:

Thank you for reading!


Author
Bex Tuychiev

I am a data science content creator with over 2 years of experience and one of the largest followings on Medium. I like to write detailed articles on AI and ML with a bit of a sarcastic style because you've got to do something to make them a bit less dull. I have produced over 130 articles and a DataCamp course to boot, with another one in the making. My content has been seen by over 5 million pairs of eyes, 20k of whom became followers on both Medium and LinkedIn.
