Nathan Reed

git-partial-submodule

September 4, 2021

View on GitHub

Have you ever thought about adding a submodule to your git project, but didn’t want to bear the burden of downloading and storing the submodule’s entire history, or only needed a handful of files out of the submodule?

Git provides partial clone and sparse checkout features that can make this happen for top-level repositories, but so far they aren’t available for submodules. That’s a hole I aimed to fill with this project. git-partial-submodule is a tool for setting up submodules with blobless clones. It can also save sparse-checkout patterns in your .gitmodules file, allowing them to be managed by version control, and automatically applied when the submodules are cloned.

As a motivating example, a fresh clone of Dear ImGui consumes about 80 MB (of which 75 MB is in the .git directory) and takes about 10 seconds to clone on a fast connection. It also brings in roughly 200 files, including numerous examples and backends and various other ancillary files. The actual ImGui implementation—the part you need for your app—is in 11 files totaling 2.5 MB.

In contrast, a blobless, sparse clone of Dear ImGui requires only about 7 MB (4.5 MB in the .git directory), takes ~2 seconds to clone, and checks out only the files you want.

(This is not to pick on Dear ImGui at all! These issues arise with any healthy, long-lived project, and the history bloat in particular is an artifact of git’s design.)
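For reference, a blobless, sparse clone like the one measured above can be made for a top-level repository with stock git. Here is a minimal Python sketch that only builds the two commands involved. The helper name `blobless_sparse_clone_cmds` and the example patterns are illustrative assumptions, and the `--no-cone` option on `git sparse-checkout set` needs a reasonably recent git.

```python
def blobless_sparse_clone_cmds(url, dest, patterns):
    """Build the git commands for a blobless, sparse clone of `url` into `dest`.

    Hypothetical helper for illustration; it returns argument lists and
    does not run anything itself.
    """
    # --filter=blob:none fetches commits and trees up front, blobs on demand.
    # --sparse starts the checkout with only the top-level files.
    clone = ["git", "clone", "--filter=blob:none", "--sparse", url, dest]
    # --no-cone allows gitignore-style file patterns, not just directories.
    sparse = ["git", "-C", dest, "sparse-checkout", "set", "--no-cone", *patterns]
    return [clone, sparse]


if __name__ == "__main__":
    # Example patterns (an assumption): keep only top-level sources and headers.
    cmds = blobless_sparse_clone_cmds(
        "https://github.com/ocornut/imgui.git", "imgui", ["/*.h", "/*.cpp"]
    )
    for cmd in cmds:
        print(" ".join(cmd))
```

To actually perform the clone, each list can be passed to `subprocess.run(cmd, check=True)`.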

One way developers might address this is by “vendoring”, or copying the ImGui files they need into their own repository and checking them in. That can be a legitimate solution, but it has various downsides.

Another solution supported out of the box by git is “shallow” clones, which essentially only download the latest commit and no history. Submodules can be configured to be cloned shallowly. This works, and is useful in some cases such as cloning on a build machine where you’re not going to be manipulating the repository at all. However, shallow clones make it difficult to do normal development workflows with the submodule. In contrast, a blobless clone functions normally with most workflows, as it can download missing data on demand.
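As a point of comparison, here is a small sketch of the shallow-submodule setup mentioned above, using git's `submodule.<name>.shallow` setting in .gitmodules and the `--depth` option of `git submodule update`. The helper name `shallow_submodule_cmds` is hypothetical, and the function only builds the commands.

```python
def shallow_submodule_cmds(name, paths=()):
    """Build commands that mark submodule `name` as shallow and fetch it at depth 1.

    Hypothetical helper for illustration; returns argument lists only.
    """
    # Record the shallow recommendation in .gitmodules so it is versioned.
    mark = ["git", "config", "-f", ".gitmodules", f"submodule.{name}.shallow", "true"]
    # Initialize and update the submodule(s), fetching only the latest commit.
    update = ["git", "submodule", "update", "--init", "--depth", "1", *paths]
    return [mark, update]


if __name__ == "__main__":
    for cmd in shallow_submodule_cmds("imgui", ["imgui"]):
        print(" ".join(cmd))
```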

Since git’s own submodule commands do not (yet) allow specifying blobless mode or sparse checkout, I built git-partial-submodule to work around this. It’s a single-file Python script that you use just for the initial setup of submodules. Instead of git submodule add, you do git-partial-submodule.py add. When cloning a repository with existing submodules, you use git-partial-submodule.py clone instead of recursively cloning or git submodule update --init.

It works by manually calling git clone with the blobless/sparse options, setting up the submodule repo in your .git/modules directory, and hooking everything up so git sees it as a legit submodule. Afterward, ordinary submodule operations such as fetches and updates should work normally—although I haven’t done super extensive testing on this, and I’ve been warned that blobless/sparse are still experimental git features that may have sharp edges.
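The general shape of that process can be sketched as a sequence of ordinary git commands. This is a rough approximation under stated assumptions, not the script's actual implementation: the function name `partial_submodule_cmds`, the exact ordering of steps, and the use of `git submodule add` to register the already-cloned repo are all guesses for illustration, and the sketch glosses over details such as making the stored paths relative.

```python
import posixpath


def partial_submodule_cmds(url, path, name=None, patterns=None):
    """Build one plausible command sequence for a blobless (and optionally
    sparse) submodule. Hypothetical sketch; returns argument lists only.
    """
    name = name or path
    # Submodule repositories conventionally live under .git/modules/<name>.
    git_dir = posixpath.join(".git", "modules", name)
    cmds = [
        # Blobless clone with no files checked out yet, storing the repo
        # data inside the superproject's .git/modules directory.
        ["git", "clone", "--filter=blob:none", "--no-checkout",
         f"--separate-git-dir={git_dir}", url, path],
    ]
    if patterns:
        # Restrict the worktree before any files are written.
        cmds.append(["git", "-C", path, "sparse-checkout", "set",
                     "--no-cone", *patterns])
    # Populate the worktree; missing blobs are fetched on demand.
    cmds.append(["git", "-C", path, "checkout"])
    # Register the existing repo with the superproject (assumed step).
    cmds.append(["git", "submodule", "add", "--name", name, url, path])
    return cmds


if __name__ == "__main__":
    for cmd in partial_submodule_cmds(
        "https://github.com/ocornut/imgui.git", "third_party/imgui",
        name="imgui", patterns=["/*.h", "/*.cpp"],
    ):
        print(" ".join(cmd))
```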

The other thing git-partial-submodule does is to save and restore sparse-checkout patterns in your .gitmodules for each submodule. When you only need a subset of the submodule’s file tree, this lets you manage those patterns under version control in the superproject, so that others who clone the project (and are also using git-partial-submodule) will automatically get the right set of files. You can configure this using the ordinary git sparse-checkout commands, but currently you have to remember the extra step of saving the patterns to .gitmodules after changing them, and of restoring the patterns from .gitmodules after pulling/merging. It might be possible to automate this further with git hooks, but I haven’t looked into it yet.
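To make the save/restore idea concrete, here is a minimal sketch of how patterns could round-trip through .gitmodules using plain `git config` and `git sparse-checkout`. The key name `sparse-checkout`, the space-separated encoding, and both function names are assumptions for illustration and may not match what the script actually writes. Patterns containing spaces would need a different encoding.

```python
def save_sparse_cmd(name, patterns, key="sparse-checkout"):
    """Build a command that stores `patterns` for submodule `name` in .gitmodules.

    Hypothetical helper; the key name and encoding are assumptions.
    """
    value = " ".join(patterns)
    return ["git", "config", "-f", ".gitmodules", f"submodule.{name}.{key}", value]


def restore_sparse_cmd(path, stored_value):
    """Build a command that re-applies stored patterns to the submodule at `path`."""
    patterns = stored_value.split()
    return ["git", "-C", path, "sparse-checkout", "set", "--no-cone", *patterns]


if __name__ == "__main__":
    save = save_sparse_cmd("imgui", ["/*.h", "/*.cpp"])
    print(" ".join(save))
    # The last argument of the save command is the stored value.
    print(" ".join(restore_sparse_cmd("third_party/imgui", save[-1])))
```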

I’m excited to try out this workflow for some of my own projects, replacing vendored projects with partial submodules, and I hope it will be helpful to some others out there as well. Issues and PRs are open on GitHub, and contributions are welcome. If you end up trying this, let me know if it works for you!

© 2007–2024 by Nathan Reed. Licensed CC-BY-4.0.