I have a fork of a large project on GitHub and would like to both pull changes from other people’s forks and recommend ways that they can pull in my changes.
Adding new named remotes for the users I would like to pull code from, as recommended in this question, sounds like a great solution, but will doing so significantly affect the size of my local repository?
(I’m going to go ahead and try it, but I couldn’t find any info on this, so I thought I’d ask for posterity.)
(As a note on terminology, these are other “remotes”, not “origins”; “origin” is just the default name for the remote that’s set up for the repository you clone from.)
If you’ve added a remote that’s a fork of the original large repository (and fetch from it), that will generally use very little extra space. This is because of git’s content-addressed storage model. Each file (“blob”) is identified by a hash of its contents, each directory (“tree”) is identified by a hash of the names and hashes of the blobs and trees it contains, and each commit is identified by hashing data that includes the top-level tree of your source code and the IDs of its parent commits. So all the history up to the point where the fork diverged is represented by commits with the same IDs, and no extra storage is used for it. After the divergence, extra storage is only needed for files that have changed: if there are large blobs in the repository, they keep the same hash unless their content changes, and so are only stored once. (Even when they do change, git uses binary delta compression when packing objects, so storing small changes to large files should still be pretty efficient.)
If the fork has added large new files not present in the original repository then that will add significantly to the amount of space used, of course.
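To illustrate why identical files in two forks cost nothing extra, here is a minimal Python sketch of how git derives a blob ID. It assumes the default SHA-1 object format (repositories can also be created with SHA-256); the function name `git_blob_id` and the dictionary standing in for the object store are made up for this example and are not part of git.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the ID git would assign to a blob with this content.

    Assumes the SHA-1 object format: git hashes a small header
    ("blob <size>" followed by a NUL byte) plus the raw file content.
    """
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical stand-in for the object store: ID -> content.
object_store = {}

# The same file fetched from the original repository and from a fork.
file_in_original = b"hello world\n"
file_in_fork = b"hello world\n"

for content in (file_in_original, file_in_fork):
    object_store[git_blob_id(content)] = content

# Both copies hash to the same ID, so only one object is stored.
print(len(object_store))
print(git_blob_id(file_in_original))
```

You can compare the printed ID against what `git hash-object` reports for a file with the same content. Trees and commits are hashed in a similar way over their own serialized form, which is why shared history ends up with shared IDs.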