We have a Git project with quite a large history.
Specifically, early in the project there were a lot of binary resource files in the repository. These have now been removed, as they’re effectively external resources.
However, the size of our repository is >200MB (the total checkout is currently ~20MB) due to having these files previously committed.
What we’d like to do is ‘collapse’ the history so that the repository appears to have been created from a later revision than it was. For example:

    1-----2-----3-----4-----+---+---+
                       \       /
                        +-----+---+---+

1. Repository created
2. Large set of binary files added
3. Large set of binary files removed
4. New intended ‘start’ of repository
So effectively we want to lose the project history before a certain point. At this point there is only one branch, so there’s no complication with trying to deal with multiple start points etc. However, we don’t want to lose all of the history and start a new repository with the current version.
Is this possible, or are we doomed to have a bloated repository forever?
You can remove the binary bloat and keep the rest of your history. Git allows you to reorder and ‘squash’ prior commits, so you can combine just the commits that add and remove your big binary files. If the adds were all done in one commit and the removals in another, this will be much easier than dealing with each file.
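To locate the two commits, the log can be listed with per-commit file statistics. The sketch below is only an illustration in a throwaway repository: the file name assets.bin, the commit messages, and the commit_file helper are all made up for the demo, and the --diff-filter queries are just one convenient way to narrow the search.

```shell
set -eu
# Throwaway repository in a temp directory; nothing here touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

# Hypothetical helper for the demo: write a file and commit it.
commit_file() { printf '%s\n' "$2" > "$1"; git add "$1"; git commit -q -m "$3"; }

commit_file README "hello" "Repository created"
commit_file assets.bin "pretend this is large" "Add binary files"
git rm -q assets.bin
git commit -q -m "Remove binary files"

# Every commit with the files it touched.
git --no-pager log --stat

# Narrow it down: the commit that added (A) and the one that deleted (D) the file.
added=$(git log --diff-filter=A --format=%H -- assets.bin)
removed=$(git log --diff-filter=D --format=%H -- assets.bin)
echo "added in $added, removed in $removed"
```

In a real repository only the three git log commands matter; run them against your own paths.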
Search the output of git log (for example with --stat, which lists the files each commit touches) for the commits that add and delete your binary files and note their SHA1s, say 2bcdef and 3cdef3.
Then, to edit the repo’s history, use the git rebase command with its interactive option (-i), starting with the parent of the commit where you added your binaries, i.e. 2bcdef^. It will launch your $EDITOR and you’ll see a list of commits, one pick line per commit, starting with 2bcdef.
Insert squash 3cdef3 as the second line and remove the line which says pick 3cdef3 from the list. You now have a list of actions for the interactive rebase which will combine the commits which add and delete your binaries into one commit whose diff is just any other changes in those commits. Then it will reapply all of the subsequent commits in order once you save the list and close the editor; if it stops on a conflict, resolve it and run git rebase --continue. This will take a minute or two.
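The squash can be rehearsed in a scratch repository before touching the real one. This is a sketch under assumptions: the files, commit messages, and commit_file helper are invented, and the pick list is generated by the script and handed to Git through GIT_SEQUENCE_EDITOR instead of being edited by hand, purely so the example runs unattended. GIT_EDITOR=true accepts the default combined commit message.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

# Hypothetical helper for the demo: write a file and commit it.
commit_file() { printf '%s\n' "$2" > "$1"; git add "$1"; git commit -q -m "$3"; }

commit_file README "hello" "Repository created"
printf 'pretend this is large\n' > assets.bin
printf 'code\n' > main.txt
git add assets.bin main.txt
git commit -q -m "Add binary files and other edits"
add=$(git rev-parse HEAD)
commit_file notes.txt "notes" "Another change"
git rm -q assets.bin
git commit -q -m "Remove binary files"
remove=$(git rev-parse HEAD)
commit_file more.txt "more" "Later work"

# The edited pick list: the add commit, the removal squashed into it,
# then every other later commit in its original order.
todo=$(mktemp)
{
  echo "pick $add"
  echo "squash $remove"
  git rev-list --reverse "$add..HEAD" | grep -vx "$remove" | sed 's/^/pick /'
} > "$todo"

# Start the interactive rebase at the parent of the add commit;
# the "editor" simply copies the prepared list over Git's own.
GIT_SEQUENCE_EDITOR="cp '$todo'" GIT_EDITOR=true git rebase -i "$add^"

# The binary no longer appears anywhere in the rewritten history.
git --no-pager log --oneline --stat
```

When doing this by hand, git rebase -i 2bcdef^ followed by editing the list in your editor has the same effect.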
You now have a repo that no longer has the binaries coming or going. But they will still take up space because, by default, Git keeps the old, now unreferenced commits around for 30 days (through the reflog) before they can be garbage-collected, so that you can change your mind. If you want to remove them now, expire the reflog and run garbage collection with immediate pruning.
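One common way to reclaim the space immediately is sketched below; it is not necessarily the exact sequence the answer originally had in mind. The scratch repository and file names are assumptions, and a git reset stands in for the rebase, since both leave the old commit reachable only through the reflog.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

echo "hello" > README
git add README
git commit -q -m "Repository created"

# Commit a stand-in "binary", then drop that commit so only the reflog refers to it.
echo "pretend this is large" > assets.bin
git add assets.bin
git commit -q -m "Add binary files"
blob=$(git rev-parse HEAD:assets.bin)
git reset -q --hard HEAD^

# The object is still stored at this point.
git cat-file -e "$blob" && echo "before: blob still stored"

# Drop all reflog entries right away, then repack and delete
# unreachable objects without the usual grace period.
git reflog expire --expire=now --all
git gc --quiet --prune=now

if git cat-file -e "$blob" 2>/dev/null; then state=stored; else state=gone; fi
echo "after: blob $state"
```

This throws away the safety net the reflog provides, so it is worth checking the rewritten history first.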
Now you’ve removed the bloat but kept the rest of your history.