I have a repository for storing some large binary files (tifs, jpgs, pdfs) that is growing pretty large. A fair number of files are also created, removed, and renamed, and I don’t care about the individual commit history. This question is somewhat simplified because I’m dealing with a repository that has no branches and no tags.
I’m curious if there’s an easy way to remove some of the history from the system to save space.
I found an old thread on the git mailing list but it doesn’t really specify how to use this (i.e. what the $drop is):
git filter-branch --parent-filter "sed -e 's/-p $drop//'" \
--tag-name-filter cat -- \
--all ^$drop
I think you can shrink your history by following this answer:
How to delete a specific revision of a github gist?
Start an interactive rebase (git rebase -i) and decide which points in history you want to keep.
Then, in the todo list, leave the first commit after each “keep” point as “pick” and mark the others as “squash”.
Then, run the rebase by saving and quitting the editor. At each “keep” point, the message editor will pop up for a combined commit message ranging from the previous “pick” up to the “keep” commit. You can then either just keep the last message or in fact combine those to document the original history without keeping all intermediate states.
After that rebase, the intermediate file data will still be in the repository but now unreferenced.
git gc will then get rid of that data, once the reflog entries that still point at the old commits have expired.
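For illustration, here is a rough sketch of the whole procedure, run against a throwaway repository rather than your real one. Everything in it is an assumption made for the demo: the scratch directory, the file name data.bin, the five dummy commits, and the sed one-liner passed via GIT_SEQUENCE_EDITOR, which only stands in for editing the todo list by hand (it keeps the first “pick” and turns every later one into “squash”). It also expires the reflog explicitly, because otherwise git gc would keep the old commits around until the reflog entries age out on their own.

```shell
#!/bin/sh
# Sketch only: squash a toy history into one commit, then prune old objects.
set -e

repo=$(mktemp -d)                 # hypothetical scratch repository
cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"

# Five commits that each overwrite the same file.
for i in 1 2 3 4 5; do
    echo "version $i" > data.bin
    git add data.bin
    git commit -q -m "commit $i"
done

# Stand-in for hand-editing the todo list: line 1 stays "pick",
# every later "pick" becomes "squash". GIT_EDITOR=true accepts the
# combined commit message without opening an editor.
GIT_SEQUENCE_EDITOR="sed -i -e '2,\$s/^pick/squash/'" \
GIT_EDITOR=true \
git rebase -q -i --root

# The pre-rebase commits are still reachable from the reflog,
# so expire it before asking gc to prune.
git reflog expire --expire=now --all
git gc -q --prune=now

# Number of commits left on the current branch.
commits_after=$(git rev-list --count HEAD)
echo "$commits_after"
```

On a real repository you would pick your own “keep” points in the editor instead of squashing everything, and if the history was already pushed, the remote (and any remote-tracking refs) will still hold the old objects until it is rewritten as well.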