(solved, see bottom of the question body)
I’ve been looking for a way to do this for a long time. What I have found so far:
- http://dound.com/2009/04/git-forever-remove-files-or-folders-from-history/
- http://progit.org/book/ch9-7.html
Both describe pretty much the same method, but both of them leave objects in pack files… Stuck.
What I tried:
git filter-branch --index-filter 'git rm --cached --ignore-unmatch file_name'
rm -Rf .git/refs/original
rm -Rf .git/logs/
git gc
The files are still in the pack, and this is how I know:
git verify-pack -v .git/objects/pack/pack-3f8c0...bb.idx | sort -k 3 -n | tail -3
I also tried this:
git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch file_name" HEAD
rm -rf .git/refs/original/ && git reflog expire --all && git gc --aggressive --prune
Same result…
I tried the git clone trick; it removed some of the files (~3000 of them), but the largest files are still there…
I have some large legacy files in the repository, ~200M, and I really don’t want them there… And I don’t want to start the repository over from scratch 🙁
SOLUTION:
This is the shortest way to get rid of the files:
- check .git/packed-refs – my problem was that it contained a refs/remotes/origin/master line for a remote repository; delete it, otherwise git won’t remove those files
- (optional) git verify-pack -v .git/objects/pack/#{pack-name}.idx | sort -k 3 -n | tail -5 – to check for the largest files
- (optional) git rev-list --objects --all | grep a0d770a97ff0fac0be1d777b32cc67fe69eb9a98 – to check which files those objects are
- git filter-branch --index-filter 'git rm --cached --ignore-unmatch file_names' – to remove a file from all revisions
- rm -rf .git/refs/original/ – to remove git’s backup
- git reflog expire --all --expire='0 days' – to expire all reflog entries
- git fsck --full --unreachable – to check whether the objects are now reported as unreachable
- git repack -A -d – repacking
- git prune – to finally remove those objects
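The two inspection steps in the list above (looking through packed-refs and finding the largest objects in a pack) could be sketched as small shell helpers. This is only a sketch: the function names `stale_packed_refs` and `largest_pack_objects` are made up for this example, and treating every `refs/original/` and `refs/remotes/` entry as a candidate is an assumption, so review the output before deleting anything.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of Git.

# List refs in a packed-refs file that commonly keep old history alive
# after a rewrite: filter-branch backups and remote-tracking refs.
# Usage: stale_packed_refs .git/packed-refs
stale_packed_refs() {
    # Ref lines look like "<sha> <refname>". Header lines start with '#'
    # and peeled-tag lines start with '^', so neither has a matching $2.
    awk '$2 ~ /^refs\/(original|remotes)\// { print $2 }' "$1"
}

# Read `git verify-pack -v` output on stdin and print the N largest
# objects (default 5) as "<size> <type> <sha>", largest first.
# Usage: git verify-pack -v <pack>.idx | largest_pack_objects 5
largest_pack_objects() {
    # Keep only object lines; this drops the summary lines
    # ("non delta: ...", "chain length = ...", "<pack>: ok").
    awk '$2 ~ /^(blob|tree|commit|tag)$/ { print $3, $2, $1 }' |
        sort -k 1,1 -n -r |
        head -n "${1:-5}"
}
```

Each ref printed by `stale_packed_refs` can then be removed by editing the file by hand, or with `git update-ref -d <refname>`, which should delete a ref whether it is stored loose or packed.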
I can’t say for sure without access to your repository data, but I believe there are probably one or more packed refs still referencing old commits from before you ran git filter-branch. This would explain why git fsck --full --unreachable doesn’t call the large blob an unreachable object, even though you’ve expired your reflog and removed the original (unpacked) refs.
Here’s what I’d do (after git filter-branch and git gc have been done):
1) Make sure original refs are gone:
rm -rf .git/refs/original
2) Expire all reflog entries:
git reflog expire --all --expire='0 days'
3) Check for old packed refs
This could potentially be tricky, depending on how many packed refs you have. I don’t know of any Git commands that automate this, so I think you’ll have to do this manually. Make a backup of .git/packed-refs. Now edit .git/packed-refs. Check for old refs (in particular, see if it packed any of the refs from .git/refs/original). If you find any old ones that don’t need to be there, delete them (remove the line for that ref).
After you finish cleaning up the packed-refs file, see if git fsck notices the unreachable objects:
git fsck --full --unreachable
If that worked, and git fsck now reports your large blob as unreachable, you can move on to the next step.
4) Repack your packed archive(s)
git repack -A -d
This will ensure that the unreachable objects get unpacked and stay unpacked.
5) Prune loose (unreachable) objects
git prune
And that should do it. Git really should have a better way to manage packed refs. Maybe there is a better way that I don’t know about. In the absence of a better way, manual editing of the packed-refs file might be the only way to go.
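For reference, the command steps above could be collected into one shell function, roughly like this. It is a sketch under a few assumptions: `run`, `purge_rewritten_history` and the `DRY_RUN` variable are names invented for this example, the function has to be called from the top level of a non-bare repository, and it does not touch packed-refs, so the manual check in step 3 still has to happen before the objects can become unreachable.

```shell
#!/bin/sh
# Sketch of the cleanup sequence. run, purge_rewritten_history and
# DRY_RUN are hypothetical names used only in this example.

# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Runs the steps in order and stops at the first command that fails.
# Assumes stale entries in .git/packed-refs were already removed by hand.
purge_rewritten_history() {
    # 1) drop the backup refs written by git filter-branch
    run rm -rf .git/refs/original &&
    # 2) expire every reflog entry so old commits are no longer referenced
    run git reflog expire --all --expire='0 days' &&
    # 3) report unreachable objects; the large blob should show up here
    run git fsck --full --unreachable &&
    # 4) repack, leaving unreachable objects loose
    run git repack -A -d &&
    # 5) delete the loose unreachable objects
    run git prune
}
```

Calling `DRY_RUN=1 purge_rewritten_history` first only prints the commands, which is a cheap way to check the sequence on a backup copy before running it for real.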