Here’s the scenario: I have a local git repository that mirrors the contents of another source control system (a proprietary one). I’ve written a script that periodically syncs my git branch with that system’s latest copy of the same branch (called by another term in the other system but conceptually similar).
Now, suppose that in the other system, someone creates a branch from the branch I’m currently syncing and starts hacking on it. What I’d like to do is pull down the first version of that other branch, then find the commit in my git version of the main branch that is closest to the new branch. If I can do this, I’ll know which commit from the main branch to make as the parent of this new branch.
This sounds to me like a problem of computing ‘tree distances’. But as SHA1 hashes don’t have a distance metric, is there another way to do this besides the obvious manual deep search on each commit to find out which one has the most number of similar blobs?
UPDATE: See below, found a domain-specific way to do it.
It’s worse than that; in the general case you’ll have to count edit distance on the blobs to see how similar they are.
Hoping this is a rare event, I would clone the git repository and start rolling back versions to locate the commit that is closest to the tree you wish to duplicate. It would be nice to think of using
git bisectfor this, but since there’s no total ordering and no absolute concept ofgoodorbad, I don’t see how to avoid trying every commit.Mininum edit distance is NP-hard as well, so you have a real pain in the ass here.
If you are lucky, in the other system, you can recover the date and time the new branch is created. Then maybe you can just grab the last commit before that timestamp?