I’m struggling to understand how the following behavior is a good thing in git. See below for an example I put together to illustrate my problem. My team and I often end up with changes/commits going into branches where we did not want them.
> git init sandbox && cd sandbox
> echo "data a" > a.txt && echo "data b" > b.txt
> git add -A && git commit -a -m "initial population"
[master (root-commit) d7eb6af] initial population
2 files changed, 2 insertions(+)
create mode 100644 a.txt
create mode 100644 b.txt
> git branch branch1
> echo "more data a" >> a.txt && git commit -a -m "changed a.txt on master"
[master 11eb82a] changed a.txt on master
1 file changed, 1 insertion(+)
> git branch branch2 && git checkout branch2
Switched to branch 'branch2'
> echo "more data b" >> b.txt && git commit -a -m "changed b.txt on branch2"
[branch2 25b38db] changed b.txt on branch2
1 file changed, 1 insertion(+)
> git checkout branch1
Switched to branch 'branch1'
> git merge branch2
Updating d7eb6af..25b38db
Fast-forward
a.txt | 1 +
b.txt | 1 +
2 files changed, 2 insertions(+)
Notice in the above that a.txt is updated in the merge, even though it was not touched/modified on branch2. In this scenario I would expect git to be intelligent enough to recognize that a.txt was not changed on branch2 and therefore, when applying updates to branch1, not make those changes.
Is there something I’m doing wrong? Yes, I could cherry-pick, and that would work for this simplistic example where I know what I changed, but it is not realistic under real circumstances where the changes are much larger and you don’t know what might have been affected.
To be clear, I do not want this behavior from git.
‘branch1’ and ‘branch2’ are nothing but commit pointers. They are states of the commit history at certain moments in time. As such, when merging ‘branch2’ into ‘branch1’, git does little more than establish a common ancestor and attempt to apply changes from both trees, together.
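One way to see what a merge would bring in before running it is to ask git for the common ancestor and for the commits that sit between it and the other branch. The sketch below rebuilds the question's sandbox in a throwaway directory and inspects it. The temporary path, the user name, and the e-mail address are placeholders chosen for illustration, not part of the original example.

```shell
set -e
# Rebuild the question's sandbox in a throwaway directory (placeholder path).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # force the branch name used in the question
git config user.name "Sandbox User"       # placeholder identity
git config user.email "sandbox@example.com"

echo "data a" > a.txt && echo "data b" > b.txt
git add -A && git commit -q -m "initial population"
git branch branch1
echo "more data a" >> a.txt && git commit -q -a -m "changed a.txt on master"
git checkout -q -b branch2
echo "more data b" >> b.txt && git commit -q -a -m "changed b.txt on branch2"

# The common ancestor of the two branches (here it is branch1 itself).
base=$(git merge-base branch1 branch2)

# Commits reachable from branch2 but not from branch1: what the merge brings in.
incoming=$(git log --format=%s branch1..branch2)
echo "$incoming"

# Files changed between the common ancestor and branch2 (three-dot diff).
changed=$(git diff --name-only branch1...branch2)
echo "$changed"
```

Both commits show up in the first listing, including the a.txt change made on master, because branch2 was cut from master after that commit. That is why the merge touches a.txt.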
Take a simple diagram:
In the example above, ‘branch1’ points at commit B and ‘branch2’ points at commit E. This describes, more or less, the order of operations you entered above. Were you to merge ‘branch2’ into ‘branch1’, git would find a common ancestor in B, then apply all the history that exists between B and E to ‘branch1’, specifically commits C, D, and E.
What you want, however, is just E. One (bad) solution would be cherry-picking, as you’ve already identified. A much better solution is rebasing ‘branch2’ onto ‘branch1’, thereby rewriting ‘branch2’s history to include only commit E past ‘branch1’:
> git rebase --onto branch1 master branch2
That results in exactly what you seek, and reads as ‘rebase branch2, which was originally based on master, onto branch1’. Note, I’ve left the ‘branch1’ pointer out of this diagram for simplicity, and E became E' because its commit hash changed (as is a common convention with these diagrams).
You could get a similar effect with git checkout branch2 && git rebase -i B, then remove commits C and D from the interactive rebase session.
At my last job we routinely faced this problem with isolated feature branches. Cut at different moments in time from the same production branch, they would pull along unwanted changes if merged without rebasing. As an integration manager, I routinely rewrote their histories to a common point in the past (the last production release), thereby allowing clean merges all the way through. It’s one of many possible workflows. The best answer depends heavily on how your team moves code around. In a CI environment, for example, it’s sometimes less important that C and D get pulled along with merges like the one you describe.
Finally, note that if E depends on any code in C or D, this solution will wreak havoc on your history when merging ‘branch1’ (now containing the E' change set) back into ‘master’. If your workflow is incremental, and ‘branch1’ and ‘branch2’ meddle in similar functions and files, merge conflicts will arise as a matter of course. In that case, a closer look at your team’s workflow is probably warranted.
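Applied to the question's own sandbox, the rebase approach might look like the sketch below. It assumes the command form `git rebase --onto branch1 master branch2`, which matches the reading ‘rebase branch2, which was originally based on master, onto branch1’. The temporary directory and the user identity are placeholders for illustration.

```shell
set -e
# Rebuild the question's sandbox in a throwaway directory (placeholder path).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # force the branch name used in the question
git config user.name "Sandbox User"       # placeholder identity
git config user.email "sandbox@example.com"

echo "data a" > a.txt && echo "data b" > b.txt
git add -A && git commit -q -m "initial population"
git branch branch1
echo "more data a" >> a.txt && git commit -q -a -m "changed a.txt on master"
git checkout -q -b branch2
echo "more data b" >> b.txt && git commit -q -a -m "changed b.txt on branch2"

# Replay only the commits in master..branch2 on top of branch1.
git rebase -q --onto branch1 master branch2

# branch2 now carries a single commit beyond branch1.
git log --format=%s branch1..branch2   # prints: changed b.txt on branch2

# Fast-forward branch1 to the rewritten branch2.
git checkout -q branch1
git merge -q --ff-only branch2

cat a.txt   # prints: data a  (the master change was not pulled along)
cat b.txt   # prints: data b, then: more data b
```

The rewritten commit on branch2 has a new hash, so anyone who already fetched the old branch2 would need to reset to the new one. That cost is part of the workflow trade-off described above.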