The single-mainline workflow with short-lived topic branches has won. It has been popularized as trunk-based development, and it’s the default git workflow most people use. There was a time when other workflows were popular, but the success of GitHub and the simplicity of the single-mainline model have made it omnipresent. That simplicity has advantages, especially for web applications that don’t need long-lived separate versions.
Other workflows are still useful for software that needs to develop multiple versions simultaneously: for example, when targeting a variety of platforms (operating system, hardware), when projects need independent subsystems, or when supporting update channels with distinct maintenance/risk models (maintenance, stable, experimental, etc.). An extreme example is the Linux kernel, which maintains a few long-lived branches and requires a very specific branching/merging process.
Most people are no longer exposed to these other workflows. That’s good: if it’s not needed, it’s unnecessary complexity. However, the lack of exposure reduces the opportunities to learn or remember how certain aspects of git work, namely the tools to integrate changes from one place to another, such as git merge and git cherry-pick.
This article aims to demystify git merge:
git switch version-a
git merge version-b
The 3-way merge algorithm
Consider the simplest case: there are two different versions of a single file, version-a and version-b, that you want to merge.
version-a:
- bucatini
- salt and pepper
- guanciale
- egg
version-b:
- rigatoni
- salt and pepper
- bacon
- egg
- parmesan cheese
How would you merge them, manually?
Unless you aim to override version-a with version-b, there isn’t enough information to mix them together. Fortunately, this problem was solved decades ago. There’s a merge path if you have access to an extra piece of data: an older version of the document that is a “common ancestor” of those two versions. This is called the 3-way merge algorithm, or diff3, and it works this way:
- Consider common-ancestor an older version of the document, from which version-a and version-b started diverging.
- If a line changed only in one of the versions with respect to the common ancestor, apply the change.
- If both versions changed the line with respect to the common ancestor, accept both changes and mark them for manual review.
So, given the following input:
version-a:
- bucatini
- salt and pepper
- guanciale
- egg
common-ancestor:
- spaghetti
- salt and pepper
- bacon
- egg
version-b:
- rigatoni
- salt and pepper
- bacon
- egg
- parmesan cheese
The result of applying that algorithm manually would be:
<<<<<<< version-a
- bucatini
=======
- rigatoni
>>>>>>> version-b
- salt and pepper
- guanciale
- egg
- parmesan cheese
What happened was that:
- Both the spaghetti => rigatoni and spaghetti => bucatini changes were accepted and marked for manual review. The conflicts are displayed using the diff format. If you are a visual person, this other example may be helpful.
- The bacon => guanciale change was applied, since the line only changed in version-a.
- The addition of parmesan cheese was also applied, since it only changed in version-b.
This is a well-known algorithm that was popularized by the diff3 utility, first published in 1979 as part of the UNIX OS. It can be used to compare or merge two documents, provided a common ancestor is available. You can try it yourself:
diff3 -m version-a common-ancestor version-b
This is the base algorithm that git uses to merge two branches.
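To reproduce the diff3 run above from scratch, the three recipe files need to exist on disk first. The following is a minimal sketch, assuming a diff3 implementation such as the one in GNU diffutils is installed; the temporary directory and the merged.txt output file are choices made for this illustration only.

```shell
# Work in a throwaway directory created by mktemp (illustrative choice).
WORKDIR=$(mktemp -d)
cd "$WORKDIR"

# Write the three versions of the recipe, one ingredient per line.
printf '%s\n' '- spaghetti' '- salt and pepper' '- bacon' '- egg' > common-ancestor
printf '%s\n' '- bucatini' '- salt and pepper' '- guanciale' '- egg' > version-a
printf '%s\n' '- rigatoni' '- salt and pepper' '- bacon' '- egg' '- parmesan cheese' > version-b

# -m writes the merged document; lines changed on both sides are wrapped
# in conflict markers instead of being resolved.
diff3 -m version-a common-ancestor version-b > merged.txt
DIFF3_STATUS=$?   # GNU diff3 documents exit status 1 when conflicts remain

cat merged.txt
echo "diff3 exit status: $DIFF3_STATUS"
```

Depending on the diff3 implementation, the conflict block may also include the common ancestor's line in an extra section between the two sides. The lines that changed on one side only (guanciale, parmesan cheese) should appear without markers.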
git merge deconstructed
As per the 3-way merge algorithm, the git merge operation can be deconstructed into three steps:
- Find a commit that is a common ancestor of the branches to be merged.
- Apply the 3-way merge algorithm, using the common ancestor and the two branches.
- Commit if all conflicts were automatically resolved. Otherwise, mark the unresolved conflicts for manual inspection and don’t commit.
This translates to the following low-level, aka plumbing, commands:
# 1. Find a common ancestor.
MERGE_BASE=$(git merge-base version-a version-b)
# 2. Apply the 3-way merge algorithm.
git merge-tree $MERGE_BASE version-a version-b
# 3. Commit if no conflicts.
# Omitted for clarity, see
# https://git-scm.com/docs/git-merge-tree#_usage_notes
# for a full example using merge-tree, commit-tree, and update-ref.
merge-base takes the two commits, version-a and version-b, and works its way back via the parent links until it finds a common ancestor.
merge-tree is git’s version of diff3. The name may be clearer if we consider the underlying data structures git works with: each commit points to a single tree (a “folder”) that holds other trees (“folders”) and blobs (“files”). merge-tree takes three trees and merges them using the 3-way merge algorithm described before.
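The commit, tree, and blob relationship can be inspected directly with plumbing commands. This is a small sketch in a throwaway repository; the file names (README, recipes/carbonara.txt) and the committer identity are placeholders invented for the illustration.

```shell
# Throwaway repository; the identity below is a placeholder for the demo.
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git config user.name 'Demo User'
git config user.email 'demo@example.com'

# One file at the top level and one inside a folder.
mkdir recipes
echo '- spaghetti' > recipes/carbonara.txt
echo 'Recipes' > README
git add .
git commit -q -m 'Common ancestor'

# A commit points to exactly one top-level tree.
TREE=$(git rev-parse 'HEAD^{tree}')
git cat-file -t "$TREE"      # prints the object type of the top-level tree
git cat-file -p "$TREE"      # lists its entries: README and recipes

# Nested folders are trees too, addressable by path.
SUBTREE=$(git rev-parse 'HEAD:recipes')
git cat-file -t "$SUBTREE"
git cat-file -p "$SUBTREE"   # lists carbonara.txt
```

Each line printed by cat-file -p carries the entry's mode, its type (blob or tree), its object id, and its name, which is the structure a tree-level merge walks through.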
Try it yourself:
# Set up repo
mkdir /tmp/test-merge
cd /tmp/test-merge
git init
# Prepare common ancestor
echo "- spaghetti
- salt and pepper
- bacon
- egg" > list.txt
git add list.txt
git commit -m 'Common ancestor'
# Prepare branches
git branch version-a
git branch version-b
git switch version-b
echo "- rigatoni
- salt and pepper
- bacon
- egg
- parmesan cheese" > list.txt
git commit -am 'Version b'
git switch version-a
echo "- bucatini
- salt and pepper
- guanciale
- egg" > list.txt
git commit -am 'Version a'
# Experiment with the low-level commands
MERGE_BASE=$(git merge-base version-a version-b)
git merge-tree $MERGE_BASE version-a version-b
# Try the high-level merge command.
git merge version-b
The result would be the same as using the diff3 tool above.
A note about “merging branches”
The git merge operation takes as input the three tree objects linked by the corresponding commits. Then, it uses the 3-way merge algorithm to output a new merged tree.
This is an important detail to consider. We usually talk about “merging branches”, and that wording may lead us to believe that the merge “processes branches” in some way — it doesn’t. Or that each individual commit of a branch is considered somehow in the merge process — they aren’t. Or that how the intermediate commits ended up in the branch is impactful for the merge — it isn’t.
A branch name is just an alias for the tip of the branch. When we tell git to merge branches, we are asking it to operate with the last commits of those branches. It doesn’t process any intermediate commits.
A concrete example where this understanding is useful is available in the documentation:
With the strategies that use 3-way merge (including the default, ort), if a change is made on both branches, but later reverted on one of the branches, that change will be present in the merged result; some people find this behavior confusing. It occurs because only the heads and the merge base are considered when performing a merge, not the individual commits. The merge algorithm therefore considers the reverted change as no change at all, and substitutes the changed version instead.
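The scenario described in that quote can be reproduced in a few commands. The sketch below uses a throwaway repository with a one-line list.txt; the commit messages and the committer identity are placeholders chosen for the illustration. The two "Use bucatini" commits deliberately carry different messages so that they are distinct commits even when created within the same second.

```shell
# Throwaway repository; the identity below is a placeholder for the demo.
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git config user.name 'Demo User'
git config user.email 'demo@example.com'

echo '- spaghetti' > list.txt
git add list.txt
git commit -q -m 'Common ancestor'
git branch version-a
git branch version-b

# version-b makes the change.
git switch -q version-b
echo '- bucatini' > list.txt
git commit -q -am 'Use bucatini (version-b)'

# version-a makes the same change, then reverts it.
git switch -q version-a
echo '- bucatini' > list.txt
git commit -q -am 'Use bucatini (version-a)'
git revert --no-edit HEAD > /dev/null

# The tip of version-a now matches the merge base, so from the point of
# view of the 3-way merge only version-b changed the line.
git merge -q -m 'Merge version-b' version-b
cat list.txt   # prints "- bucatini"
```

The revert on version-a is invisible to the merge because only the two tips and the merge base are compared; the intermediate commits never enter the picture.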
A note about “fast-forward merges”
Sometimes, git merge doesn’t need to merge anything. In the following scenario:
A -- B -- C (version-a)
           \
            D -- E (version-b)
If you execute:
git switch version-a
git merge version-b
The result will be (no merge commit):
A -- B -- C -- D -- E (version-a, version-b)
That’s a fast-forward merge. It is different from a true merge in that it doesn’t produce a merge commit with multiple parents and doesn’t go through the merge process described above. A fast-forward merge happens when the merged-in branch is a direct descendant of the current branch, aka HEAD.
In the example, version-b is a direct descendant of version-a, so git will just update the version-a reference to point to commit E; there is no need to engage the merge machinery.
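The fast-forward scenario can be checked in a throwaway repository. This sketch recreates the A to E history from the diagram; the log.txt file and the committer identity are placeholders for the illustration. It uses git merge-base --is-ancestor to test the "direct descendant" condition and --ff-only to make git refuse instead of creating a merge commit if that condition did not hold.

```shell
# Throwaway repository; the identity below is a placeholder for the demo.
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git config user.name 'Demo User'
git config user.email 'demo@example.com'

# A -- B -- C on version-a.
for NAME in A B C; do
  echo "$NAME" >> log.txt
  git add log.txt
  git commit -q -m "$NAME"
done
git branch -M version-a   # rename the initial branch to version-a

# D -- E on version-b, branching off C.
git switch -q -c version-b
for NAME in D E; do
  echo "$NAME" >> log.txt
  git commit -q -am "$NAME"
done

git switch -q version-a
# Exit status 0 means version-a is an ancestor of version-b.
git merge-base --is-ancestor version-a version-b && echo 'fast-forward possible'

# Move version-a forward without creating a merge commit.
git merge -q --ff-only version-b
git --no-pager log --oneline
```

After the merge, both branch names resolve to the same commit and the history contains no commit with more than one parent.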
Coda
This article discussed the 3-way merge algorithm, which is the default when merging two branches, but git comes with more strategies. They can be useful in different contexts.
Additionally, the git merge command is not the only integration tool available. I’ve already hinted at git cherry-pick. I’d argue that git rebase is worth understanding as well, even if it’s not an “integration” tool by itself.
This article is already too long to cover these topics in depth, so I’ll save them for another day.