Branching and merging in Mercurial (and Git) explained

Since there seems to be some confusion about which branching methods Mercurial provides compared with e.g. Git (at least I was confused...), this page tries to explain the different branching mechanisms and alternatives in more detail. Git and Mercurial are deliberately discussed side by side: although the systems are quite similar, their different branching properties are a source of never-ending confusion, especially for people switching from Git to Mercurial (or vice versa).

Short-term branching in Mercurial

Let's start with the branching model in Mercurial, since it is somewhat simpler. Consider three developers: Alice, Bob and Clark. Alice has started work on a new project, and Bob and Clark are to take part in the development. For the sake of simplicity, let's assume we are in a local network, and all three developers can pull changes from each other. Alice's repository currently looks like this:

branching_expl_00.png

She has created two revisions, A and B. In her repository, these have gotten revision numbers 1 and 2. The revision B is the newest in the repository (called the tip in Mercurial speak, marked green), and also the parent of her current working directory state (marked with an asterisk). Now Bob and Clark both clone her repository and start working. Meanwhile, Alice continues to work too, of course. So after a short while, Alice's repository has changed:

branching_expl_01.png

Bob's repository

branching_expl_02.png

and Clark's repository contain changes too:

branching_expl_03.png

It is important to note that Bob's and Clark's changes are independent of Alice's, which means that each of them has created his own branch. Now, they want to merge their work, and since Alice is project leader, she decides to do it in her repository. She pulls both the changes made by Bob and by Clark into her repository, which now looks quite different from before:

branching_expl_04.png

Now her repository contains three so-called “heads”, which are changesets without children. The C revision is still the parent revision of her working directory, but since she pulled first Bob's and then Clark's changes, the tip of the repository is now changeset E. Mercurial supports any number of heads in a repository, and they do not even have to be named. One can give them names using bookmarks (originally an extension, part of core Mercurial since version 1.8), which would allow Alice to track Bob's and Clark's changes without merging them into her own branch. This is a bit similar to Git's remote-tracking branches (but it is not the same!). One should especially be aware that Mercurial treats these heads as permanent parts of Alice's development repository. Alice can manually strip them out of her tree, or create a clone repository containing only her changesets (thus discarding the additional heads), but this always requires a bit of extra work.
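The notion of a head can be sketched in a few lines of Python. This is only an illustrative model, not Mercurial's implementation: the `parents` dictionary and the `heads` function are hypothetical names, and the shape of the graph (D and E both based on B) is an assumption read off the description above.

```python
# Hypothetical model of Alice's repository after pulling from Bob and Clark.
# Each changeset maps to the list of its parent changesets.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],  # Alice's own follow-up work
    "D": ["B"],  # pulled from Bob (assumed to be based on B)
    "E": ["B"],  # pulled from Clark (assumed to be based on B)
}


def heads(parents):
    """Return the changesets that no other changeset names as a parent."""
    has_child = {p for ps in parents.values() for p in ps}
    return sorted(c for c in parents if c not in has_child)


print(heads(parents))  # ['C', 'D', 'E']
```

In a real repository, “hg heads” lists the same information.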

However, in this case Alice just wants to merge the changes made by Bob and Clark with her own work, which is the usual case. Thus she merges twice, first with changeset D and then with changeset E, resulting in a single branch of development again:

branching_expl_05.png

Now, Alice's repository has one single head G, which is also the parent revision of her working directory. Bob and Clark can pull this merged repository state from Alice, and everyone is synchronised once again.
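How the two merges reduce three heads to one can be sketched with a small Python model. Again this is only a sketch under assumptions: the names `parents`, `heads` and `merge` are hypothetical, the intermediate merge is assumed to be called F, and a merge is modelled simply as a changeset with two parents.

```python
# Hypothetical model: each changeset maps to the list of its parents.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],  # Alice
    "D": ["B"],  # Bob (assumed to be based on B)
    "E": ["B"],  # Clark (assumed to be based on B)
}


def heads(parents):
    """Return the changesets that no other changeset names as a parent."""
    has_child = {p for ps in parents.values() for p in ps}
    return sorted(c for c in parents if c not in has_child)


def merge(parents, name, first, second):
    """Record a merge changeset whose two parents are the merged heads."""
    parents[name] = [first, second]


merge(parents, "F", "C", "D")  # Alice merges her work with Bob's
after_first = heads(parents)
print(after_first)             # ['E', 'F']

merge(parents, "G", "F", "E")  # then merges the result with Clark's
after_second = heads(parents)
print(after_second)            # ['G']
```

Each merge consumes two heads and produces one, so two merges are needed to get from three heads back to a single line of development.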

Short-term branching in Git

At first, Alice tries the exact same approach in Git as we have seen before, and somehow this does not work quite as expected. The reason for this is that in Git, we only have one head. When Alice does two consecutive “git fetch” commands to get Bob's and Clark's work, these are inserted as “loose” objects in her local database. The latest result of a fetch is saved under the name FETCH_HEAD, with the consequence that the second fetch from Clark overwrites the saved SHA1 ID of Bob's data. So this is apparently not how things are usually done in Git, and Alice reads the documentation again.

To achieve the same workflow we have seen in the Mercurial example, Alice actually has two options. The first is to just pull Bob's changes and merge those first. In fact, if she synchronises with Bob's repository using the “pull” command, the merge takes place automatically. Afterwards, she does the same with Clark's repository, which results in a revision graph that looks exactly like the one we have seen in the Mercurial example (which is why I will not repeat it here).

The second possibility is to use two light-weight branches containing the changes Bob and Clark each make. This is an additional abstraction level that Git offers, which is not supported directly by Mercurial (although a third-party extension providing that functionality exists; which one?). Light-weight branches are completely separate branches inside one common repository. They are quite cheap with respect to space requirements, since they have some properties in common with Mercurial's heads. In fact, light-weight branches in Git are a lot like heads in Mercurial on the technical level, but they are separated much more strongly (and can, for example, easily be removed).

For her purposes, Alice creates two so-called “remote-tracking” branches in her repository, which are just branches that “remember” their origin (making it easier to update them). Afterwards, Alice can simply merge from those two mirroring branches into her own. Git can even do a single merge using both Bob's and Clark's changes at the same time (which is called an octopus merge), but this can create its own unique merging conflicts. The result is the same as before, and as soon as Bob and Clark pull from Alice's repository, they will end up with the merged state again.
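The difference between an octopus merge and two consecutive merges can be illustrated with a small Python model. This is a sketch only: `parents` and `heads` are hypothetical names, the merge commit is arbitrarily called M, and the graph shape (Bob's D and Clark's E both based on B) is an assumption taken from the Mercurial example.

```python
# Hypothetical model: each commit maps to the list of its parents.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],  # Alice's branch
    "D": ["B"],  # tip of the branch tracking Bob (assumed)
    "E": ["B"],  # tip of the branch tracking Clark (assumed)
}


def heads(parents):
    """Return the commits that no other commit names as a parent."""
    has_child = {p for ps in parents.values() for p in ps}
    return sorted(c for c in parents if c not in has_child)


# An octopus merge is a single commit with more than two parents.
parents["M"] = ["C", "D", "E"]

print(heads(parents))     # ['M']
print(len(parents["M"]))  # 3
```

The end state has a single head either way; the octopus merge just records one commit with three parents instead of two commits with two parents each.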

Implicit (unnamed) branches

Both Git and Mercurial support unnamed local branches. We have actually seen them already, since Mercurial's heads and Git's FETCH_HEAD are essentially the same thing. We have also seen that Git does not offer many tools for working this way, since it more or less expects you to name every branch. However, one can create unnamed branches quite quickly in both systems. Consider the state of Alice's repository from the beginning:

branching_expl_01.png

Instead of pulling Bob's and Clark's changes, Alice rethinks her changes in C and steps back to the B state. She then makes different changes to try something out and commits these, resulting in yet another changeset H:

branching_expl_06.png

branching_expl_07.png

Alice has thus created an “anonymous” branch. In Mercurial, this is absolutely no problem, since it is just another head. Git, however, expects you to name this branch if you want to continue to work in it. If you do not, the only way to find it after switching back to the master branch is to consult the reflog, and if you wait too long, it can even be pruned by a future garbage collection cycle. By the way, although Mercurial does not require you to do so, naming such a branch is a good idea there as well (again using bookmarks).
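Why an unnamed branch is at risk in Git can be sketched as a reachability check. This is a simplified model, not Git's actual garbage collector: `parents`, `refs`, `reachable` and the branch name “experiment” are hypothetical, and the model deliberately ignores the reflog, which in real Git keeps such commits alive for a while.

```python
# Hypothetical model: each commit maps to the list of its parents.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],  # Alice's original change, on master
    "H": ["B"],  # the experimental commit made after stepping back to B
}


def reachable(parents, refs):
    """Return every commit reachable from the named references."""
    seen = set()
    stack = list(refs.values())
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


refs = {"master": "C"}  # H has no name pointing at it
unnamed = sorted(set(parents) - reachable(parents, refs))
print(unnamed)  # ['H']

refs["experiment"] = "H"  # naming the branch keeps H reachable
named = sorted(set(parents) - reachable(parents, refs))
print(named)    # []
```

In Mercurial no such check applies: H is simply another head of the repository and stays there whether or not it has a name.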

One should perhaps note an important difference in behaviour between Mercurial and Git at this point. While in Git branches are more or less separate entities (and can be addressed from outside), the heads in Mercurial are considered a part of the whole branch formed by the repository. Thus, if you do a simple synchronisation in Git, you will (by default) only get the master branch, while in Mercurial you will always get all heads. You can clone or synchronise specific heads in Mercurial as well, but this is a bit more work, since the head names (bookmarks) are local to the repository and cannot be used from outside.

Remote-tracking using separate repositories

Yet another possibility for Alice to track the changes made by Bob and Clark is to create two clone repositories alongside her own, which is supported by both Git and Mercurial. Since cloned repositories always remember the repository they were created from, Alice can update them just by changing into the respective directories and pulling without any further parameters. To save space, such tracking repositories will usually not contain a checkout (create them using “clone -n” in Git or “clone -U” in Mercurial).

The advantage of this approach is a better separation of the developers' branches. On the other hand, separate repositories use more space than light-weight branches. So is the fact that Mercurial does not directly support light-weight branches a problem? Not necessarily: in many of the cases where they are used in Git, one does not need them in Mercurial at all, due to its support for multiple heads.

In the case of a so-called topic branch, e.g. a branch containing an experimental new feature that is being developed by a small group of co-workers, one will almost always use the separate repository approach anyway. For non-trivial project sizes, other factors such as build time become important, and in-place switching between branches becomes an increasingly time-consuming process. This means that it is almost always desirable to have separate checkouts for e.g. the stable and the experimental branch.

Permanent branches

The easiest way to create permanent branches in both Mercurial and Git is to just create two separate repositories and e.g. put them side by side on the server. This has the advantage of complete separation but comes at the price of additional disk usage (which is usually not a real problem considering disk sizes nowadays). Another possibility in Git is to use light-weight branches for all those cases where a complete separation of branches is not necessary or not wanted.

Mercurial also offers the possibility of creating permanent branches inside a repository by storing a name in every changeset. The reason I did not mention this before is that it is almost never a good idea to use this facility for short-term branching, since branches created this way are inherently “eternal”. One can remove them by doing an empty merge with the main branch, but this is like deleting a branch in Subversion: it is not as if the branch never existed, it is merely removed from sight in future revisions. One can sometimes strip the branch from the repository completely, but this can be risky. However, the mechanism certainly has its uses if you have to support branches that are both long-term and closely related to each other.

A possible alternative to branches: patch queues

Sometimes, an alternative to using a branch for an experimental feature is to use a patch queue. In this approach, you do not enter your changes directly into the version control system, but rather record them locally as a series of patches. A patch queue tool like e.g. Quilt allows you to record, update and re-order your patches against the code base. Once you are satisfied with your work, you can commit the changes into the VCS. This allows for a “branch-like” workflow where all changes remain local until you decide to transfer them into the VCS repository. This approach is possible using both systems, using the respective extensions (StGIT or Mercurial Queues). These extensions allow you to work much like you normally work with version control using similar commands and (in the case of Mercurial Queues) even going through the normal VCS frontend.


BranchingExplained (last edited 2013-08-27 17:11:38 by AugieFackler)