The Mercurial wire protocol

Mercurial performs all of its network transactions over HTTP or SSH.

See also the HttpCommandProtocol and the SshCommandProtocol.

Overview

Discovery of Common Changesets

A couple of functions are used to efficiently discover the boundaries of set of nodes common to both the client and server, without a massive number of round trips or needing to send the entire changelog list.

There are two versions of this today. Set-based discovery is used against newer servers and generally uses fewer roundtrips. It discovers the heads of the common subset. Tree-based discovery is used against older servers and discovers the bases of the incoming set, which is less precise.

Transmitting Changesets

Then there are functions to apply or obtain bundles of changesets. The latter are again present in older versions (changegroup*) that use bases, and a newer version that uses heads of common (getbundle).

Newer Set-based Protocol

Wire functions used:

Gist of how discovery of the common heads works:

For the initial sample, we first add the heads and then walk the undecided set from its heads towards its roots, taking samples at exponentially increasing distances. We stop once the sample is full. If it is not full at the end, we fill it with a random sample of the unsampled nodes. The idea is that this is quick (only ancestor information needed, does not necessarily walk the entire graph).

The full samples walk from both the heads to the roots and vice versa. And they always walk the full graph. If the final sample ends up too big, we take a random sample of it. Since we only do this after the first quick sample has (hopefully) already reduced the undecided set considerably, we ususally don't have to invert a lot of the graph for descendant information (which is costly). We sample from the roots too for cases where both the client and server have a lot of mutually unknown nodes.

Older Tree-based Protocol

The network protocol looks like this:

heads()

 return a list of heads
 (everything new must be an ancestor of one of these heads, so start here)

branches(list)

 for node in list:
   follow the linear section of a branch back to its branchpoint
   return (tip, base, p1, p2)
   (this reduces round trips for long linear branches)

between(list)

 for tip, base in list:
   walk back a linear branch, return elements 1, 2, 4, 8..
   (and this lets us do bisection search if we diverge in the middle of one of these long branches)

changegroup(roots)

 find all changesets descended from roots and return them as a single changegroup

Changegroup Format

A changegroup is a single stream containing:

A group is a list of chunks:

Changegroups are encoded in BundleFormat.


Hgweb/remoterepository currently runs this all through zlib which makes sense on WANs, but less sense on LANs.


CategoryInternals

WireProtocol (last edited 2011-05-01 12:19:06 by PeterArrenbrecht)