Unbound size of discovery

Gregory Szorc gregory.szorc at gmail.com
Mon Jun 30 18:20:27 CDT 2014


The size of the wire protocol payload for discovery requests and 
responses is proportional to the number of heads in the peer 
repositories. For esoteric repositories, such as Mozilla's Try 
repository which grows to over 10,000 heads before it is reset, we can 
see discovery response payloads grow to over 1 MB! We've also brushed up 
against default HTTP server limits. Mozilla has hit both HTTP header 
size and count limits due to x-hgarg-n headers during discovery. 
Fortunately, we operate our own servers, so we can increase the limits. 
But sometimes there is a load balancer or security device between your 
Mercurial server and your users (e.g. EC2 - although I'm not sure ELB 
imposes such limits).
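
To put rough numbers on this, here is a back-of-envelope sketch in
Python. It assumes each head is sent as a 40-character hex node, that
nodes are joined by one separator byte, and that the argument string is
split across x-hgarg-n headers carrying at most 1024 bytes each. The
1024 figure and the function name are assumptions for illustration, not
values read from any particular server; the result covers only the raw
heads list and nothing else in the request.

```python
import math

HEX_NODE_LEN = 40  # one SHA-1 changeset ID rendered as hex


def estimate_heads_payload(num_heads, header_value_limit=1024):
    """Return (payload_bytes, header_count) for a list of heads.

    header_value_limit is an assumed per-header budget, not a value
    taken from a real server configuration.
    """
    # Hex nodes plus one separator byte between neighbours.
    payload = num_heads * HEX_NODE_LEN + max(num_heads - 1, 0)
    # Headers needed if the payload is cut into fixed-size pieces.
    headers = math.ceil(payload / header_value_limit)
    return payload, headers


if __name__ == "__main__":
    size, count = estimate_heads_payload(10_000)
    print(f"{size} bytes across {count} headers")
```

Under these assumptions, 10,000 heads come to roughly 400 KB spread
over about 400 headers, which is well past the 100-field default that
Apache httpd documents for LimitRequestFields.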

This kind of unbounded growth is not good for scalability and 
performance. It may rule out Mercurial as a solution for you.

One idea I had was to limit returned heads to only public changesets. 
Another is to allow servers to execute a config-defined revset as part 
of calculating returned heads. Either approach would likely result in 
clients sending redundant changeset data to the remote, since the client 
no longer learns about every head the server has. But for certain 
scenarios (such as Mozilla's Try, where nearly every head stems from a 
public changeset), the redundancy should be negligible.
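
A toy model of the first idea, in Python, to show why the advertised
set shrinks so much on a Try-like repository. The DAG, the phase table
and the function names are all made up for illustration; this is not
Mercurial's internal API. On a real server the equivalent filter might
be expressed with a revset along the lines of heads(public()).

```python
def heads(nodes, parents):
    """Return the members of `nodes` that have no child within `nodes`."""
    nodes = set(nodes)
    has_child = {p for n in nodes for p in parents.get(n, ()) if p in nodes}
    return nodes - has_child


def advertised_heads(parents, phases, public_only):
    """Heads the server would report, optionally restricted to public."""
    if public_only:
        visible = [n for n in parents if phases[n] == "public"]
    else:
        visible = list(parents)
    return heads(visible, parents)


# Hypothetical Try-like repository: a short public trunk with many
# draft heads that all stem from the public tip "c".
parents = {"a": [], "b": ["a"], "c": ["b"]}
phases = {"a": "public", "b": "public", "c": "public"}
for i in range(1000):
    parents[f"try{i}"] = ["c"]
    phases[f"try{i}"] = "draft"

if __name__ == "__main__":
    print(len(advertised_heads(parents, phases, public_only=False)))
    print(sorted(advertised_heads(parents, phases, public_only=True)))
```

In this model the unfiltered server reports 1000 heads, while the
filtered one reports only the public tip. A client pushing on top of
one of the draft heads would resend that one draft changeset, which is
the redundancy mentioned above.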

I've also had other crazy ideas such as having the client skip heads and 
go straight to querying for existence of ancestors in the pushed 
changeset(s).
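
A minimal sketch of what that could look like, in Python. The client
starts at the changesets being pushed and walks towards the roots,
asking the remote in batches which nodes it already has, and stops
descending as soon as a node is known. The remote_known callable is a
stand-in for a batched existence query (similar in spirit to the
existing "known" wire command); the function names and the toy DAG are
assumptions for illustration. It does one round trip per generation,
so a real implementation would want sampling or bigger strides.

```python
def find_missing(push_heads, parents, remote_known):
    """Return (missing, common, roundtrips) for a push.

    parents maps node -> list of parent nodes (local DAG).
    remote_known takes a list of nodes and returns a list of booleans.
    """
    missing, common = set(), set()
    frontier = set(push_heads)
    roundtrips = 0
    while frontier:
        batch = sorted(frontier)
        answers = remote_known(batch)  # one simulated round trip
        roundtrips += 1
        next_frontier = set()
        for node, known in zip(batch, answers):
            if known:
                common.add(node)  # remote has it: stop walking here
            else:
                missing.add(node)  # must be sent: inspect its parents
                next_frontier.update(parents.get(node, ()))
        # Never re-query nodes that were already classified.
        frontier = next_frontier - missing - common
    return missing, common, roundtrips


# Hypothetical linear history a <- b <- c <- d <- e; remote has a..c.
local_parents = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"], "e": ["d"]}
remote_nodes = {"a", "b", "c"}


def fake_remote_known(nodes):
    return [n in remote_nodes for n in nodes]


if __name__ == "__main__":
    print(find_missing(["e"], local_parents, fake_remote_known))
```

Note that the request size here depends on how far the pushed
changesets are from something the remote knows, not on how many heads
the remote has.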

Perhaps these modes of operation could be selected via a capability, 
e.g. if a remote advertises its head count, the client can decide 
whether classical full-heads-based discovery is appropriate.
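
A sketch of the client-side decision, in Python. It assumes the
capabilities arrive as a space-separated string of tokens, some in
name=value form. The "headscount" capability, the threshold of 1000
and the strategy labels are all hypothetical; no such capability
exists today as far as I know.

```python
def parse_capabilities(raw):
    """Split a space-separated capabilities string into a dict."""
    caps = {}
    for token in raw.split():
        name, _, value = token.partition("=")
        caps[name] = value
    return caps


def choose_discovery(raw_caps, max_heads=1000):
    """Pick a discovery strategy from the advertised capabilities.

    "headscount" is a hypothetical capability used for illustration.
    """
    count = parse_capabilities(raw_caps).get("headscount")
    if count is None or not count.isdigit():
        # Remote says nothing usable: keep today's behaviour.
        return "heads"
    return "heads" if int(count) <= max_heads else "ancestors"


if __name__ == "__main__":
    print(choose_discovery("lookup known getbundle headscount=12000"))
    print(choose_discovery("lookup known getbundle"))
```

Falling back to heads-based discovery when the capability is absent
keeps old servers working unchanged.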

Before I get too far down the rabbit hole, I was curious what solutions 
have been considered/attempted for dealing with this "discovery bloat."

