Wire protocol futures

Josef 'Jeff' Sipek jeffpc at josefsipek.net
Wed Sep 5 11:24:44 EDT 2018


There is a lot of info here, thanks for the write up!

On Fri, Aug 31, 2018 at 15:47:34 -0700, Gregory Szorc wrote:
...
> Assuming you only have primitive data retrieval commands, you are now
> issuing a lot more commands.

While I'm all for allowing simpler servers (and hopefully clients too), I'm
worried about the chattiness of such a protocol - specifically the number of
network round-trips that depend on previous commands completing.

Over the years, I've seen plenty of protocols evolve to reduce chattiness.
For example, NFSv4 added compounds (a way to pack up several RPCs and send
them as a unit), SMB/CIFS reduced the number of RPCs, and so on.  I realize
that both of those examples are file systems, but I'd argue that their
lessons apply here as well.
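
To put rough numbers on the round-trip concern, here is a back-of-the-envelope
Python sketch.  The cost model (one full RTT per dependent command, plus
payload size over bandwidth) and every figure in it are my own assumptions
for illustration, not measurements of any real protocol.

```python
def total_time(round_trips, rtt_s, payload_bytes, bandwidth_bps):
    """Estimate transfer time when each round trip must finish before the
    next one starts (hypothetical, deliberately simplistic model)."""
    latency_cost = round_trips * rtt_s
    transfer_cost = payload_bytes * 8 / bandwidth_bps
    return latency_cost + transfer_cost


# Assumed link: 200 ms RTT, 100 Mbit/s; assumed payload: 10 MB in total.
RTT_S = 0.2
BANDWIDTH_BPS = 100_000_000
PAYLOAD_BYTES = 10_000_000

# 100 primitive commands, each depending on the previous reply ...
chatty = total_time(100, RTT_S, PAYLOAD_BYTES, BANDWIDTH_BPS)
# ... versus the same data fetched with one compound-style request.
compound = total_time(1, RTT_S, PAYLOAD_BYTES, BANDWIDTH_BPS)

print(f"chatty:   {chatty:.1f} s")
print(f"compound: {compound:.1f} s")
```

On a link like this the bandwidth term is identical in both cases; the
difference comes entirely from the number of dependent round trips.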

Somewhat relatedly: the JMAP IETF working group [1] is working on a new way
to access email, ideally replacing IMAP.  The interesting thing here is
that the entire design visibly targets high-latency links.
(Personally, I think this is because the authors are from Australia and
are therefore very sensitive to latency.)  I don't know if there are
any lessons in JMAP that would apply here, but I would certainly encourage
testing on high-latency, high-bandwidth links if there is any concern about
chattiness in the new protocol.

[1] https://datatracker.ietf.org/group/jmap/about/

...
> At the end of the day, the wire protocol command set will be driven by
> practical needs, not by ivory tower architecting. We'll see what shortcuts
> we need to employ in the name of performance and we'll implement them.

That's good to hear.  I just hope that these "bonus" commands will fit more
or less nicely into the new protocol design.  It'd be rather unfortunate if
in the process of adding these bonus commands you reinvented getbundle.

...
> Since we are effectively talking about a new VCS at the wire protocol
> level, let's talk about other crazy ideas. As Augie likes to say, once we
> decide to incur a backwards compatibility break, we can drive a truck
> through it.
> 
> Let's talk about hashes.
> 
> Mercurial uses SHA-1 for content indexing. We know we want to transition
> off of SHA-1 eventually due to security weaknesses.
...
> In addition, Mercurial has 2 ways to store manifests: flat and tree.
...
> 
> One of the ideas I'm exploring in the new wire protocol is the idea of
> "hash namespaces." Essentially, the server's capabilities will advertise
> which hash flavors are supported. Example hash flavors could be
> "hg-sha1-flat" for flat manifests using SHA-1 and "hg-blake2b-tree" for
> tree manifests using blake2b. When a client makes a request, that request
> will be associated with a "hash namespace" such that any nodes referenced
> by that command are in the requested "hash namespace."

While this idea is intriguing, it also means AFAICT that a changeset no
longer has one globally unique ID.  E.g., consider the world where there
are:

	hg-sha256-flat
	hg-blake2b-flat

or:

	hg-blake2b-flat
	hg-blake2b-tree

In both cases, the node id will be 32 bytes/64 hex chars long.  I can no
longer paste at you a hash I see in 'hg log' and (1) know what hash function
generated it, and (2) be certain that you can grep your 'hg log' output for
it and find it.  This whole thing gets even more fun when you share
abbreviated hashes - e.g., abc may be the shortest unique node prefix in
both namespaces, but may map to completely different revisions.
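
A small Python sketch of the ambiguity I mean.  The namespace names come
from the examples above, but hashing a bare byte string is an assumption
made purely for illustration (real node ids are not computed this way), and
the helper names are hypothetical.

```python
import hashlib
from collections import defaultdict


def node_id(namespace, data):
    """Return a hex node id for `data` in the given (hypothetical) namespace."""
    if namespace == "hg-sha256-flat":
        return hashlib.sha256(data).hexdigest()
    if namespace == "hg-blake2b-flat":
        # blake2b truncated to 32 bytes, so it is the same length as SHA-256.
        return hashlib.blake2b(data, digest_size=32).hexdigest()
    raise ValueError("unknown namespace: %s" % namespace)


def prefix_index(namespace, revisions, length):
    """Map each `length`-char hex prefix to the set of revisions it matches."""
    index = defaultdict(set)
    for rev, data in enumerate(revisions):
        index[node_id(namespace, data)[:length]].add(rev)
    return index


def clashing_prefixes(revisions, ns_a, ns_b, length):
    """Find prefixes that are unambiguous in both namespaces yet name
    different revisions in each."""
    index_a = prefix_index(ns_a, revisions, length)
    index_b = prefix_index(ns_b, revisions, length)
    clashes = {}
    for prefix, revs_a in index_a.items():
        revs_b = index_b.get(prefix)
        if revs_b and len(revs_a) == 1 and len(revs_b) == 1 and revs_a != revs_b:
            clashes[prefix] = (next(iter(revs_a)), next(iter(revs_b)))
    return clashes


# Stand-in "revisions"; the content is arbitrary.
revisions = [b"revision %d" % i for i in range(200)]

id_sha = node_id("hg-sha256-flat", revisions[0])
id_b2 = node_id("hg-blake2b-flat", revisions[0])
print(len(id_sha), len(id_b2))  # same length, so the hash flavor is not visible

clashes = clashing_prefixes(revisions, "hg-sha256-flat", "hg-blake2b-flat", 2)
for prefix, (rev_a, rev_b) in sorted(clashes.items()):
    print(prefix, "-> rev", rev_a, "in sha256, rev", rev_b, "in blake2b")
```

Every prefix it prints is one that two people could exchange, each resolve
without any ambiguity warning, and still end up looking at different
revisions.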

As a side note, wouldn't it be possible to deal with flat<->tree transitions
by making a "dummy" commit that rewrites the manifest to the new format and
sets some flag in .hg/requires?

Anyway, as intriguing as this idea is, I'm skeptical that the resulting UX
will be good.  It's also possible that I'm not fully understanding your idea
here :)

> This feature, if implemented, would allow a server/repository to index and
> serve data under multiple hashing methodologies simultaneously. For
> example, pushes to the repository would be indexed under SHA-1 flat, SHA-1
> tree, blake2b flat, and blake2b tree. Assuming the server operator opts
> into this feature, new clones would use whatever format is
> supported/recommended at that time. Existing clones would continue to
> receive SHA-1 flat manifests. New clones would receive blake2b tree
> manifests.

See above about UX.

Regardless, it is certainly something to experiment with and either keep or
throw away.

Thanks for all the work you've put in,

Jeff.

-- 
Once you have their hardware. Never give it back.
(The First Rule of Hardware Acquisition)

