[PATCH 3 of 8 V2] internals: document compression negotiation

Gregory Szorc gregory.szorc at gmail.com
Sat Dec 24 13:34:26 EST 2016


On Tue, Nov 29, 2016 at 12:02 AM, Gregory Szorc <gregory.szorc at gmail.com>
wrote:

> On Mon, Nov 28, 2016 at 10:58 PM, Gregory Szorc <gregory.szorc at gmail.com>
> wrote:
>
>> # HG changeset patch
>> # User Gregory Szorc <gregory.szorc at gmail.com>
>> # Date 1480393573 28800
>> #      Mon Nov 28 20:26:13 2016 -0800
>> # Node ID 2540270e3fab858be2f20fc30ee2011600bdbee9
>> # Parent  e32ff1ce869243618aaab30bb6fba2b13be536ff
>> internals: document compression negotiation
>>
>
> I changed things significantly since v1:
>
> * No longer use Accept request header. I really didn't care for the
> non-spec conformant parsing of that header.
> * Introduce a "httpmediatype" capability so "application/mercurial-0.2"
> isn't tied directly to the "compression" capability (I like things being
> explicit not inferred)
> * Introduction of X-HgProto request header (to replace Accept)
> * Support for minimum server-side media type detection (although I haven't
> done anything with it in this series)
> * Differentiated between transmission direction of
> application/mercurial-0.2 so client doesn't send that media type to a
> server that can't yet recognize it
> * Compression format identifier in application/mercurial-0.2 media type is
> now variable length
>

Where do we stand with this series? I was waiting for feedback, but none
arrived.

I'd like to get zstd into the wire protocol and revlogs for 4.1, otherwise
it isn't very useful (I've been holding off patchbombing the revlog series
because I already have enough things in flight). With the holidays, we
effectively have 2 weeks before freeze.

FWIW, getting reviews this cycle has been extremely frustrating to me. I
was planning on landing zstd, `hg debugupgraderepo`, `hg display`, and
various polish work for more robust zstd integration in the 4.1 release.
It's now looking like a good chunk of that work won't make it. The most
frustrating part is that most of the code was ready in November :/ I think
there needs to be a discussion about scaling code review (no later than the
next sprint) because it's been obvious for a few months now that things
aren't as well-oiled as they were several months ago.


>
>
>>
>> As part of adding zstd support to all of the things, we'll need
>> to teach the wire protocol to support non-zlib compression formats.
>>
>> This commit documents how we'll implement that.
>>
>> To understand how we arrived at this proposal, let's look at how
>> things are done today.
>>
>> The wire protocol today doesn't have a unified format. Instead,
>> there is a limited facility for differentiating replies as successful
>> or not. And, each command essentially defines its own response format.
>>
>> A significant deficiency in the current protocol is the lack of
>> payload framing over the SSH transport. In the HTTP transport,
>> chunked transfer is used and the end of an HTTP response body (and
>> the end of a Mercurial command response) can be identified by a 0
>> length chunk. This is how HTTP chunked transfer works. But in the
>> SSH transport, there is no such framing, at least for certain
>> responses (notably the response to "getbundle" requests). Clients
>> can't simply read until end of stream because the socket is
>> persistent and reused for multiple requests. Clients need to know
>> when they've encountered the end of a request but there is nothing
>> simple for them to key off of to detect this. So what happens is
>> the client must decode the payload (as opposed to being dumb and
>> forwarding frames/packets). This means the payload itself needs
>> to support identifying end of stream. In some cases (bundle2), it
>> also means the payload can encode "error" or "interrupt" events
>> telling the client to e.g. abort processing. The lack of framing
>> on the SSH transport and the transfer of its responsibilities to
>> e.g. bundle2 is a massive layering violation and a wart on the
>> protocol architecture. It needs to be fixed someday by inventing a
>> proper framing protocol.
>>
>> So about compression.
>>
>> The client transport abstractions have a "_callcompressable()"
>> API. This API is called to invoke a remote command that will
>> send a compressible response. The response is essentially a
>> "streaming" response (no framing data at the Mercurial layer)
>> that is fed into a decompressor.
>>
>> On the HTTP transport, the decompressor is zlib and only zlib.
>> There is currently no mechanism for the client to specify an
>> alternate compression format. And, clients don't advertise what
>> compression formats they support or ask the server to send a
>> specific compression format. Instead, it is assumed that non-error
>> responses to "compressible" commands are zlib compressed.
>>
>> On the SSH transport, there is no compression at the Mercurial
>> protocol layer. Instead, compression must be handled by SSH
>> itself (e.g. `ssh -C`) or within the payload data (e.g. bundle
>> compression).
>>
>> For the HTTP transport, adding new compression formats is pretty
>> straightforward. Once you know what decompressor to use, you can
>> stream data into the decompressor until you reach a 0 size HTTP
>> chunk, at which point you are at end of stream.
>>
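
To illustrate the paragraph above, here is a minimal Python 3 sketch of streaming a chunked response body into a decompressor. The helper name `decompress_chunks` is hypothetical and not part of Mercurial; zlib stands in for whichever format was negotiated, and the end of the iterable plays the role of the 0 size HTTP chunk.

```python
import zlib


def decompress_chunks(chunks):
    """Yield decompressed data for an iterable of compressed byte chunks.

    Hypothetical helper: each item stands in for one HTTP chunk body.
    """
    decompressor = zlib.decompressobj()
    for chunk in chunks:
        data = decompressor.decompress(chunk)
        if data:
            yield data
    # Exhausting the iterable corresponds to the 0 size HTTP chunk,
    # so flush whatever the decompressor still buffers.
    tail = decompressor.flush()
    if tail:
        yield tail


original = b"changegroup data " * 100
payload = zlib.compress(original)
# Split the compressed stream into small pieces to mimic HTTP chunks.
chunks = [payload[i:i + 16] for i in range(0, len(payload), 16)]
result = b"".join(decompress_chunks(chunks))
```

The receiver never needs to know the total length up front; it only needs to know which decompressor to instantiate.
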
>> So our wire protocol changes for the HTTP transport are pretty
>> straightforward: the client and server advertise what compression
>> formats they support and an appropriate compression format is
>> chosen. We introduce a new HTTP media type to hold compressed
>> payloads. The header of the payload defines the compression format
>> being used. Whoever is on the receiving end can sniff the first few
>> bytes and route to an appropriate decompressor.
>>
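
A rough Python 3 sketch of the payload header described above, using the layout documented in the patch below (one length byte, then the compression format identifier, then the compressed data). The names `COMPRESSORS`, `encode_payload` and `decode_payload` are hypothetical, and treating `none` as an identity "compression" is an assumption made for illustration.

```python
import zlib

# Hypothetical registry: identifier -> (compress, decompress).
COMPRESSORS = {
    b"zlib": (zlib.compress, zlib.decompress),
    b"none": (lambda data: data, lambda data: data),
}


def encode_payload(name, data):
    """Frame data as: 1 length byte, the format name, the compressed bytes."""
    if not 0 < len(name) <= 255:
        raise ValueError("identifier must be 1 to 255 bytes long")
    compress, _ = COMPRESSORS[name]
    return bytes([len(name)]) + name + compress(data)


def decode_payload(payload):
    """Sniff the header and route the body to the matching decompressor."""
    if not payload:
        raise ValueError("empty payload")
    name_length = payload[0]
    name = payload[1:1 + name_length]
    if len(name) != name_length:
        raise ValueError("truncated compression identifier")
    if name not in COMPRESSORS:
        raise ValueError("unsupported compression format: %r" % name)
    _, decompress = COMPRESSORS[name]
    return name, decompress(payload[1 + name_length:])


frame = encode_payload(b"zlib", b"hello world")
decoded = decode_payload(frame)
```

Because the identifier is length-prefixed, adding a format such as zstd only requires another registry entry; the framing itself does not change.
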
>> Support for multiple compression formats is advertised on both
>> server and client. The server advertises a "compression" capability
>> saying which compression formats it supports and in what order they
>> are preferred. Clients advertise their support for multiple
>> compression formats and media types via the introduced "X-HgProto"
>> request header.
>>
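
A minimal Python 3 sketch of how a server might interpret the "X-HgProto" value, following the parameter list and defaults documented in the patch below. The function name `parse_hgproto` is hypothetical, and silently ignoring unknown parameters is an assumption, not something the patch specifies.

```python
def parse_hgproto(value):
    """Return (media type versions, compression formats) for a header value."""
    if not value:
        # No header at all: infer "0.1", which matches legacy clients.
        value = "0.1"
    versions = []
    formats = None
    for param in value.split():
        if param.startswith("comp="):
            formats = [f for f in param[len("comp="):].split(",") if f]
        elif param in ("0.1", "0.2"):
            versions.append(param)
        # Assumption: parameters we do not recognize are skipped.
    if formats is None:
        # "comp" not advertised: treated as equivalent to "zlib,none".
        formats = ["zlib", "none"]
    return versions, formats


modern = parse_hgproto("0.1 0.2 comp=zstd,zlib,none")
legacy = parse_hgproto(None)
```
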
>> Strictly speaking, servers don't need to advertise which compression
>> formats they support. But doing so allows clients to fail fast if
>> they don't support any of the formats the server does. This is useful
>> in situations like sending bundles, where the client may have to
>> perform expensive computation before sending data to the server.
>>
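
The fail-fast check can be sketched as follows (Python 3). The function name `choose_compression` is hypothetical, and letting the server's preference order decide among mutually supported formats is an assumption for this example.

```python
def choose_compression(server_value, client_formats):
    """Pick a format from the server's "compression" capability value.

    server_value is the comma-delimited capability string, most preferred
    first; client_formats is the set of formats the client can produce.
    """
    for fmt in server_value.split(","):
        if fmt and fmt in client_formats:
            return fmt
    # Nothing in common: abort before doing any expensive bundling work.
    raise LookupError("no mutually supported compression format")


chosen = choose_compression("zstd,zlib", {"zlib", "none"})
```

A client would run a check like this before generating a bundle, so an incompatible server is detected before any expensive computation starts.
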
>> Rather than simply advertise a list of supported compression formats,
>> we introduce an additional "httpmediatype" server capability
>> advertising which media types the server supports. This means servers
>> are explicit about what formats they exchange. IMO, this is superior
>> to inferring support from other capabilities (like "compression").
>>
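
A Python 3 sketch of how a client might pick the request media type from the "httpmediatype" capability, using the value strings documented in the patch below. The function name `request_media_type`, the `caps` dictionary and the numeric comparison of versions are all assumptions made for illustration.

```python
def version_key(version):
    """Turn "0.2" into (0, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def request_media_type(caps, client_versions=("0.1", "0.2")):
    """Return the newest media type both sides support for requests."""
    value = caps.get("httpmediatype")
    if value is None:
        # Capability absent: the server only understands 0.1.
        return "application/mercurial-0.1"
    receivable = set()
    minimum = None
    for item in value.split(","):
        if item.startswith("min="):
            minimum = item[len("min="):]
        elif item.endswith("rx"):
            receivable.add(item[:-2])
    usable = [
        v for v in client_versions
        if v in receivable
        and (minimum is None or version_key(v) >= version_key(minimum))
    ]
    if not usable:
        raise LookupError("server does not accept any media type we support")
    return "application/mercurial-" + max(usable, key=version_key)


old_server = request_media_type({})
send_only = request_media_type({"httpmediatype": "0.1rx,0.1tx,0.2tx"})
full = request_media_type({"httpmediatype": "0.1rx,0.1tx,0.2rx,0.2tx"})
```

The `send_only` case shows why direction matters: that server can send 0.2 responses but cannot yet receive 0.2 requests, so the client stays on 0.1 for the request body.
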
>> By advertising compression support on each request in the "X-HgProto"
>> header and media type and direction at the server level, we are able
>> to gradually transition existing commands/responses to the new media
>> type and possibly compression. Contrast with the old world, where we
>> only supported a single media type and the use of compression was
>> built-in to the semantics of the command on both client and server.
>> In the new world, if "application/mercurial-0.2" is supported,
>> compression is supported. It's that simple.
>>
>> It's worth noting that we explicitly don't use "Accept",
>> "Accept-Encoding", "Content-Encoding", or "Transfer-Encoding" for
>> content negotiation and compression. People knowledgeable of the HTTP
>> specifications will say that we should use these because that's
>> what they are designed to be used for. They have a point and I
>> sympathize with the argument. Earlier versions of this commit even
>> defined supported media types in the "Accept" header. However, my
>> years of experience rolling out services leveraging HTTP have taught
>> me to not trust the HTTP layer, especially if you are going outside
>> the normal spec (such as using a custom "Content-Encoding" value to
>> represent zstd streams). I've seen load balancers, proxies, and other
>> network devices do very bad and unexpected things to HTTP messages
>> (like insisting zlib compressed content is decoded and then re-encoded
>> at a different compression level or even stripping compression
>> completely). I've found that the best way to avoid surprises when
>> writing protocols on top of HTTP is to use HTTP as a dumb transport as
>> much as possible to minimize the chances that an "intelligent" agent
>> between endpoints will muck with your data. While the widespread use of
>> TLS is mitigating many intermediate network agents interfering with
>> HTTP, there are still problems at the edges, with e.g. the origin HTTP
>> server needing to convert HTTP to and from WSGI and buggy or
>> feature-lacking HTTP client implementations. I've found the best way to
>> avoid these problems is to avoid using headers like "Content-Encoding"
>> and to bake as much logic as possible into media types and HTTP message
>> bodies. The protocol changes in this commit do rely on a custom HTTP
>> request header and the "Content-Type" headers. But we used them before,
>> so we shouldn't be increasing our exposure to "bad" HTTP agents.
>>
>> For the SSH transport, we can't easily implement content negotiation
>> to determine compression formats because the SSH transport has no
>> content negotiation capabilities today. And without a framing protocol,
>> we don't know how much data to feed into a decompressor. So in order
>> to implement compression support on the SSH transport, we'd need to
>> invent a mechanism to represent content types and an outer framing
>> protocol to stream data robustly. While I'm fully capable of doing
>> that, it is a lot of work and not something that should be undertaken
>> lightly. My opinion is that if we're going to change the SSH transport
>> protocol, we should take a long hard look at implementing a grand
>> unified protocol that attempts to address all the deficiencies with
>> the existing protocol. While I want this to happen, that would be
>> massive scope bloat standing in the way of zstd support. So, I've
>> decided to take the easy solution: the SSH transport will not gain
>> support for multiple compression formats. Keep in mind it doesn't
>> support *any* compression today. So essentially nothing is changing
>> on the SSH front.
>>
>> diff --git a/mercurial/help/internals/wireprotocol.txt b/mercurial/help/internals/wireprotocol.txt
>> --- a/mercurial/help/internals/wireprotocol.txt
>> +++ b/mercurial/help/internals/wireprotocol.txt
>> @@ -65,11 +65,27 @@ Example HTTP requests::
>>      GET /repo?cmd=capabilities
>>      X-HgArg-1: foo=bar&baz=hello%20world
>>
>> +The request media type should be chosen based on server support. If the
>> +``httpmediatype`` server capability is present, the client should send
>> +the newest mutually supported media type. If this capability is absent,
>> +the client must assume the server only supports the
>> +``application/mercurial-0.1`` media type.
>> +
>>  The ``Content-Type`` HTTP response header identifies the response as coming
>>  from Mercurial and can also be used to signal an error has occurred.
>>
>> -The ``application/mercurial-0.1`` media type indicates a generic Mercurial
>> -response. It matches the media type sent by the client.
>> +The ``application/mercurial-*`` media types indicate a generic Mercurial
>> +data type.
>> +
>> +The ``application/mercurial-0.1`` media type is raw Mercurial data. It is the
>> +predecessor of the format below.
>> +
>> +The ``application/mercurial-0.2`` media type is compression framed Mercurial
>> +data. The first byte of the payload indicates the length of the compression
>> +format identifier that follows. Next are N bytes indicating the compression
>> +format. e.g. ``zlib``. The remaining bytes are compressed according to that
>> +compression format. The decompressed data behaves the same as with
>> +``application/mercurial-0.1``.
>>
>>  The ``application/hg-error`` media type indicates a generic error occurred.
>>  The content of the HTTP response body typically holds text describing the
>> @@ -81,15 +97,19 @@ type.
>>  Clients also accept the ``text/plain`` media type. All other media
>>  types should cause the client to error.
>>
>> +Behavior of media types is further described in the ``Content Negotiation``
>> +section below.
>> +
>>  Clients should issue a ``User-Agent`` request header that identifies the client.
>>  The server should not use the ``User-Agent`` for feature detection.
>>
>> -A command returning a ``string`` response issues the
>> -``application/mercurial-0.1`` media type and the HTTP response body contains
>> -the raw string value. A ``Content-Length`` header is typically issued.
>> +A command returning a ``string`` response issues a
>> +``application/mercurial-0.*`` media type and the HTTP response body contains
>> +the raw string value (after compression decoding, if used). A
>> +``Content-Length`` header is typically issued, but not required.
>>
>> -A command returning a ``stream`` response issues the
>> -``application/mercurial-0.1`` media type and the HTTP response is typically
>> +A command returning a ``stream`` response issues a
>> +``application/mercurial-0.*`` media type and the HTTP response is typically
>>  using *chunked transfer* (``Transfer-Encoding: chunked``).
>>
>>  SSH Transport
>> @@ -233,6 +253,24 @@ 2006).
>>  This capability was introduced at the same time as the ``lookup``
>>  capability/command.
>>
>> +compression
>> +-----------
>> +
>> +Declares support for negotiating compression formats.
>> +
>> +Presence of this capability indicates the server supports dynamic selection
>> +of compression formats based on the client request.
>> +
>> +Servers advertising this capability are required to support the
>> +``application/mercurial-0.2`` media type in response to commands returning
>> +streams. Servers may support this media type on any command.
>> +
>> +The value of the capability is a comma-delimited list of strings declaring
>> +server-preferred order, most preferred first.
>> +
>> +This capability was introduced in Mercurial 4.1 (released February 2017).
>> +
>>  getbundle
>>  ---------
>>
>> @@ -252,6 +290,47 @@ comma in the value, as this is reserved
>>
>>  This capability was introduced in Mercurial 1.9 (released July 2011).
>>
>> +httpmediatype
>> +-------------
>> +
>> +Indicates which HTTP media types (``Content-Type`` header) the server is
>> +capable of receiving and sending.
>> +
>> +The value of the capability is a comma-delimited list of strings identifying
>> +support for media type and transmission direction. The following strings may
>> +be present:
>> +
>> +0.1rx
>> +   Indicates server support for receiving ``application/mercurial-0.1`` media
>> +   types.
>> +
>> +0.1tx
>> +   Indicates server support for sending ``application/mercurial-0.1`` media
>> +   types.
>> +
>> +0.2rx
>> +   Indicates server support for receiving ``application/mercurial-0.2`` media
>> +   types.
>> +
>> +0.2tx
>> +   Indicates server support for sending ``application/mercurial-0.2`` media
>> +   types.
>> +
>> +min=X
>> +   Minimum media type version the server is capable of receiving. Value is a
>> +   string like ``0.2``.
>> +
>> +   This capability can be used by servers to limit connections from legacy
>> +   clients not using the latest supported media type. However, only clients
>> +   with knowledge of this capability will know to consult this value. This
>> +   capability is present so the client may issue a more user-friendly error
>> +   when the server has locked out a legacy client.
>> +
>> +Servers advertising support for the ``application/mercurial-0.2`` media type
>> +should also advertise the ``compression`` capability.
>> +
>> +This capability was introduced in Mercurial 4.1 (released February 2017).
>> +
>>  httppostargs
>>  ------------
>>
>> @@ -416,6 +495,56 @@ Mercurial server replies to the client-i
>>  not conforming to the expected command responses is assumed to be not related
>>  to Mercurial and can be ignored.
>>
>> +Content Negotiation
>> +===================
>> +
>> +The wire protocol has some mechanisms to help peers determine what content
>> +types and encoding the other side will accept. Historically, these mechanisms
>> +have been built into commands themselves because most commands only send a
>> +well-defined response type and only certain commands needed to support
>> +functionality like compression.
>> +
>> +Currently, only the HTTP transport supports content negotiation at the protocol
>> +layer.
>> +
>> +HTTP requests advertise supported response formats via the ``X-HgProto``
>> +request header. This header consists of a list of space-delimited parameters.
>> +Each parameter denotes a feature or capability.
>> +
>> +The following parameters are defined:
>> +
>> +0.1
>> +   Indicates the client supports receiving ``application/mercurial-0.1``
>> +   responses.
>> +
>> +0.2
>> +   Indicates the client supports receiving ``application/mercurial-0.2``
>> +   responses.
>> +
>> +comp
>> +   Indicates compression formats the client can decode. Value is a list of
>> +   comma delimited strings identifying compression formats ordered from
>> +   most preferential to least preferential. e.g. ``comp=zstd,zlib,none``.
>> +
>> +   This parameter does not have an effect if only the ``0.1`` parameter
>> +   is defined, as support for ``application/mercurial-0.2`` or greater is
>> +   required to use arbitrary compression formats.
>> +
>> +   If this parameter is not advertised, the server interprets this as
>> +   equivalent to ``zlib,none``.
>> +
>> +Clients may choose to only send this header if the ``httpmediatype``
>> +server capability is present, as currently all server-side features
>> +consulting this header require the client to opt in to new protocol features
>> +advertised via the ``httpmediatype`` capability.
>> +
>> +A server that doesn't receive an ``X-HgProto`` header should infer a value of
>> +``0.1``. This is compatible with legacy clients.
>> +
>> +A server receiving a request indicating support for multiple media type
>> +versions may respond with any of the supported media types. Not all servers
>> +may support all media types on all commands.
>> +
>>  Commands
>>  ========
>>
>>
>