[PATCH 3 of 8 V2] internals: document compression negotiation

Tue Nov 29 02:02:59 EST 2016

On Mon, Nov 28, 2016 at 10:58 PM, Gregory Szorc <gregory.szorc at gmail.com>
wrote:

> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1480393573 28800
> #      Mon Nov 28 20:26:13 2016 -0800
> # Node ID 2540270e3fab858be2f20fc30ee2011600bdbee9
> # Parent  e32ff1ce869243618aaab30bb6fba2b13be536ff
> internals: document compression negotiation
>

I changed things significantly since v1:

* No longer use Accept request header. I really didn't care for the
non-spec conformant parsing of that header.
* Introduce a "httpmediatype" capability so "application/mercurial-0.2"
isn't tied directly to the "compression" capability (I like things being
explicit not inferred)
* Introduction of X-HgProto request header (to replace Accept)
* Support for minimum server-side media type detection (although I haven't
done anything with it in this series)
* Differentiated between transmission direction of
application/mercurial-0.2 so client doesn't send that media type to a
server that can't yet recognize it
* Compression format identifier in application/mercurial-0.2 media type is
now variable length

>
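To make the variable-length identifier from the last bullet concrete, here is a minimal sketch of the ``application/mercurial-0.2`` framing: one length byte, then that many bytes naming the compression format, then the compressed data. It is not code from this series. The function names are hypothetical, and treating ``none`` as an in-payload identifier for uncompressed data is an assumption.

```python
import zlib


def encode_frame(format_name: bytes, compressed: bytes) -> bytes:
    """Prefix compressed data with a 1-byte length and the format name."""
    if not 0 < len(format_name) <= 255:
        raise ValueError("format name must be 1..255 bytes")
    return bytes([len(format_name)]) + format_name + compressed


def split_frame(payload: bytes):
    """Return (format_name, remaining_bytes) for a framed payload."""
    if not payload:
        raise ValueError("empty payload")
    name_length = payload[0]
    name = payload[1:1 + name_length]
    if len(name) != name_length:
        raise ValueError("payload truncated inside format identifier")
    return name.decode("ascii"), payload[1 + name_length:]


def decode_payload(payload: bytes) -> bytes:
    """Decompress a framed payload (only stdlib formats are handled here)."""
    name, body = split_frame(payload)
    if name == "zlib":
        return zlib.decompress(body)
    if name == "none":  # assumption: "none" marks uncompressed data
        return body
    raise ValueError("unsupported compression format: %s" % name)
```

A receiver only has to look at the first byte to know how much of the header to read, which is what lets the identifier grow beyond a fixed width.
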
> As part of adding zstd support to all of the things, we'll need
> to teach the wire protocol to support non-zlib compression formats.
>
> This commit documents how we'll implement that.
>
> To understand how we arrived at this proposal, let's look at how
> things are done today.
>
> The wire protocol today doesn't have a unified format. Instead,
> there is a limited facility for differentiating replies as successful
> or not. And, each command essentially defines its own response format.
>
> A significant deficiency in the current protocol is the lack of
> payload framing over the SSH transport. In the HTTP transport,
> chunked transfer is used and the end of an HTTP response body (and
> the end of a Mercurial command response) can be identified by a 0
> length chunk. This is how HTTP chunked transfer works. But in the
> SSH transport, there is no such framing, at least for certain
> responses (notably the response to "getbundle" requests). Clients
> can't simply read until end of stream because the socket is
> persistent and reused for multiple requests. Clients need to know
> when they've encountered the end of a request but there is nothing
> simple for them to key off of to detect this. So what happens is
> the client must decode the payload (as opposed to being dumb and
> forwarding frames/packets). This means the payload itself needs
> to support identifying end of stream. In some cases (bundle2), it
> also means the payload can encode "error" or "interrupt" events
> telling the client to e.g. abort processing. The lack of framing
> on the SSH transport and the transfer of its responsibilities to
> e.g. bundle2 is a massive layering violation and a wart on the
> protocol architecture. It needs to be fixed someday by inventing a
> proper framing protocol.
>
> So about compression.
>
> The client transport abstractions have a "_callcompressable()"
> API. This API is called to invoke a remote command that will
> send a compressible response. The response is essentially a
> "streaming" response (no framing data at the Mercurial layer)
> that is fed into a decompressor.
>
> On the HTTP transport, the decompressor is zlib and only zlib.
> There is currently no mechanism for the client to specify an
> alternate compression format. And, clients don't advertise what
> compression formats they support or ask the server to send a
> specific compression format. Instead, it is assumed that non-error
> responses to "compressible" commands are zlib compressed.
>
> On the SSH transport, there is no compression at the Mercurial
> protocol layer. Instead, compression must be handled by SSH
> itself (e.g. `ssh -C`) or within the payload data (e.g. bundle
> compression).
>
> For the HTTP transport, adding new compression formats is pretty
> straightforward. Once you know what decompressor to use, you can
> stream data into the decompressor until you reach a 0 size HTTP
> chunk, at which point you are at end of stream.
>
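As a rough illustration of the paragraph above, the sketch below feeds HTTP body chunks into a decompressor until a zero-length chunk marks end of stream. It assumes the chunks arrive as an iterable of byte strings. Real HTTP libraries usually strip the chunked encoding for you, so this models the idea rather than any particular client, and ``decompress_stream`` is a hypothetical name.

```python
import zlib
from typing import Iterable, Iterator


def decompress_stream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield decompressed data until a zero-length chunk is seen."""
    decompressor = zlib.decompressobj()
    for chunk in chunks:
        if not chunk:
            # A 0-size chunk terminates the HTTP response body.
            break
        data = decompressor.decompress(chunk)
        if data:
            yield data
    # Emit whatever the decompressor is still buffering.
    tail = decompressor.flush()
    if tail:
        yield tail
```
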
> So our wire protocol changes for the HTTP transport are pretty
> straightforward: the client and server advertise what compression
> formats they support and an appropriate compression format is
> chosen. We introduce a new HTTP media type to hold compressed
> payloads. The header of the payload defines the compression format
> being used. Whoever is on the receiving end can sniff the first few
> bytes and route to an appropriate decompressor.
>
> Support for multiple compression formats is advertised on both
> server and client. The server advertises a "compression" capability
> saying which compression formats it supports and in what order they
> are preferred. Clients advertise their support for multiple
> compression formats and media types via the introduced "X-HgProto"
> request header.
>
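For illustration, a server-side parser for the ``X-HgProto`` header could look like the sketch below. It follows the rules in the patch further down: parameters are space-delimited, a missing header is treated as ``0.1``, and a missing ``comp`` parameter is treated as ``zlib,none``. The function name is hypothetical, and silently ignoring unknown parameters is an assumption.

```python
KNOWN_MEDIA_VERSIONS = ("0.1", "0.2")


def parse_hgproto(header_value):
    """Return (media_versions, compression_formats) for an X-HgProto value.

    Pass None when the request carried no X-HgProto header.
    """
    if header_value is None:
        header_value = "0.1"  # legacy clients send no header
    versions = []
    formats = None
    for param in header_value.split():
        if param.startswith("comp="):
            formats = [f for f in param[len("comp="):].split(",") if f]
        elif param in KNOWN_MEDIA_VERSIONS:
            versions.append(param)
        # Assumption: parameters we do not recognize are ignored.
    if formats is None:
        formats = ["zlib", "none"]
    return versions, formats
```
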
> Strictly speaking, servers don't need to advertise which compression
> formats they support. But doing so allows clients to fail fast if
> they don't support any of the formats the server does. This is useful
> in situations like sending bundles, where the client may have to
> perform expensive computation before sending data to the server.
>
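The fail-fast check could be as small as the following sketch. It takes the value of the server's ``compression`` capability (a comma-delimited list in server-preferred order) and the set of formats the client can produce. The function names are hypothetical, and picking the server's most preferred mutual format is an assumption about policy, not something this patch mandates.

```python
def mutual_compression_formats(capability_value, client_formats):
    """Return formats both sides support, in server-preferred order."""
    server_formats = [f for f in capability_value.split(",") if f]
    return [f for f in server_formats if f in client_formats]


def choose_format_or_fail(capability_value, client_formats):
    """Raise before any expensive work if no format is shared."""
    mutual = mutual_compression_formats(capability_value, client_formats)
    if not mutual:
        raise RuntimeError("no mutually supported compression format")
    return mutual[0]
```

A client would call ``choose_format_or_fail()`` before it starts generating a bundle, so an incompatible server is detected before the expensive part.
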
> Rather than simply advertise a list of supported compression formats,
> we introduce an additional "httpmediatype" server capability
> advertising which media types the server supports. This means servers
> are explicit about what formats they exchange. IMO, this is superior
> to inferring support from other capabilities (like "compression").
>
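A client-side sketch of how the ``httpmediatype`` value might be consumed follows. The token names (``0.1rx``, ``0.2tx``, ``min=X``) are the ones defined in the patch below, and the fallback to ``application/mercurial-0.1`` when the capability is absent follows its wording. The function names and the tuple return shape are hypothetical.

```python
def parse_httpmediatype(value):
    """Split the capability value into (rx_versions, tx_versions, minimum)."""
    rx, tx, minimum = set(), set(), None
    for item in value.split(","):
        if item.startswith("min="):
            minimum = item[len("min="):]
        elif item.endswith("rx"):
            rx.add(item[:-2])
        elif item.endswith("tx"):
            tx.add(item[:-2])
    return rx, tx, minimum


def choose_request_media_type(capability, client_versions=("0.1", "0.2")):
    """Pick the newest media type the server says it can receive."""
    if capability is None:
        # Capability absent: assume the server only knows 0.1.
        return "application/mercurial-0.1"
    rx, _tx, _minimum = parse_httpmediatype(capability)
    mutual = [v for v in client_versions if v in rx]
    if not mutual:
        raise ValueError("no mutually supported media type")
    newest = max(mutual, key=lambda v: tuple(int(p) for p in v.split(".")))
    return "application/mercurial-" + newest
```
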
> By advertising compression support on each request in the "X-HgProto"
> header and media type and direction at the server level, we are able
> to gradually transition existing commands/responses to the new media
> type and possibly compression. Contrast with the old world, where we
> only supported a single media type and the use of compression was
> built-in to the semantics of the command on both client and server.
> In the new world, if "application/mercurial-0.2" is supported,
> compression is supported. It's that simple.
>
> It's worth noting that we explicitly don't use "Accept,"
> "Accept-Encoding," "Content-Encoding," or "Transfer-Encoding" for
> content negotiation and compression. People knowledgeable of the HTTP
> specifications will say that we should use these because that's
> what they are designed to be used for. They have a point and I
> sympathize with the argument. Earlier versions of this commit even
> defined supported media types in the "Accept" header. However, my
> years of experience rolling out services leveraging HTTP have taught
> me to not trust the HTTP layer, especially if you are going outside
> the normal spec (such as using a custom "Content-Encoding" value to
> represent zstd streams). I've seen load balancers, proxies, and other
> network devices do very bad and unexpected things to HTTP messages
> (like insisting zlib compressed content is decoded and then re-encoded
> at a different compression level or even stripping compression
> completely). I've found that the best way to avoid surprises when
> writing protocols on top of HTTP is to use HTTP as a dumb transport as
> much as possible to minimize the chances that an "intelligent" agent
> between endpoints will muck with your data. While the widespread use of
> TLS is mitigating many intermediate network agents interfering with
> HTTP, there are still problems at the edges, with e.g. the origin HTTP
> server needing to convert HTTP to and from WSGI and buggy or
> feature-lacking HTTP client implementations. I've found the best way to
> avoid these problems is to avoid using headers like "Content-Encoding"
> and to bake as much logic as possible into media types and HTTP message
> bodies. The protocol changes in this commit do rely on a custom HTTP
> request header and the "Content-Type" headers. But we used them before,
> so we shouldn't be increasing our exposure to "bad" HTTP agents.
>
> For the SSH transport, we can't easily implement content negotiation
> to determine compression formats because the SSH transport has no
> content negotiation capabilities today. And without a framing protocol,
> we don't know how much data to feed into a decompressor. So in order
> to implement compression support on the SSH transport, we'd need to
> invent a mechanism to represent content types and an outer framing
> protocol to stream data robustly. While I'm fully capable of doing
> that, it is a lot of work and not something that should be undertaken
> lightly. My opinion is that if we're going to change the SSH transport
> protocol, we should take a long hard look at implementing a grand
> unified protocol that attempts to address all the deficiencies with
> the existing protocol. While I want this to happen, that would be
> massive scope bloat standing in the way of zstd support. So, I've
> decided to take the easy solution: the SSH transport will not gain
> support for multiple compression formats. Keep in mind it doesn't
> support *any* compression today. So essentially nothing is changing
> on the SSH front.
>
> diff --git a/mercurial/help/internals/wireprotocol.txt b/mercurial/help/internals/wireprotocol.txt
> --- a/mercurial/help/internals/wireprotocol.txt
> +++ b/mercurial/help/internals/wireprotocol.txt
> @@ -65,11 +65,27 @@ Example HTTP requests::
>      GET /repo?cmd=capabilities
>      X-HgArg-1: foo=bar&baz=hello%20world
>
> +The request media type should be chosen based on server support. If the
> +``httpmediatype`` server capability is present, the client should send
> +the newest mutually supported media type. If this capability is absent,
> +the client must assume the server only supports the
> +``application/mercurial-0.1`` media type.
> +
>  The ``Content-Type`` HTTP response header identifies the response as coming
>  from Mercurial and can also be used to signal an error has occurred.
>
> -The ``application/mercurial-0.1`` media type indicates a generic Mercurial
> -response. It matches the media type sent by the client.
> +The ``application/mercurial-*`` media types indicate a generic Mercurial
> +data type.
> +
> +The ``application/mercurial-0.1`` media type is raw Mercurial data. It is the
> +predecessor of the format below.
> +
> +The ``application/mercurial-0.2`` media type is compression framed Mercurial
> +data. The first byte of the payload indicates the length of the compression
> +format identifier that follows. Next are N bytes indicating the compression
> +format (e.g. ``zlib``). The remaining bytes are compressed according to that
> +compression format. The decompressed data behaves the same as with
> +``application/mercurial-0.1``.
>
>  The ``application/hg-error`` media type indicates a generic error occurred.
>  The content of the HTTP response body typically holds text describing the
> @@ -81,15 +97,19 @@ type.
>  Clients also accept the ``text/plain`` media type. All other media
>  types should cause the client to error.
>
> +Behavior of media types is further described in the ``Content Negotiation``
> +section below.
> +
>  Clients should issue a ``User-Agent`` request header that identifies the client.
>  The server should not use the ``User-Agent`` for feature detection.
>
> -A command returning a ``string`` response issues the
> -``application/mercurial-0.1`` media type and the HTTP response body contains
> -the raw string value. A ``Content-Length`` header is typically issued.
> +A command returning a ``string`` response issues a
> +``application/mercurial-0.*`` media type and the HTTP response body contains
> +the raw string value (after compression decoding, if used). A
> +``Content-Length`` header is typically issued, but not required.
>
> -A command returning a ``stream`` response issues the
> -``application/mercurial-0.1`` media type and the HTTP response is typically
> +A command returning a ``stream`` response issues a
> +``application/mercurial-0.*`` media type and the HTTP response is typically
>  using *chunked transfer* (``Transfer-Encoding: chunked``).
>
>  SSH Transport
> @@ -233,6 +253,24 @@ 2006).
>  This capability was introduced at the same time as the ``lookup``
>  capability/command.
>
> +compression
> +-----------
> +
> +Declares support for negotiating compression formats.
> +
> +Presence of this capability indicates the server supports dynamic selection
> +of compression formats based on the client request.
> +
> +Servers advertising this capability are required to support the
> +``application/mercurial-0.2`` media type in response to commands returning
> +streams. Servers may support this media type on any command.
> +
> +The value of the capability is a comma-delimited list of strings declaring
> +supported compression formats. The order of the compression formats is in
> +server-preferred order, most preferred first.
> +
> +This capability was introduced in Mercurial 4.1 (released February 2017).
> +
>  getbundle
>  ---------
>
> @@ -252,6 +290,47 @@ comma in the value, as this is reserved
>
>  This capability was introduced in Mercurial 1.9 (released July 2011).
>
> +httpmediatype
> +-------------
> +
> +Indicates which HTTP media types (``Content-Type`` header) the server is
> +capable of receiving and sending.
> +
> +The value of the capability is a comma-delimited list of strings identifying
> +support for media type and transmission direction. The following strings may
> +be present:
> +
> +0.1rx
> +   Indicates server support for receiving ``application/mercurial-0.1`` media
> +   types.
> +
> +0.1tx
> +   Indicates server support for sending ``application/mercurial-0.1`` media
> +   types.
> +
> +0.2rx
> +   Indicates server support for receiving ``application/mercurial-0.2`` media
> +   types.
> +
> +0.2tx
> +   Indicates server support for sending ``application/mercurial-0.2`` media
> +   types.
> +
> +min=X
> +   Minimum media type version the server is capable of receiving. Value is a
> +   string like ``0.2``.
> +
> +   This capability can be used by servers to limit connections from legacy
> +   clients not using the latest supported media type. However, only clients
> +   with knowledge of this capability will know to consult this value. This
> +   capability is present so the client may issue a more user-friendly error
> +   when the server has locked out a legacy client.
> +
> +Servers advertising support for the ``application/mercurial-0.2`` media type
> +should also advertise the ``compression`` capability.
> +
> +This capability was introduced in Mercurial 4.1 (released February 2017).
> +
>  httppostargs
>  ------------
>
> @@ -416,6 +495,56 @@ Mercurial server replies to the client-i
>  not conforming to the expected command responses is assumed to be not related
>  to Mercurial and can be ignored.
>
> +Content Negotiation
> +===================
> +
> +The wire protocol has some mechanisms to help peers determine what content
> +types and encoding the other side will accept. Historically, these mechanisms
> +have been built into commands themselves because most commands only send a
> +well-defined response type and only certain commands needed to support
> +functionality like compression.
> +
> +Currently, only the HTTP transport supports content negotiation at the protocol
> +layer.
> +
> +HTTP requests advertise supported response formats via the ``X-HgProto``
> +request header. This header consists of a list of space-delimited parameters.
> +Each parameter denotes a feature or capability.
> +
> +The following parameters are defined:
> +
> +0.1
> +   Indicates the client supports receiving ``application/mercurial-0.1``
> +   responses.
> +
> +0.2
> +   Indicates the client supports receiving ``application/mercurial-0.2``
> +   responses.
> +
> +comp
> +   Indicates compression formats the client can decode. Value is a list of
> +   comma delimited strings identifying compression formats ordered from
> +   most preferential to least preferential. e.g. ``comp=zstd,zlib,none``.
> +
> +   This parameter does not have an effect if only the ``0.1`` parameter
> +   is defined, as support for ``application/mercurial-0.2`` or greater is
> +   required to use arbitrary compression formats.
> +
> +   If this parameter is not advertised, the server interprets this as
> +   equivalent to ``zlib,none``.
> +
> +Clients may choose to only send this header if the ``httpmediatype``
> +server capability is present, as currently all server-side features
> +consulting this header require the client to opt in to new protocol features
> +advertised via the ``httpmediatype`` capability.
> +
> +A server that doesn't receive an ``X-HgProto`` header should infer a value of
> +``0.1``. This is compatible with legacy clients.
> +
> +A server receiving a request indicating support for multiple media type
> +versions may respond with any of the supported media types. Not all servers
> +may support all media types on all commands.
> +
>  Commands
>  ========
>
>