[PATCH 6 of 6 zstd-wireproto V3] protocol: send application/mercurial-0.2 responses to capable clients

Augie Fackler raf at durin42.com
Tue Jan 10 15:25:17 UTC 2017


On Sat, Dec 24, 2016 at 03:33:10PM -0700, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1482618572 25200
> #      Sat Dec 24 15:29:32 2016 -0700
> # Node ID fee9a440173e103e96b46ad278e497636c4b6b53
> # Parent  e78404d994d9da719af16647f6decaa8aeaa56b7
> protocol: send application/mercurial-0.2 responses to capable clients

I've got this tentatively queued - I'm going to make another pass over
it today before pushing it, because it's a lot to digest.

There are some coding style things that we should clean up as
follow-ups, but the review overhead of this series is high enough that
I'm planning to land it and then we can do followups for any style
issues that are problematic - I only want to bounce this for actual
correctness problems at this point.

>
> With this commit, the HTTP transport now parses the X-HgProto-<N>
> header to determine what media type and compression engine to use for
> responses. So far, we only compress responses that are already being
> compressed with zlib today (stream response types to specific
> commands). We can expand things to cover additional response types
> later.
>
> The practical side-effect of this commit is that non-zlib compression
> engines will be used if both ends support them. This means if both
> ends have zstd support, zstd - not zlib - will be used to compress
> data!
>
> When cloning the mozilla-unified repository between a local HTTP
> server and client, the benefits of non-zlib compression are quite
> noticeable:
>
>   engine     server CPU (s)   client CPU (s)    bundle size
> zlib (l=6)      174.1            283.2         1,148,547,026
> zstd (l=1)       99.2            267.3         1,127,513,841
> zstd (l=3)      103.1            266.9         1,018,861,363
> zstd (l=7)      128.3            269.7           919,190,278
> zstd (l=10)     162.0               -            894,547,179
> none             95.3            277.2         4,097,566,064
>
> The default zstd compression level is 3. So if you deploy zstd
> capable Mercurial to your clients and servers and CPU time on
> your server is dominated by "getbundle" requests (clients cloning
> and pulling) - and my experience at Mozilla tells me this is often
> the case - this commit could drastically reduce your server-side
> CPU usage *and* save on bandwidth costs!
>
> Another benefit of this change is that server operators can install
> *any* compression engine. While it isn't enabled by default, the
> "none" compression engine can now be used to disable wire protocol
> compression completely. Previously, commands like "getbundle" always
> zlib compressed output, adding considerable overhead to generating
> responses. If you are on a high speed network and your server is under
> high load, it might be advantageous to trade bandwidth for CPU.
> Although, zstd at level 1 doesn't use that much CPU, so I'm not
> convinced that disabling compression wholesale is worthwhile. And, my
> data seems to indicate a slow down on the client without compression.
> I suspect this is due to a lack of buffering resulting in an increase
> in socket read() calls and/or the fact we're transferring an extra 3 GB
> of data (parsing HTTP chunked transfer and processing extra TCP packets
> can add up). This is definitely worth investigating and optimizing. But
> since the "none" compressor isn't enabled by default, I'm inclined to
> punt on this issue.
>
> This commit introduces tons of tests. Some of these should arguably
> have been implemented on previous commits. But it was difficult to
> test without the server functionality in place.
>
> diff --git a/mercurial/help/config.txt b/mercurial/help/config.txt
> --- a/mercurial/help/config.txt
> +++ b/mercurial/help/config.txt
> @@ -1606,8 +1606,20 @@ Controls generic server settings.
>      but sends more bytes to clients.
>
>      This option only impacts the HTTP server.
>
> +``zstdlevel``
> +    Integer between ``1`` and ``22`` that controls the zstd compression level
> +    for wire protocol commands. ``1`` is the minimal amount of compression and
> +    ``22`` is the highest amount of compression.
> +
> +    The default (``3``) should be significantly faster than zlib while likely
> +    delivering better compression ratios.
> +
> +    This option only impacts the HTTP server.
> +
> +    See also ``server.zliblevel``.
> +
>  ``smtp``
>  --------
>
>  Configuration for extensions that need to send email messages.
> diff --git a/mercurial/hgweb/protocol.py b/mercurial/hgweb/protocol.py
> --- a/mercurial/hgweb/protocol.py
> +++ b/mercurial/hgweb/protocol.py
> @@ -7,8 +7,9 @@
>
>  from __future__ import absolute_import
>
>  import cgi
> +import struct
>
>  from .common import (
>      HTTP_OK,
>  )
> @@ -22,8 +23,9 @@ stringio = util.stringio
>  urlerr = util.urlerr
>  urlreq = util.urlreq
>
>  HGTYPE = 'application/mercurial-0.1'
> +HGTYPE2 = 'application/mercurial-0.2'
>  HGERRTYPE = 'application/hg-error'
>
>  def decodevaluefromheaders(req, headerprefix):
>      """Decode a long value from multiple HTTP request headers."""
> @@ -82,26 +84,82 @@ class webproto(wireproto.abstractserverp
>          val = self.ui.fout.getvalue()
>          self.ui.ferr, self.ui.fout = self.oldio
>          return val
>
> -    def compresschunks(self, chunks):
> -        # Don't allow untrusted settings because disabling compression or
> -        # setting a very high compression level could lead to flooding
> -        # the server's network or CPU.
> -        opts = {'level': self.ui.configint('server', 'zliblevel', -1)}
> -        return util.compengines['zlib'].compressstream(chunks, opts)
> -
>      def _client(self):
>          return 'remote:%s:%s:%s' % (
>              self.req.env.get('wsgi.url_scheme') or 'http',
>              urlreq.quote(self.req.env.get('REMOTE_HOST', '')),
>              urlreq.quote(self.req.env.get('REMOTE_USER', '')))
>
> +    def responsetype(self, v1compressible=False):
> +        """Determine the appropriate response type and compression settings.
> +
> +        The ``v1compressible`` argument states whether the response with
> +        application/mercurial-0.1 media types should be zlib compressed.
> +
> +        Returns a tuple of (mediatype, compengine, engineopts).
> +        """
> +        # For now, if it isn't compressible in the old world, it's never
> +        # compressible. We can change this to send uncompressed 0.2 payloads
> +        # later.
> +        if not v1compressible:
> +            return HGTYPE, None, None
> +
> +        # Determine the response media type and compression engine based
> +        # on the request parameters.
> +        protocaps = decodevaluefromheaders(self.req, 'X-HgProto').split(' ')
> +
> +        if '0.2' in protocaps:
> +            # Default as defined by wire protocol spec.
> +            compformats = ['zlib', 'none']
> +            for cap in protocaps:
> +                if cap.startswith('comp='):
> +                    compformats = cap[5:].split(',')
> +                    break
> +
> +            # Now find an agreed upon compression format.
> +            for engine in wireproto.supportedcompengines(self.ui, self,
> +                                                         util.SERVERROLE):
> +                if engine.wireprotosupport().name in compformats:
> +                    opts = {}
> +                    level = self.ui.configint('server',
> +                                              '%slevel' % engine.name())
> +                    if level is not None:
> +                        opts['level'] = level
> +
> +                    return HGTYPE2, engine, opts
> +
> +            # No mutually supported compression format. Fall back to the
> +            # legacy protocol.
> +
> +        # Don't allow untrusted settings because disabling compression or
> +        # setting a very high compression level could lead to flooding
> +        # the server's network or CPU.
> +        opts = {'level': self.ui.configint('server', 'zliblevel', -1)}
> +        return HGTYPE, util.compengines['zlib'], opts
> +
>  def iscmd(cmd):
>      return cmd in wireproto.commands
>
>  def call(repo, req, cmd):
>      p = webproto(req, repo.ui)
> +
> +    def genversion2(gen, compress, engine, engineopts):
> +        # application/mercurial-0.2 always sends a payload header
> +        # identifying the compression engine.
> +        name = engine.wireprotosupport().name
> +        assert 0 < len(name) < 256
> +        yield struct.pack('B', len(name))
> +        yield name
> +
> +        if compress:
> +            for chunk in engine.compressstream(gen, opts=engineopts):
> +                yield chunk
> +        else:
> +            for chunk in gen:
> +                yield chunk
> +
>      rsp = wireproto.dispatch(repo, p, cmd)
>      if isinstance(rsp, str):
>          req.respond(HTTP_OK, HGTYPE, body=rsp)
>          return []
> @@ -110,12 +168,18 @@ def call(repo, req, cmd):
>              gen = iter(lambda: rsp.reader.read(32768), '')
>          else:
>              gen = rsp.gen
>
> -        if rsp.v1compressible:
> -            gen = p.compresschunks(gen)
> +        # This code for compression should not be streamres specific. It
> +        # is here because we only compress streamres at the moment.
> +        mediatype, engine, engineopts = p.responsetype(rsp.v1compressible)
>
> -        req.respond(HTTP_OK, HGTYPE)
> +        if mediatype == HGTYPE and rsp.v1compressible:
> +            gen = engine.compressstream(gen, engineopts)
> +        elif mediatype == HGTYPE2:
> +            gen = genversion2(gen, rsp.v1compressible, engine, engineopts)
> +
> +        req.respond(HTTP_OK, mediatype)
>          return gen
>      elif isinstance(rsp, wireproto.pushres):
>          val = p.restore()
>          rsp = '%d\n%s' % (rsp.res, val)
> diff --git a/tests/get-with-headers.py b/tests/get-with-headers.py
> --- a/tests/get-with-headers.py
> +++ b/tests/get-with-headers.py
> @@ -34,15 +34,24 @@ formatjson = False
>  if '--json' in sys.argv:
>      sys.argv.remove('--json')
>      formatjson = True
>
> +hgproto = None
> +if '--hgproto' in sys.argv:
> +    idx = sys.argv.index('--hgproto')
> +    hgproto = sys.argv[idx + 1]
> +    sys.argv.pop(idx)
> +    sys.argv.pop(idx)
> +
>  tag = None
>  def request(host, path, show):
>      assert not path.startswith('/'), path
>      global tag
>      headers = {}
>      if tag:
>          headers['If-None-Match'] = tag
> +    if hgproto:
> +        headers['X-HgProto-1'] = hgproto
>
>      conn = httplib.HTTPConnection(host)
>      conn.request("GET", '/' + path, None, headers)
>      response = conn.getresponse()
> diff --git a/tests/test-http-protocol.t b/tests/test-http-protocol.t
> --- a/tests/test-http-protocol.t
> +++ b/tests/test-http-protocol.t
> @@ -41,4 +41,127 @@ Order of engines can also change
>    $ get-with-headers.py 127.0.0.1:$HGPORT '?cmd=capabilities' | tr ' ' '\n' | grep compression
>    compression=none,zlib
>
>    $ killdaemons.py
> +
> +Start a default server again
> +
> +  $ hg -R server serve -p $HGPORT -d --pid-file hg.pid
> +  $ cat hg.pid > $DAEMON_PIDS
> +
> +Server should send application/mercurial-0.1 to clients if no Accept is used
> +
> +  $ get-with-headers.py --headeronly 127.0.0.1:$HGPORT '?cmd=getbundle&heads=e93700bd72895c5addab234c56d4024b487a362f&common=0000000000000000000000000000000000000000' -
> +  200 Script output follows
> +  content-type: application/mercurial-0.1
> +  date: * (glob)
> +  server: * (glob)
> +  transfer-encoding: chunked
> +
> +Server should send application/mercurial-0.1 when client says it wants it
> +
> +  $ get-with-headers.py --hgproto '0.1' --headeronly 127.0.0.1:$HGPORT '?cmd=getbundle&heads=e93700bd72895c5addab234c56d4024b487a362f&common=0000000000000000000000000000000000000000' -
> +  200 Script output follows
> +  content-type: application/mercurial-0.1
> +  date: * (glob)
> +  server: * (glob)
> +  transfer-encoding: chunked
> +
> +Server should send application/mercurial-0.2 when client says it wants it
> +
> +  $ get-with-headers.py --hgproto '0.2' --headeronly 127.0.0.1:$HGPORT '?cmd=getbundle&heads=e93700bd72895c5addab234c56d4024b487a362f&common=0000000000000000000000000000000000000000' -
> +  200 Script output follows
> +  content-type: application/mercurial-0.2
> +  date: * (glob)
> +  server: * (glob)
> +  transfer-encoding: chunked
> +
> +  $ get-with-headers.py --hgproto '0.1 0.2' --headeronly 127.0.0.1:$HGPORT '?cmd=getbundle&heads=e93700bd72895c5addab234c56d4024b487a362f&common=0000000000000000000000000000000000000000' -
> +  200 Script output follows
> +  content-type: application/mercurial-0.2
> +  date: * (glob)
> +  server: * (glob)
> +  transfer-encoding: chunked
> +
> +Requesting a compression format that server doesn't support results will fall back to 0.1
> +
> +  $ get-with-headers.py --hgproto '0.2 comp=aa' --headeronly 127.0.0.1:$HGPORT '?cmd=getbundle&heads=e93700bd72895c5addab234c56d4024b487a362f&common=0000000000000000000000000000000000000000' -
> +  200 Script output follows
> +  content-type: application/mercurial-0.1
> +  date: * (glob)
> +  server: * (glob)
> +  transfer-encoding: chunked
> +
> +#if zstd
> +zstd is used if available
> +
> +  $ get-with-headers.py --hgproto '0.2 comp=zstd' 127.0.0.1:$HGPORT '?cmd=getbundle&heads=e93700bd72895c5addab234c56d4024b487a362f&common=0000000000000000000000000000000000000000' > resp
> +  $ f --size --hexdump --bytes 36 --sha1 resp
> +  resp: size=248, sha1=4d8d8f87fb82bd542ce52881fdc94f850748
> +  0000: 32 30 30 20 53 63 72 69 70 74 20 6f 75 74 70 75 |200 Script outpu|
> +  0010: 74 20 66 6f 6c 6c 6f 77 73 0a 0a 04 7a 73 74 64 |t follows...zstd|
> +  0020: 28 b5 2f fd                                     |(./.|
> +
> +#endif
> +
> +application/mercurial-0.2 is not yet used on non-streaming responses
> +
> +  $ get-with-headers.py --hgproto '0.2' 127.0.0.1:$HGPORT '?cmd=heads' -
> +  200 Script output follows
> +  content-length: 41
> +  content-type: application/mercurial-0.1
> +  date: * (glob)
> +  server: * (glob)
> +
> +  e93700bd72895c5addab234c56d4024b487a362f
> +
> +Now test protocol preference usage
> +
> +  $ killdaemons.py
> +  $ hg --config server.compressionengines=none,zlib -R server serve -p $HGPORT -d --pid-file hg.pid
> +  $ cat hg.pid > $DAEMON_PIDS
> +
> +No Accept will send 0.1+zlib, even though "none" is preferred b/c "none" isn't supported on 0.1
> +
> +  $ get-with-headers.py --headeronly 127.0.0.1:$HGPORT '?cmd=getbundle&heads=e93700bd72895c5addab234c56d4024b487a362f&common=0000000000000000000000000000000000000000' Content-Type
> +  200 Script output follows
> +  content-type: application/mercurial-0.1
> +
> +  $ get-with-headers.py 127.0.0.1:$HGPORT '?cmd=getbundle&heads=e93700bd72895c5addab234c56d4024b487a362f&common=0000000000000000000000000000000000000000'  > resp
> +  $ f --size --hexdump --bytes 28 --sha1 resp
> +  resp: size=227, sha1=35a4c074da74f32f5440da3cbf04
> +  0000: 32 30 30 20 53 63 72 69 70 74 20 6f 75 74 70 75 |200 Script outpu|
> +  0010: 74 20 66 6f 6c 6c 6f 77 73 0a 0a 78             |t follows..x|
> +
> +Explicit 0.1 will send zlib because "none" isn't supported on 0.1
> +
> +  $ get-with-headers.py --hgproto '0.1' 127.0.0.1:$HGPORT '?cmd=getbundle&heads=e93700bd72895c5addab234c56d4024b487a362f&common=0000000000000000000000000000000000000000'  > resp
> +  $ f --size --hexdump --bytes 28 --sha1 resp
> +  resp: size=227, sha1=35a4c074da74f32f5440da3cbf04
> +  0000: 32 30 30 20 53 63 72 69 70 74 20 6f 75 74 70 75 |200 Script outpu|
> +  0010: 74 20 66 6f 6c 6c 6f 77 73 0a 0a 78             |t follows..x|
> +
> +0.2 with no compression will get "none" because that is server's preference
> +(spec says ZL and UN are implicitly supported)
> +
> +  $ get-with-headers.py --hgproto '0.2' 127.0.0.1:$HGPORT '?cmd=getbundle&heads=e93700bd72895c5addab234c56d4024b487a362f&common=0000000000000000000000000000000000000000'  > resp
> +  $ f --size --hexdump --bytes 32 --sha1 resp
> +  resp: size=432, sha1=ac931b412ec185a02e0e5bcff98dac83
> +  0000: 32 30 30 20 53 63 72 69 70 74 20 6f 75 74 70 75 |200 Script outpu|
> +  0010: 74 20 66 6f 6c 6c 6f 77 73 0a 0a 04 6e 6f 6e 65 |t follows...none|
> +
> +Client receives server preference even if local order doesn't match
> +
> +  $ get-with-headers.py --hgproto '0.2 comp=zlib,none' 127.0.0.1:$HGPORT '?cmd=getbundle&heads=e93700bd72895c5addab234c56d4024b487a362f&common=0000000000000000000000000000000000000000'  > resp
> +  $ f --size --hexdump --bytes 32 --sha1 resp
> +  resp: size=432, sha1=ac931b412ec185a02e0e5bcff98dac83
> +  0000: 32 30 30 20 53 63 72 69 70 74 20 6f 75 74 70 75 |200 Script outpu|
> +  0010: 74 20 66 6f 6c 6c 6f 77 73 0a 0a 04 6e 6f 6e 65 |t follows...none|
> +
> +Client receives only supported format even if not server preferred format
> +
> +  $ get-with-headers.py --hgproto '0.2 comp=zlib' 127.0.0.1:$HGPORT '?cmd=getbundle&heads=e93700bd72895c5addab234c56d4024b487a362f&common=0000000000000000000000000000000000000000'  > resp
> +  $ f --size --hexdump --bytes 33 --sha1 resp
> +  resp: size=232, sha1=a1c727f0c9693ca15742a75c30419bc36
> +  0000: 32 30 30 20 53 63 72 69 70 74 20 6f 75 74 70 75 |200 Script outpu|
> +  0010: 74 20 66 6f 6c 6c 6f 77 73 0a 0a 04 7a 6c 69 62 |t follows...zlib|
> +  0020: 78                                              |x|
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


More information about the Mercurial-devel mailing list