[PATCH 6 of 6 zstd-wireproto V3] protocol: send application/mercurial-0.2 responses to capable clients
Augie Fackler
raf at durin42.com
Tue Jan 10 15:25:17 UTC 2017
On Sat, Dec 24, 2016 at 03:33:10PM -0700, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1482618572 25200
> # Sat Dec 24 15:29:32 2016 -0700
> # Node ID fee9a440173e103e96b46ad278e497636c4b6b53
> # Parent e78404d994d9da719af16647f6decaa8aeaa56b7
> protocol: send application/mercurial-0.2 responses to capable clients
I've got this tentatively queued - I'm going to make another pass over
it today before pushing it, because it's a lot to digest.
There are some coding style things that we should clean up as
follow-ups, but the review overhead of this series is high enough that
I'm planning to land it and then we can do followups for any style
issues that are problematic - I only want to bounce this for actual
correctness problems at this point.
>
> With this commit, the HTTP transport now parses the X-HgProto-<N>
> header to determine what media type and compression engine to use for
> responses. So far, we only compress responses that are already being
> compressed with zlib today (stream response types to specific
> commands). We can expand things to cover additional response types
> later.
>
> The practical side-effect of this commit is that non-zlib compression
> engines will be used if both ends support them. This means if both
> ends have zstd support, zstd - not zlib - will be used to compress
> data!
>
> When cloning the mozilla-unified repository between a local HTTP
> server and client, the benefits of non-zlib compression are quite
> noticeable:
>
> engine server CPU (s) client CPU (s) bundle size
> zlib (l=6) 174.1 283.2 1,148,547,026
> zstd (l=1) 99.2 267.3 1,127,513,841
> zstd (l=3) 103.1 266.9 1,018,861,363
> zstd (l=7) 128.3 269.7 919,190,278
> zstd (l=10) 162.0 - 894,547,179
> none 95.3 277.2 4,097,566,064
>
> The default zstd compression level is 3. So if you deploy zstd
> capable Mercurial to your clients and servers and CPU time on
> your server is dominated by "getbundle" requests (clients cloning
> and pulling) - and my experience at Mozilla tells me this is often
> the case - this commit could drastically reduce your server-side
> CPU usage *and* save on bandwidth costs!
>
> Another benefit of this change is that server operators can install
> *any* compression engine. While it isn't enabled by default, the
> "none" compression engine can now be used to disable wire protocol
> compression completely. Previously, commands like "getbundle" always
> zlib compressed output, adding considerable overhead to generating
> responses. If you are on a high speed network and your server is under
> high load, it might be advantageous to trade bandwidth for CPU.
> Although, zstd at level 1 doesn't use that much CPU, so I'm not
> convinced that disabling compression wholesale is worthwhile. And, my
> data seems to indicate a slow down on the client without compression.
> I suspect this is due to a lack of buffering resulting in an increase
> in socket read() calls and/or the fact we're transferring an extra 3 GB
> of data (parsing HTTP chunked transfer and processing extra TCP packets
> can add up). This is definitely worth investigating and optimizing. But
> since the "none" compressor isn't enabled by default, I'm inclined to
> punt on this issue.
>
> This commit introduces tons of tests. Some of these should arguably
> have been implemented on previous commits. But it was difficult to
> test without the server functionality in place.
>
> diff --git a/mercurial/help/config.txt b/mercurial/help/config.txt
> --- a/mercurial/help/config.txt
> +++ b/mercurial/help/config.txt
> @@ -1606,8 +1606,20 @@ Controls generic server settings.
> but sends more bytes to clients.
>
> This option only impacts the HTTP server.
>
> +``zstdlevel``
> + Integer between ``1`` and ``22`` that controls the zstd compression level
> + for wire protocol commands. ``1`` is the minimal amount of compression and
> + ``22`` is the highest amount of compression.
> +
> + The default (``3``) should be significantly faster than zlib while likely
> + delivering better compression ratios.
> +
> + This option only impacts the HTTP server.
> +
> + See also ``server.zliblevel``.
> +
> ``smtp``
> --------
>
> Configuration for extensions that need to send email messages.
> diff --git a/mercurial/hgweb/protocol.py b/mercurial/hgweb/protocol.py
> --- a/mercurial/hgweb/protocol.py
> +++ b/mercurial/hgweb/protocol.py
> @@ -7,8 +7,9 @@
>
> from __future__ import absolute_import
>
> import cgi
> +import struct
>
> from .common import (
> HTTP_OK,
> )
> @@ -22,8 +23,9 @@ stringio = util.stringio
> urlerr = util.urlerr
> urlreq = util.urlreq
>
> HGTYPE = 'application/mercurial-0.1'
> +HGTYPE2 = 'application/mercurial-0.2'
> HGERRTYPE = 'application/hg-error'
>
> def decodevaluefromheaders(req, headerprefix):
> """Decode a long value from multiple HTTP request headers."""
> @@ -82,26 +84,82 @@ class webproto(wireproto.abstractserverp
> val = self.ui.fout.getvalue()
> self.ui.ferr, self.ui.fout = self.oldio
> return val
>
> - def compresschunks(self, chunks):
> - # Don't allow untrusted settings because disabling compression or
> - # setting a very high compression level could lead to flooding
> - # the server's network or CPU.
> - opts = {'level': self.ui.configint('server', 'zliblevel', -1)}
> - return util.compengines['zlib'].compressstream(chunks, opts)
> -
> def _client(self):
> return 'remote:%s:%s:%s' % (
> self.req.env.get('wsgi.url_scheme') or 'http',
> urlreq.quote(self.req.env.get('REMOTE_HOST', '')),
> urlreq.quote(self.req.env.get('REMOTE_USER', '')))
>
> + def responsetype(self, v1compressible=False):
> + """Determine the appropriate response type and compression settings.
> +
> + The ``v1compressible`` argument states whether the response with
> + application/mercurial-0.1 media types should be zlib compressed.
> +
> + Returns a tuple of (mediatype, compengine, engineopts).
> + """
> + # For now, if it isn't compressible in the old world, it's never
> + # compressible. We can change this to send uncompressed 0.2 payloads
> + # later.
> + if not v1compressible:
> + return HGTYPE, None, None
> +
> + # Determine the response media type and compression engine based
> + # on the request parameters.
> + protocaps = decodevaluefromheaders(self.req, 'X-HgProto').split(' ')
> +
> + if '0.2' in protocaps:
> + # Default as defined by wire protocol spec.
> + compformats = ['zlib', 'none']
> + for cap in protocaps:
> + if cap.startswith('comp='):
> + compformats = cap[5:].split(',')
> + break
> +
> + # Now find an agreed upon compression format.
> + for engine in wireproto.supportedcompengines(self.ui, self,
> + util.SERVERROLE):
> + if engine.wireprotosupport().name in compformats:
> + opts = {}
> + level = self.ui.configint('server',
> + '%slevel' % engine.name())
> + if level is not None:
> + opts['level'] = level
> +
> + return HGTYPE2, engine, opts
> +
> + # No mutually supported compression format. Fall back to the
> + # legacy protocol.
> +
> + # Don't allow untrusted settings because disabling compression or
> + # setting a very high compression level could lead to flooding
> + # the server's network or CPU.
> + opts = {'level': self.ui.configint('server', 'zliblevel', -1)}
> + return HGTYPE, util.compengines['zlib'], opts
> +
> def iscmd(cmd):
> return cmd in wireproto.commands
>
> def call(repo, req, cmd):
> p = webproto(req, repo.ui)
> +
> + def genversion2(gen, compress, engine, engineopts):
> + # application/mercurial-0.2 always sends a payload header
> + # identifying the compression engine.
> + name = engine.wireprotosupport().name
> + assert 0 < len(name) < 256
> + yield struct.pack('B', len(name))
> + yield name
> +
> + if compress:
> + for chunk in engine.compressstream(gen, opts=engineopts):
> + yield chunk
> + else:
> + for chunk in gen:
> + yield chunk
> +
> rsp = wireproto.dispatch(repo, p, cmd)
> if isinstance(rsp, str):
> req.respond(HTTP_OK, HGTYPE, body=rsp)
> return []
> @@ -110,12 +168,18 @@ def call(repo, req, cmd):
> gen = iter(lambda: rsp.reader.read(32768), '')
> else:
> gen = rsp.gen
>
> - if rsp.v1compressible:
> - gen = p.compresschunks(gen)
> + # This code for compression should not be streamres specific. It
> + # is here because we only compress streamres at the moment.
> + mediatype, engine, engineopts = p.responsetype(rsp.v1compressible)
>
> - req.respond(HTTP_OK, HGTYPE)
> + if mediatype == HGTYPE and rsp.v1compressible:
> + gen = engine.compressstream(gen, engineopts)
> + elif mediatype == HGTYPE2:
> + gen = genversion2(gen, rsp.v1compressible, engine, engineopts)
> +
> + req.respond(HTTP_OK, mediatype)
> return gen
> elif isinstance(rsp, wireproto.pushres):
> val = p.restore()
> rsp = '%d\n%s' % (rsp.res, val)
> diff --git a/tests/get-with-headers.py b/tests/get-with-headers.py
> --- a/tests/get-with-headers.py
> +++ b/tests/get-with-headers.py
> @@ -34,15 +34,24 @@ formatjson = False
> if '--json' in sys.argv:
> sys.argv.remove('--json')
> formatjson = True
>
> +hgproto = None
> +if '--hgproto' in sys.argv:
> + idx = sys.argv.index('--hgproto')
> + hgproto = sys.argv[idx + 1]
> + sys.argv.pop(idx)
> + sys.argv.pop(idx)
> +
> tag = None
> def request(host, path, show):
> assert not path.startswith('/'), path
> global tag
> headers = {}
> if tag:
> headers['If-None-Match'] = tag
> + if hgproto:
> + headers['X-HgProto-1'] = hgproto
>
> conn = httplib.HTTPConnection(host)
> conn.request("GET", '/' + path, None, headers)
> response = conn.getresponse()
> diff --git a/tests/test-http-protocol.t b/tests/test-http-protocol.t
> --- a/tests/test-http-protocol.t
> +++ b/tests/test-http-protocol.t
> @@ -41,4 +41,127 @@ Order of engines can also change
> $ get-with-headers.py 127.0.0.1:$HGPORT '?cmd=capabilities' | tr ' ' '\n' | grep compression
> compression=none,zlib
>
> $ killdaemons.py
> +
> +Start a default server again
> +
> + $ hg -R server serve -p $HGPORT -d --pid-file hg.pid
> + $ cat hg.pid > $DAEMON_PIDS
> +
> +Server should send application/mercurial-0.1 to clients if no Accept is used
> +
> + $ get-with-headers.py --headeronly 127.0.0.1:$HGPORT '?cmd=getbundle&heads=e93700bd72895c5addab234c56d4024b487a362f&common=0000000000000000000000000000000000000000' -
> + 200 Script output follows
> + content-type: application/mercurial-0.1
> + date: * (glob)
> + server: * (glob)
> + transfer-encoding: chunked
> +
> +Server should send application/mercurial-0.1 when client says it wants it
> +
> + $ get-with-headers.py --hgproto '0.1' --headeronly 127.0.0.1:$HGPORT '?cmd=getbundle&heads=e93700bd72895c5addab234c56d4024b487a362f&common=0000000000000000000000000000000000000000' -
> + 200 Script output follows
> + content-type: application/mercurial-0.1
> + date: * (glob)
> + server: * (glob)
> + transfer-encoding: chunked
> +
> +Server should send application/mercurial-0.2 when client says it wants it
> +
> + $ get-with-headers.py --hgproto '0.2' --headeronly 127.0.0.1:$HGPORT '?cmd=getbundle&heads=e93700bd72895c5addab234c56d4024b487a362f&common=0000000000000000000000000000000000000000' -
> + 200 Script output follows
> + content-type: application/mercurial-0.2
> + date: * (glob)
> + server: * (glob)
> + transfer-encoding: chunked
> +
> + $ get-with-headers.py --hgproto '0.1 0.2' --headeronly 127.0.0.1:$HGPORT '?cmd=getbundle&heads=e93700bd72895c5addab234c56d4024b487a362f&common=0000000000000000000000000000000000000000' -
> + 200 Script output follows
> + content-type: application/mercurial-0.2
> + date: * (glob)
> + server: * (glob)
> + transfer-encoding: chunked
> +
> +Requesting a compression format that server doesn't support results will fall back to 0.1
> +
> + $ get-with-headers.py --hgproto '0.2 comp=aa' --headeronly 127.0.0.1:$HGPORT '?cmd=getbundle&heads=e93700bd72895c5addab234c56d4024b487a362f&common=0000000000000000000000000000000000000000' -
> + 200 Script output follows
> + content-type: application/mercurial-0.1
> + date: * (glob)
> + server: * (glob)
> + transfer-encoding: chunked
> +
> +#if zstd
> +zstd is used if available
> +
> + $ get-with-headers.py --hgproto '0.2 comp=zstd' 127.0.0.1:$HGPORT '?cmd=getbundle&heads=e93700bd72895c5addab234c56d4024b487a362f&common=0000000000000000000000000000000000000000' > resp
> + $ f --size --hexdump --bytes 36 --sha1 resp
> + resp: size=248, sha1=4d8d8f87fb82bd542ce52881fdc94f850748
> + 0000: 32 30 30 20 53 63 72 69 70 74 20 6f 75 74 70 75 |200 Script outpu|
> + 0010: 74 20 66 6f 6c 6c 6f 77 73 0a 0a 04 7a 73 74 64 |t follows...zstd|
> + 0020: 28 b5 2f fd |(./.|
> +
> +#endif
> +
> +application/mercurial-0.2 is not yet used on non-streaming responses
> +
> + $ get-with-headers.py --hgproto '0.2' 127.0.0.1:$HGPORT '?cmd=heads' -
> + 200 Script output follows
> + content-length: 41
> + content-type: application/mercurial-0.1
> + date: * (glob)
> + server: * (glob)
> +
> + e93700bd72895c5addab234c56d4024b487a362f
> +
> +Now test protocol preference usage
> +
> + $ killdaemons.py
> + $ hg --config server.compressionengines=none,zlib -R server serve -p $HGPORT -d --pid-file hg.pid
> + $ cat hg.pid > $DAEMON_PIDS
> +
> +No Accept will send 0.1+zlib, even though "none" is preferred b/c "none" isn't supported on 0.1
> +
> + $ get-with-headers.py --headeronly 127.0.0.1:$HGPORT '?cmd=getbundle&heads=e93700bd72895c5addab234c56d4024b487a362f&common=0000000000000000000000000000000000000000' Content-Type
> + 200 Script output follows
> + content-type: application/mercurial-0.1
> +
> + $ get-with-headers.py 127.0.0.1:$HGPORT '?cmd=getbundle&heads=e93700bd72895c5addab234c56d4024b487a362f&common=0000000000000000000000000000000000000000' > resp
> + $ f --size --hexdump --bytes 28 --sha1 resp
> + resp: size=227, sha1=35a4c074da74f32f5440da3cbf04
> + 0000: 32 30 30 20 53 63 72 69 70 74 20 6f 75 74 70 75 |200 Script outpu|
> + 0010: 74 20 66 6f 6c 6c 6f 77 73 0a 0a 78 |t follows..x|
> +
> +Explicit 0.1 will send zlib because "none" isn't supported on 0.1
> +
> + $ get-with-headers.py --hgproto '0.1' 127.0.0.1:$HGPORT '?cmd=getbundle&heads=e93700bd72895c5addab234c56d4024b487a362f&common=0000000000000000000000000000000000000000' > resp
> + $ f --size --hexdump --bytes 28 --sha1 resp
> + resp: size=227, sha1=35a4c074da74f32f5440da3cbf04
> + 0000: 32 30 30 20 53 63 72 69 70 74 20 6f 75 74 70 75 |200 Script outpu|
> + 0010: 74 20 66 6f 6c 6c 6f 77 73 0a 0a 78 |t follows..x|
> +
> +0.2 with no compression will get "none" because that is server's preference
> +(spec says ZL and UN are implicitly supported)
> +
> + $ get-with-headers.py --hgproto '0.2' 127.0.0.1:$HGPORT '?cmd=getbundle&heads=e93700bd72895c5addab234c56d4024b487a362f&common=0000000000000000000000000000000000000000' > resp
> + $ f --size --hexdump --bytes 32 --sha1 resp
> + resp: size=432, sha1=ac931b412ec185a02e0e5bcff98dac83
> + 0000: 32 30 30 20 53 63 72 69 70 74 20 6f 75 74 70 75 |200 Script outpu|
> + 0010: 74 20 66 6f 6c 6c 6f 77 73 0a 0a 04 6e 6f 6e 65 |t follows...none|
> +
> +Client receives server preference even if local order doesn't match
> +
> + $ get-with-headers.py --hgproto '0.2 comp=zlib,none' 127.0.0.1:$HGPORT '?cmd=getbundle&heads=e93700bd72895c5addab234c56d4024b487a362f&common=0000000000000000000000000000000000000000' > resp
> + $ f --size --hexdump --bytes 32 --sha1 resp
> + resp: size=432, sha1=ac931b412ec185a02e0e5bcff98dac83
> + 0000: 32 30 30 20 53 63 72 69 70 74 20 6f 75 74 70 75 |200 Script outpu|
> + 0010: 74 20 66 6f 6c 6c 6f 77 73 0a 0a 04 6e 6f 6e 65 |t follows...none|
> +
> +Client receives only supported format even if not server preferred format
> +
> + $ get-with-headers.py --hgproto '0.2 comp=zlib' 127.0.0.1:$HGPORT '?cmd=getbundle&heads=e93700bd72895c5addab234c56d4024b487a362f&common=0000000000000000000000000000000000000000' > resp
> + $ f --size --hexdump --bytes 33 --sha1 resp
> + resp: size=232, sha1=a1c727f0c9693ca15742a75c30419bc36
> + 0000: 32 30 30 20 53 63 72 69 70 74 20 6f 75 74 70 75 |200 Script outpu|
> + 0010: 74 20 66 6f 6c 6c 6f 77 73 0a 0a 04 7a 6c 69 62 |t follows...zlib|
> + 0020: 78 |x|
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
More information about the Mercurial-devel
mailing list