[PATCH] hgweb: tweak zlib chunking behavior
Augie Fackler
raf at durin42.com
Tue Aug 16 10:14:11 EDT 2016
On Sun, Aug 14, 2016 at 09:31:58PM -0700, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1471235386 25200
> # Sun Aug 14 21:29:46 2016 -0700
> # Node ID 9428cb5b1bb55c2e72f0713cc6e2536cc3de0292
> # Parent 279cd80059d41bbdb91ea9073278cbbe7f1b43d5
> hgweb: tweak zlib chunking behavior
Nice, queued this.
>
> When doing streaming compression with zlib, zlib appears to return data
> only once roughly 20-30kb of compressed output has accumulated. As a
> result, most calls to compress() return an empty string. On the
> mozilla-unified repo, only 48,433 of 921,167 calls to compress() (5.26%)
> returned data. In other words, we were sending hundreds of thousands of
> empty chunks through a generator, where they passed through who knows
> how many Python frames (my guess is millions). Not yielding the empty
> chunks from the generator cuts down on that overhead.
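A minimal standalone sketch of the filtering idea, for trying it outside hgweb. This is not Mercurial's code: the function, its readsize/skip_empty parameters, and the synthetic payload are made up for illustration, and io.BytesIO stands in for the changegroup object's read() method. With skip_empty=False it reproduces the pre-patch behavior of yielding every compress() result.

```python
import io
import zlib


def groupchunks(cg, readsize=32768, level=-1, skip_empty=True):
    """Stream-compress the file-like cg, mirroring the patched loop.

    Hypothetical standalone helper: cg only needs a read(n) method.
    When skip_empty is false, empty compress() results are yielded too.
    """
    z = zlib.compressobj(level)
    while True:
        chunk = cg.read(readsize)
        if not chunk:
            break
        data = z.compress(chunk)
        # Most calls return b"" while zlib buffers output internally.
        if data or not skip_empty:
            yield data
    # flush() emits the remaining compressed data plus the stream trailer.
    yield z.flush()


# Synthetic, highly compressible input; real changegroups will differ.
payload = b"".join(b"fake changegroup line %d\n" % i for i in range(50000))

unfiltered = list(groupchunks(io.BytesIO(payload), 4096, skip_empty=False))
filtered = list(groupchunks(io.BytesIO(payload), 4096))

print(len(unfiltered), unfiltered.count(b""), len(filtered))
# Both streams carry identical bytes; only the empty chunks differ.
assert b"".join(unfiltered) == b"".join(filtered)
assert zlib.decompress(b"".join(filtered)) == payload
```

The consumer sees exactly the same byte stream either way; the only difference is how many times the generator has to resume and hand a chunk up the stack.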
>
> In addition, we were previously feeding 4kb chunks into zlib
> compression. Since compress() tends to emit *compressed* data only after
> 20-30kb of it is available, it took several calls (about 19 on average,
> going by the numbers above) before any data was produced. We increase
> the amount of data fed in at a time to 32kb. This reduces the number of
> calls to compress() from 921,167 to 115,146. It also reduces the number
> of output chunks from 48,433 to 31,377. This does increase the average
> output chunk size somewhat, since the same compressed stream is now split
> into ~35% fewer chunks. But I don't think this will matter in most
> scenarios.
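The read-size effect can be reproduced roughly with a counter like the one below. It is only a sketch: chunking_stats and the sample data are hypothetical, and real changegroup data will give different absolute numbers. Going from 4096 to 32768 bytes per read should cut the number of compress() calls by 8x, which lines up with the 921,167 -> 115,146 figures above (921,167 / 115,146 is almost exactly 8).

```python
import zlib


def chunking_stats(data, readsize, level=-1):
    """Count compress() calls and how many of them returned any bytes."""
    z = zlib.compressobj(level)
    calls = nonempty = 0
    for start in range(0, len(data), readsize):
        calls += 1
        if z.compress(data[start:start + readsize]):
            nonempty += 1
    z.flush()  # finish the stream; the final flush is not counted here
    return calls, nonempty


# Hypothetical sample input standing in for changegroup data.
sample = b"".join(b"revision %d: moderately compressible text\n" % i
                  for i in range(100000))

for size in (4096, 32768):
    calls, nonempty = chunking_stats(sample, size)
    print("readsize=%5d  calls=%6d  nonempty=%5d" % (size, calls, nonempty))
```

Only the relative change between the two read sizes is meaningful here; the nonempty counts on synthetic data will not match the mozilla-unified measurements.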
>
> The combination of these two changes appears to shave ~6s of CPU time
> (~3%) off a server serving the mozilla-unified repo.
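To get a feel for the CPU difference locally, a timeit harness along these lines compares the two loops. The helpers old_groupchunks/new_groupchunks are hypothetical standalone copies of the two sides of the diff, and consume() is an assumed stand-in for the WSGI server draining the response iterator. Absolute timings will not resemble the ~6s measured on a real server, since this exercises only the generator and zlib, not hgweb or the network.

```python
import timeit
import zlib


def old_groupchunks(data, readsize=4096):
    # Pre-patch loop: 4kb reads, every compress() result is yielded.
    z = zlib.compressobj(-1)
    for start in range(0, len(data), readsize):
        yield z.compress(data[start:start + readsize])
    yield z.flush()


def new_groupchunks(data, readsize=32768):
    # Patched loop: 32kb reads, empty compress() results are skipped.
    z = zlib.compressobj(-1)
    for start in range(0, len(data), readsize):
        out = z.compress(data[start:start + readsize])
        if out:
            yield out
    yield z.flush()


def consume(chunks):
    # Stand-in for a WSGI server iterating over the response body.
    total = 0
    for chunk in chunks:
        total += len(chunk)
    return total


# Synthetic input; real changegroups compress differently.
sample = b"".join(b"manifest entry %d\x00deadbeef\n" % i for i in range(60000))

timings = {}
for fn in (old_groupchunks, new_groupchunks):
    timings[fn.__name__] = timeit.timeit(lambda: consume(fn(sample)), number=5)

for name, secs in sorted(timings.items()):
    print("%s: %.3fs" % (name, secs))
```

Both variants produce a valid zlib stream for the same input, so any timing gap comes from the call and chunk counts rather than from different compression settings.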
>
> diff --git a/mercurial/hgweb/protocol.py b/mercurial/hgweb/protocol.py
> --- a/mercurial/hgweb/protocol.py
> +++ b/mercurial/hgweb/protocol.py
> @@ -71,20 +71,24 @@ class webproto(wireproto.abstractserverp
>          self.ui.ferr = self.ui.fout = stringio()
>      def restore(self):
>          val = self.ui.fout.getvalue()
>          self.ui.ferr, self.ui.fout = self.oldio
>          return val
>      def groupchunks(self, cg):
>          z = zlib.compressobj(self.ui.configint('server', 'zliblevel', -1))
>          while True:
> -            chunk = cg.read(4096)
> +            chunk = cg.read(32768)
>              if not chunk:
>                  break
> -            yield z.compress(chunk)
> +            data = z.compress(chunk)
> +            # Not all calls to compress() emit data. It is cheaper to inspect
> +            # that here than to send it via the generator.
> +            if data:
> +                yield data
>          yield z.flush()
>      def _client(self):
>          return 'remote:%s:%s:%s' % (
>              self.req.env.get('wsgi.url_scheme') or 'http',
>              urlreq.quote(self.req.env.get('REMOTE_HOST', '')),
>              urlreq.quote(self.req.env.get('REMOTE_USER', '')))
>
>  def iscmd(cmd):
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel