[PATCH] hgweb: tweak zlib chunking behavior

Tue Aug 16 10:14:11 EDT 2016

On Sun, Aug 14, 2016 at 09:31:58PM -0700, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1471235386 25200
> #      Sun Aug 14 21:29:46 2016 -0700
> # Node ID 9428cb5b1bb55c2e72f0713cc6e2536cc3de0292
> # Parent  279cd80059d41bbdb91ea9073278cbbe7f1b43d5
> hgweb: tweak zlib chunking behavior

Nice, queued this.

>
> When doing streaming compression with zlib, zlib appears to emit chunks
> with data only once ~20-30kb is available, on average. In other words,
> most calls to compress() return an empty string. On the mozilla-unified
> repo, only 48,433 of 921,167 calls to compress() (5.26%) returned data.
> That means we were sending hundreds of thousands of empty chunks
> through a generator, where they touched who knows how many frames (my
> guess is millions). Filtering the empty chunks out of the generator
> cuts down on overhead.
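A rough way to reproduce this kind of measurement outside hgweb, as a sketch: feed a payload to one zlib.compressobj() in fixed-size pieces and count how many compress() calls return anything. The helper name compress_stats and the synthetic payload are illustrative assumptions, not part of the patch. Real changegroup data will give different ratios.

```python
import zlib


def compress_stats(payload, readsize, level=-1):
    """Feed payload to a single compressobj in readsize pieces.

    Hypothetical helper for illustration. Returns (calls, calls_with_output).
    """
    z = zlib.compressobj(level)
    calls = with_output = 0
    for start in range(0, len(payload), readsize):
        calls += 1
        # Without an explicit flush, compress() usually just buffers the
        # input internally and returns b''.
        if z.compress(payload[start:start + readsize]):
            with_output += 1
    z.flush()  # finish the stream; not counted as a compress() call
    return calls, with_output


if __name__ == "__main__":
    # Synthetic, fairly compressible input (an assumption, not repo data).
    payload = b"".join(b"line %d\n" % i for i in range(200000))
    calls, with_output = compress_stats(payload, 4096)
    print("%d of %d compress() calls returned data" % (with_output, calls))
```

Only the ratio matters here; the absolute counts depend entirely on the input and the zlib build.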
>
> In addition, we were previously feeding 4kb chunks into zlib
> compression. Since compress() tends to emit *compressed* data only
> after 20-30kb is available, it took several calls before any data was
> produced. We increase the amount of data fed in at a time to 32kb.
> This reduces the number of calls to compress() from 921,167 to
> 115,146. It also reduces the number of output chunks from 48,433 to
> 31,377. This does increase the average output chunk size a little,
> but I don't think this will matter in most scenarios.
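A sketch of both changes together, assuming the input is an in-memory bytes object wrapped in io.BytesIO rather than the changegroup `cg` that hgweb reads from: read larger pieces, drop the empty results of compress(), and confirm the decompressed output is unchanged. chunked_compress and summarize are hypothetical names used only for illustration.

```python
import io
import zlib


def chunked_compress(fh, readsize, level=-1):
    """Yield non-empty zlib output chunks for everything read from fh.

    Same loop shape as the patched groupchunks(), but fh is any object
    with read(n) (an assumption; hgweb passes a changegroup).
    """
    z = zlib.compressobj(level)
    while True:
        chunk = fh.read(readsize)
        if not chunk:
            break
        data = z.compress(chunk)
        if data:  # skip the frequent empty results instead of yielding them
            yield data
    yield z.flush()  # always non-empty: final block plus adler32 trailer


def summarize(payload, readsize):
    """Return (compress() calls, chunks yielded, compressed bytes)."""
    # One compress() call per non-empty read, i.e. ceil(len / readsize).
    calls = -(-len(payload) // readsize)
    chunks = list(chunked_compress(io.BytesIO(payload), readsize))
    return calls, len(chunks), b"".join(chunks)


if __name__ == "__main__":
    payload = b"".join(b"line %d\n" % i for i in range(200000))
    for size in (4096, 32768):
        calls, nchunks, blob = summarize(payload, size)
        assert zlib.decompress(blob) == payload  # filtering loses nothing
        print("readsize=%5d: %d calls, %d chunks" % (size, calls, nchunks))
```

Dropping empty strings is safe because they carry no bytes of the compressed stream; only the number of generator round trips changes.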
>
> The combination of these two changes appears to shave ~6s of CPU time
> (~3%) off a server serving the mozilla-unified repo.
>
> diff --git a/mercurial/hgweb/protocol.py b/mercurial/hgweb/protocol.py
> --- a/mercurial/hgweb/protocol.py
> +++ b/mercurial/hgweb/protocol.py
> @@ -71,20 +71,24 @@ class webproto(wireproto.abstractserverp
>          self.ui.ferr = self.ui.fout = stringio()
>      def restore(self):
>          val = self.ui.fout.getvalue()
>          self.ui.ferr, self.ui.fout = self.oldio
>          return val
>      def groupchunks(self, cg):
>          z = zlib.compressobj(self.ui.configint('server', 'zliblevel', -1))
>          while True:
> -            chunk = cg.read(4096)
> +            chunk = cg.read(32768)
>              if not chunk:
>                  break
> -            yield z.compress(chunk)
> +            data = z.compress(chunk)
> +            # Not all calls to compress() emit data. It is cheaper to inspect
> +            # that here than to send it via the generator.
> +            if data:
> +                yield data
>          yield z.flush()
>      def _client(self):
>          return 'remote:%s:%s:%s' % (
>              self.req.env.get('wsgi.url_scheme') or 'http',
>              urlreq.quote(self.req.env.get('REMOTE_HOST', '')),
>              urlreq.quote(self.req.env.get('REMOTE_USER', '')))
>
>  def iscmd(cmd):
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel