[PATCH 7 of 8 zstd-revlogs] revlog: use compression engine APIs for decompression

Pierre-Yves David pierre-yves.david at ens-lyon.org
Fri Jan 13 09:38:26 EST 2017



On 01/03/2017 12:57 AM, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1483395249 28800
> #      Mon Jan 02 14:14:09 2017 -0800
> # Node ID 96837453d041fe7289dd3e6450aee4961517c5b5
> # Parent  da38ab97f1f4d312966431a16d1f2acb90b75242
> revlog: use compression engine APIs for decompression
>
> Now that compression engines declare their header in revlog chunks
> and can decompress revlog chunks, we refactor revlog.decompress()
> to use them.
>
> Making full use of the property that revlog compressor objects are
> reusable, revlog instances now maintain a dict mapping an engine's
> revlog header to a compressor object. This is not only a performance
> optimization for engines where compressor object reuse can result in
> better performance, but it also serves as a cache of header values
> so we don't need to perform redundant lookups against the compression
> engine manager. (Yes, I measured and the overhead of a function call
> versus a dict lookup was observed.)
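
The dict-lookup-versus-function-call overhead mentioned above is easy to
reproduce with a small micro-benchmark. This is an illustrative sketch only,
not the measurement from the patch, and the names (`cache`, `lookup`) are
made up for the example:

```python
import timeit

# Illustrative micro-benchmark: compare a bare dict lookup against a
# function call that performs the identical lookup.
cache = {'x': object()}

def lookup(key):
    return cache[key]

dict_time = timeit.timeit("cache['x']", globals={'cache': cache},
                          number=1000000)
call_time = timeit.timeit("lookup('x')", globals={'lookup': lookup},
                          number=1000000)

# The function call adds frame setup and teardown on top of the same
# dict lookup, so call_time comes out larger than dict_time; exact
# numbers vary by interpreter and machine.
```

This is the same effect the commit message describes: routing the hot path
through an extra call costs measurable time even though the lookup itself
is identical.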
>
> This change effectively replaces manual, inline lookup code with a
> dict lookup plus extra function call where compression is actually
> used. On the mozilla-unified repository:
>
> $ hg perfrevlogchunks -c
> ! chunk
> ! wall 1.955183 comb 1.960000 user 1.930000 sys 0.030000 (best of 6)
> ! wall 2.003241 comb 2.000000 user 1.960000 sys 0.040000 (best of 5)
> ! chunk batch
> ! wall 1.774444 comb 1.770000 user 1.750000 sys 0.020000 (best of 6)
> ! wall 1.866939 comb 1.850000 user 1.840000 sys 0.010000 (best of 6)
>
> $ hg perfrevlogchunks -m
> ! chunk
> ! wall 2.592354 comb 2.600000 user 2.560000 sys 0.040000 (best of 4)
> ! wall 2.733626 comb 2.730000 user 2.670000 sys 0.060000 (best of 4)
> ! chunk batch
> ! wall 2.469384 comb 2.480000 user 2.420000 sys 0.060000 (best of 4)
> ! wall 2.580613 comb 2.590000 user 2.550000 sys 0.040000 (best of 4)
>
> We can see this change has a negative impact on performance.
>
> There are 360,210 changelog entries (359,658 or 99.85% zlib compressed
> and 552 with the 'u' header). The mean time per decompress() in
> non-batch mode increased from 5.428us to 5.561us, a difference of
> 0.13us or 2.5%.
>
> There are 359,342 manifest entries (242,566 or 67.50% zlib compressed,
> 116,602 or 32.45% with '\0' header, and 174 or 0.05% that are empty or
> have the 'u' header). The mean time per decompress() in non-batch mode
> increased from 7.214us to 7.607us, a difference of 0.393us or 5.4%.
>
> If we reinsert inline code for "if t == 'x': return decompress(data)",
> performance remains unchanged or even becomes faster than before
> (if we avoid the checks for '\0' and 'u' before 'x'). My conclusion is
> that attribute lookups and function calls in Python can matter for
> performance. We've known this to be true, which is why tight loops
> leverage local variables to avoid attribute lookups. But I'll be honest:
> I was surprised to see this come into play for decompression: I would
> have expected time spent in the decompressor to dwarf Python time.
>
> Anyway, `hg perfrevlogchunks` is a microbenchmark. It's a good tool
> to measure raw performance, but it may not be indicative of real world
> behavior. For example, the performance loss on manifests as a
> percentage is troubling. But very few operations need to decompress
> every manifest chunk. And when they do, they typically have to perform
> other CPU intensive behavior like applying the diff. So the performance
> loss on manifests doesn't concern me as much.
>
> What about changelogs? Again, there are only so many operations that
> need to read every changelog revision. Fortunately, we have other
> perf commands to measure the impact on these operations:
>
> $ hg perfrevlogrevision -c -d 1
> ! wall 4.624776 comb 4.620000 user 4.600000 sys 0.020000 (best of 3)
> ! wall 4.711967 comb 4.710000 user 4.700000 sys 0.010000 (best of 3)
>
> $ hg perfrevset 'author(mpm)'
> ! wall 9.611942 comb 9.600000 user 9.570000 sys 0.030000 (best of 3)
> ! wall 9.915843 comb 9.900000 user 9.860000 sys 0.040000 (best of 3)
>
> $ hg perfrevset 'author(mpm)' --contexts
> ! wall 9.930018 comb 9.930000 user 9.900000 sys 0.030000 (best of 3)
> ! wall 10.007671 comb 10.000000 user 9.970000 sys 0.030000 (best of 3)
>
> Again, we see performance regressions.
>
> I feel bad about introducing a performance regression. But, I can
> argue that's the price to pay for an abstraction. I suppose we could
> restore the inline code for zlib (and presumably zstd someday) so we
> don't go through the compression engine API. We could have "tier 1"
> compression formats where we support fast/inline operations while still
> allowing extensions to implement their own, fully capable compression
> engines.

The regression percentages are noticeable enough (I've seen patches for 
less) that I think hardcoding the known common case makes sense to avoid 
the regression.
That is not super nice, but it provides a good way to move forward (in 
that great direction) without the regression.
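
For concreteness, here is a possible shape of that hardcoded fast path. This
is my sketch, simplified and standalone (a plain function rather than the
revlog method, bytes headers, and a pre-built `decompressors` dict); it is
illustrative, not the patch as submitted:

```python
import zlib

_decompress = zlib.decompress

def decompress(data, decompressors):
    # Sketch of a "tier 1" dispatch: the dominant zlib ('x') header is
    # handled inline, so the common case pays no dict lookup or extra
    # method call; only uncommon headers go through the cached engine
    # dispatch from the patch (here just a dict of compressor objects).
    if not data:
        return data
    t = data[:1]
    if t == b'x':                 # common case: zlib-compressed chunk
        return _decompress(data)
    if t == b'\0':                # raw inline data, returned whole
        return data
    if t == b'u':                 # stored uncompressed behind a 'u' marker
        return data[1:]
    # uncommon headers: engine lookup (raises KeyError when unknown;
    # the real code would wrap that in a RevlogError)
    compressor = decompressors[t]
    return compressor.decompress(data)

# usage: a zlib-compressed blob starts with 0x78 == b'x'
blob = zlib.compress(b'revision text')
assert decompress(blob, {}) == b'revision text'
```

The idea is that extensions still get fully capable engines through the
fallback path, while the built-in formats keep their inline speed.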


>
> diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> --- a/mercurial/revlog.py
> +++ b/mercurial/revlog.py
> @@ -18,7 +18,6 @@ import errno
>  import hashlib
>  import os
>  import struct
> -import zlib
>
>  # import stuff from node for others to import from revlog
>  from .node import (
> @@ -39,7 +38,6 @@ from . import (
>
>  _pack = struct.pack
>  _unpack = struct.unpack
> -_decompress = zlib.decompress
>
>  # revlog header flags
>  REVLOGV0 = 0
> @@ -298,6 +296,8 @@ class revlog(object):
>              self._chunkclear()
>          # revnum -> (chain-length, sum-delta-length)
>          self._chaininfocache = {}
> +        # revlog header -> revlog compressor
> +        self._decompressors = {}
>
>      @util.propertycache
>      def _compressor(self):
> @@ -1374,17 +1374,43 @@ class revlog(object):
>          """
>          if not data:
>              return data
> +
> +        # Revlogs are read much more frequently than they are written, so
> +        # performance of this function is important.
> +        #
> +        # We can make a few assumptions about revlogs:
> +        #
> +        # 1) the majority of chunks will be compressed (as opposed to inline
> +        #    raw data).
> +        # 2) decompressing *any* data will likely be at least 10x slower than
> +        #    returning raw inline data.
> +        #
> +        # It follows that we want to optimize the "decompress compressed data"
> +        # case over "return inline data." The code below does this by putting
> +        # the self._decompressors lookup first and *then* checks for our
> +        # special cases of the ``\0`` and ``u`` headers.
> +        #
> +        # For cases where data is actually compressed, this avoids comparison
> +        # against the "null" decompressors and allows us to dispatch to the
> +        # decompressor quicker. According to `hg perfrevlogchunks`, this is
> +        # ~0.5% faster.
>          t = data[0]
> -        if t == '\0':
> -            return data
> -        if t == 'x':
> +        try:
> +            compressor = self._decompressors[t]
> +        except KeyError:
> +            if t == '\0':
> +                return data
> +            elif t == 'u':
> +                return util.buffer(data, 1)
> +
>              try:
> -                return _decompress(data)
> -            except zlib.error as e:
> -                raise RevlogError(_('revlog decompress error: %s') % str(e))
> -        if t == 'u':
> -            return util.buffer(data, 1)
> -        raise RevlogError(_('unknown compression type %r') % t)
> +                engine = util.compengines.forrevlogheader(t)
> +                compressor = engine.revlogcompressor()
> +                self._decompressors[t] = compressor
> +            except KeyError:
> +                raise RevlogError(_('unknown compression type %r') % t)
> +
> +        return compressor.decompress(data)
>
>      def _isgooddelta(self, d, textlen):
>          """Returns True if the given delta is good. Good means that it is within
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>

-- 
Pierre-Yves David

