[PATCH 2 of 2 RFC] revlog: establish a cache of decompressed revision chunks

Gregory Szorc gregory.szorc at gmail.com
Mon Nov 23 00:59:36 CST 2015


On Sun, Nov 22, 2015 at 10:28 PM, Gregory Szorc <gregory.szorc at gmail.com>
wrote:

> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1448257271 28800
> #      Sun Nov 22 21:41:11 2015 -0800
> # Node ID 5ed02a9105be48e7a595bb8e6168dceccdad7a50
> # Parent  5d0aff0739984ebdcc5c99a84d338749cd702119
> revlog: establish a cache of decompressed revision chunks
>
> (THIS PATCH IS NOT APPROPRIATE FOR LANDING.)
>
> Profiling reveals that we spend a lot of time doing zlib decompression
> on revlogs. Furthermore, while we cache the raw, compressed revlog
> entries, we don't cache the decompressed output.
>
> This patch introduces a LRU cache for the raw content of decompressed
> revisions. It isn't suitable for checkin because there are no settings
> to tune it. The default settings for manifest caching will likely
> consume way too much memory on insanely large repositories. Also, we
> should investigate caching multiple fulltext entries instead of the
> bdiffs, as bdiff processing is not cheap.
>
> This patch has a profound impact on performance.
>
> On mozilla-central (non-gd):
>
> `hg log -r 'first(reverse(::tip), 100)' -p`
> before: 30.93s
> after:   7.57s
> delta: -23.36s (24.5% of original)
>

For the curious, this command executes in 4.90s when the repository is
converted to lz4. At that point statprof says SHA-1 hashing, manifest
diffing, diff generation, and revlog reading are where the remaining time
is spent. Color me surprised that SHA-1 calculation is taking that long.
But if you do the math, we're passing 1,367,257,372 bytes into this
function. So, uh, 1+ GB/s sounds reasonable, especially for Python. Perhaps
I'll need to put an LRU cache on SHA-1 hashing :)

  %   cumulative      self
 time    seconds   seconds  name
 20.07      1.01      1.01  revlog.py:79:hash
 17.68      0.89      0.89  revlog.py:1213:revision
 13.08      0.66      0.66  manifest.py:293:diff
  5.89      0.30      0.30  mdiff.py:121:allblocks
  4.42      0.22      0.22  mdiff.py:14:splitnewlines
  3.68      0.18      0.18  manifest.py:312:copy
  2.76      0.70      0.14  revlog.py:1084:grabchunks
  2.39      0.12      0.12  lz4revlog.py:82:decompress
  1.84      0.09      0.09  util.py:500:__getitem__
  1.29      0.06      0.06  util.py:553:_recordaccess
  1.29      0.06      0.06  revlog.py:1196:revision
  1.29      0.06      0.06  util.py:552:_recordaccess
  1.29      0.06      0.06  util.py:555:_recordaccess
  1.29      0.06      0.06  manifest.py:968:read
  1.10      0.06      0.06  util.py:554:_recordaccess
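A back-of-envelope check of that 1+ GB/s figure, using the byte count and the self time from the profile above:

```python
# Back-of-envelope check of the SHA-1 throughput estimate above.
total_bytes = 1367257372  # bytes passed into revlog.py:79:hash, per the profile
self_seconds = 1.01       # self time attributed to that function

gb_per_sec = total_bytes / float(self_seconds) / 1e9
print('%.2f GB/s' % gb_per_sec)  # roughly 1.35 GB/s
```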



>
> `hg perfmanifest 51fa3e0d4f7b`
> before: wall 0.287174 comb 0.290000 user 0.290000 sys 0.000000 (best of 35)
> after:  wall 0.046735 comb 0.050000 user 0.050000 sys 0.000000 (best of 100)
> delta:      -0.240439 (16.3% of original)
> (this test is cheating because it just accesses the manifest revision over
> and over)
>
> `hg log -r 0:tip`
> before: 22.48s
> after:  21.00s
> delta:  -1.48s (93.4% of original)
>
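The "percent of original" figures above are simply after/before; a quick arithmetic check of the non-gd numbers:

```python
# Sanity-check the "percent of original" figures quoted in the numbers above.
def percent_of_original(before, after):
    return 100.0 * after / before

for name, before, after in [
    ('log -p (non-gd)', 30.93, 7.57),
    ('perfmanifest', 0.287174, 0.046735),
    ('log 0:tip', 22.48, 21.00),
]:
    print('%s: %.1f%% of original' % (name, percent_of_original(before, after)))
```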
> On a generaldelta conversion of mozilla-central:
>
> `hg log -r 'first(reverse(::tip), 100)' -p`
> before: 42.13s
> after:  14.02s
> delta: -28.11s (33.3% of original)
>
> `hg perfmanifest 51fa3e0d4f7b`
> before: wall 0.400207 comb 0.400000 user 0.390000 sys 0.010000 (best of 25)
> after:  wall 0.107959 comb 0.110000 user 0.110000 sys 0.000000 (best of 87)
>
> The reason generaldelta is slower is because the delta chains for this
> repo are very long. Max chain is in the 50,000 range! This is because I
> used format.aggressivemergedeltas and that mode is insane at keeping
> deltas small. These long delta chains also expose another weakness in this
> patch: tail-chasing cache eviction. If our delta chains are longer than
> the cache size, we won't get cache hits, even if the cache originally
> contained items relevant to us. We may want to perform an additional
> walk to find all cache entries before potentially evicting them.
>
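The "additional walk before evicting" idea can be sketched with a toy LRU: promote ("touch") every chain member that is already cached before filling any misses, so the fill pass can't evict them. `ChainAwareLRU` below is a hypothetical illustration, not Mercurial's `util.lrucachedict`:

```python
# Toy model of avoiding tail-chasing eviction on long delta chains.
import collections

class ChainAwareLRU(object):
    def __init__(self, maxsize):
        self._d = collections.OrderedDict()
        self._maxsize = maxsize

    def get(self, key):
        value = self._d.pop(key)      # KeyError on a miss
        self._d[key] = value          # reinsert as most recently used
        return value

    def put(self, key, value):
        if key in self._d:
            self._d.pop(key)
        elif len(self._d) >= self._maxsize:
            self._d.popitem(last=False)  # evict least recently used
        self._d[key] = value

    def touchall(self, keys):
        # One extra walk over the chain: promote every already-cached rev
        # so that a chain longer than the cache can't chase its own tail.
        for key in keys:
            if key in self._d:
                self.get(key)
```

With a full cache of size 3 holding revs 1, 2, 3, calling `touchall([1, 2])` before inserting rev 4 means rev 3 is evicted rather than a chain member.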
> Other things that need to be addressed before this lands:
>
> * Convert revchunk cache to lrucachedict? We are currently storing buffers
>   and returning views into them, then slicing those further. Not sure why
>   this is done.
> * Kill the revchunk cache? Seems pretty worthless if we're caching its
>   decompressed results as well.
> * Cache resolved fulltexts instead?
> * Make cache settings configurable.
> * Experiment with different default settings.
> * Make cache size limited by size of content not number of elements.
> * Investigate actual memory consumption.
> * Lazy initialize nodes in lrucachedict (don't preallocate because this
>   is overhead).
> * Investigate potential over-fetching of chunks in revlog.revision()
>   (looking at raw requests, it appears we're dropping base fulltexts
>   more often than we should)
>
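For the "limit by size of content, not number of elements" item, a byte-budgeted variant might look like this toy sketch (illustrative only; the real `util.lrucachedict` counts elements):

```python
# Hypothetical byte-budgeted LRU: evict by total cached bytes, not count.
import collections

class ByteBudgetCache(object):
    def __init__(self, maxbytes):
        self._d = collections.OrderedDict()
        self._maxbytes = maxbytes
        self._used = 0

    def __setitem__(self, rev, data):
        if rev in self._d:
            self._used -= len(self._d.pop(rev))
        self._d[rev] = data
        self._used += len(data)
        # Evict least recently used entries until under budget, but always
        # keep the entry we just inserted.
        while self._used > self._maxbytes and len(self._d) > 1:
            _, evicted = self._d.popitem(last=False)
            self._used -= len(evicted)

    def __getitem__(self, rev):
        data = self._d.pop(rev)       # KeyError on a miss
        self._d[rev] = data           # mark as most recently used
        return data
```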
> diff --git a/mercurial/manifest.py b/mercurial/manifest.py
> --- a/mercurial/manifest.py
> +++ b/mercurial/manifest.py
> @@ -862,16 +862,18 @@ class treemanifest(object):
>          for d, subm in self._dirs.iteritems():
>              subp1 = m1._dirs.get(d, emptytree)._node
>              subp2 = m2._dirs.get(d, emptytree)._node
>              if subp1 == revlog.nullid:
>                  subp1, subp2 = subp2, subp1
>              writesubtree(subm, subp1, subp2)
>
>  class manifest(revlog.revlog):
> +    DECOMPRESSEDCACHESIZE = 20000
> +
>      def __init__(self, opener, dir='', dirlogcache=None):
>          '''The 'dir' and 'dirlogcache' arguments are for internal use by
>          manifest.manifest only. External users should create a root manifest
>          log with manifest.manifest(opener) and call dirlog() on it.
>          '''
>          # During normal operations, we expect to deal with not more than four
>          # revs at a time (such as during commit --amend). When rebasing large
>          # stacks of commits, the number can go up, hence the config knob below.
> diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> --- a/mercurial/revlog.py
> +++ b/mercurial/revlog.py
> @@ -190,16 +190,18 @@ class revlog(object):
>      version data. This makes retrieval of a version proportional to
>      its size, or O(1) relative to the number of revisions.
>
>      Both pieces of the revlog are written to in an append-only
>      fashion, which means we never need to rewrite a file to insert or
>      remove data, and can use some simple techniques to avoid the need
>      for locking while reading.
>      """
> +    DECOMPRESSEDCACHESIZE = 10
> +
>      def __init__(self, opener, indexfile):
>          """
>          create a revlog object
>
>          opener is a function that abstracts the file opening operation
>          and can be used to implement COW semantics or the like.
>          """
>          self.indexfile = indexfile
> @@ -209,16 +211,17 @@ class revlog(object):
>          self._cache = None
>          # 2-tuple of (rev, baserev) defining the base revision the delta chain
>          # begins at for a revision.
>          self._basecache = None
>          # 2-tuple of (offset, data) of raw data from the revlog at an offset.
>          self._chunkcache = (0, '')
>          # How much data to read and cache into the raw revlog data cache.
>          self._chunkcachesize = 65536
> +        self._decompressedcache = util.lrucachedict(self.DECOMPRESSEDCACHESIZE)
>          self._maxchainlen = None
>          self._aggressivemergedeltas = False
>          self.index = []
>          # Mapping of partial identifiers to full nodes.
>          self._pcache = {}
>          # Mapping of revision integer to full node.
>          self._nodecache = {nullid: nullrev}
>          self._nodepos = None
> @@ -1055,38 +1058,47 @@ class revlog(object):
>          """
>          if not revs:
>              return []
>          start = self.start
>          length = self.length
>          inline = self._inline
>          iosize = self._io.size
>          buffer = util.buffer
> +        decompressedcache = self._decompressedcache
>
>          l = []
>          ladd = l.append
>
>          # Historically, the list of revs for a delta chain has been contiguous,
>          # equivalent to ``range(revs[0], revs[-1])``. With generaldelta, the
>          # chain can have holes. If we performed a single request for all revs
>          # between base and tip, we could pull in data for several unrelated
>          # revisions.
>          #
>          # Our strategy is to walk the revisions and obtain only the data for
>          # revisions we care about.
>          def grabchunks(startrev, endrev):
>              try:
>                  offset, data = self._chunkraw(startrev, endrev, df=df)
>                  for rev in range(startrev, endrev + 1):
> +                    try:
> +                        ladd(decompressedcache[rev])
> +                        continue
> +                    except KeyError:
> +                        pass
> +
>                      chunkstart = start(rev)
>                      if inline:
>                          chunkstart += (rev + 1) * iosize
>                      chunklength = length(rev)
>                      revchunk = buffer(data, chunkstart - offset, chunklength)
> -                    ladd(decompress(revchunk))
> +                    decompressed = decompress(revchunk)
> +                    decompressedcache[rev] = decompressed
> +                    ladd(decompressed)
>              except OverflowError:
>                  # issue4215 - we can't cache a run of chunks greater than
>                  # 2G on Windows.
>                  for rev in range(startrev, endrev + 1):
>                      ladd(self._chunk(rev, df=df))
>
>          maxindex = len(revs) - 1
>          lastrev = revs[0] - 1
>


More information about the Mercurial-devel mailing list