[PATCH 2 of 2 RFC] revlog: establish a cache of decompressed revision chunks
Gregory Szorc
gregory.szorc at gmail.com
Mon Nov 23 00:59:36 CST 2015
On Sun, Nov 22, 2015 at 10:28 PM, Gregory Szorc <gregory.szorc at gmail.com>
wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1448257271 28800
> # Sun Nov 22 21:41:11 2015 -0800
> # Node ID 5ed02a9105be48e7a595bb8e6168dceccdad7a50
> # Parent 5d0aff0739984ebdcc5c99a84d338749cd702119
> revlog: establish a cache of decompressed revision chunks
>
> (THIS PATCH IS NOT APPROPRIATE FOR LANDING.)
>
> Profiling reveals that we spend a lot of time doing zlib decompression
> on revlogs. Furthermore, while we cache the raw, compressed revlog
> entries, we don't cache the decompressed output.
>
> This patch introduces a LRU cache for the raw content of decompressed
> revisions. It isn't suitable for checkin because there are no settings
> to tune it. The default settings for manifest caching will likely
> consume way too much memory on insanely large repositories. Also, we
> should investigate caching multiple fulltext entries instead of the
> bdiffs, as bdiff processing is not cheap.
>
> This patch has a profound impact on performance.
>
> On mozilla-central (non-gd):
>
> `hg log -r 'first(reverse(::tip), 100)' -p`
> before: 30.93s
> after: 7.57s
> delta: -23.36s (24.5% of original)
>
For the curious, this command executes in 4.90s when the repository is
converted to lz4. At that point statprof says SHA-1 hashing, manifest
diffing, diff generation, and revlog reading are where the remaining time
is spent. Color me surprised that SHA-1 calculation is taking that long.
But if you do the math, we're passing 1,367,257,372 bytes into this
function. So, uh, 1+ GB/s sounds reasonable, especially for Python. Perhaps
I'll need to put an LRU cache on SHA-1 hashing :)
  %   cumulative    self
 time   seconds    seconds   name
20.07     1.01       1.01    revlog.py:79:hash
17.68     0.89       0.89    revlog.py:1213:revision
13.08     0.66       0.66    manifest.py:293:diff
 5.89     0.30       0.30    mdiff.py:121:allblocks
 4.42     0.22       0.22    mdiff.py:14:splitnewlines
 3.68     0.18       0.18    manifest.py:312:copy
 2.76     0.70       0.14    revlog.py:1084:grabchunks
 2.39     0.12       0.12    lz4revlog.py:82:decompress
 1.84     0.09       0.09    util.py:500:__getitem__
 1.29     0.06       0.06    util.py:553:_recordaccess
 1.29     0.06       0.06    revlog.py:1196:revision
 1.29     0.06       0.06    util.py:552:_recordaccess
 1.29     0.06       0.06    util.py:555:_recordaccess
 1.29     0.06       0.06    manifest.py:968:read
 1.10     0.06       0.06    util.py:554:_recordaccess
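For what it's worth, the 1+ GB/s figure is just arithmetic on the two numbers above (bytes passed into the hash function, and statprof's self time for revlog.py:79:hash):

```python
# Back-of-the-envelope check of the SHA-1 throughput claim above.
# Both inputs are taken from the message: total bytes hashed, and the
# self time statprof attributes to revlog.py:79:hash.
total_bytes = 1367257372   # bytes fed to the hash function
self_seconds = 1.01        # statprof self time for revlog.py:79:hash
gb_per_s = total_bytes / self_seconds / 1e9
print('%.2f GB/s' % gb_per_s)  # ~1.35 GB/s
```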
>
> `hg perfmanifest 51fa3e0d4f7b`
> before: wall 0.287174 comb 0.290000 user 0.290000 sys 0.000000 (best of 35)
> after: wall 0.046735 comb 0.050000 user 0.050000 sys 0.000000 (best of 100)
> delta: -0.240439 (16.3% of original)
> (this test is cheating because it just accesses the manifest revision over
> and over)
>
> `hg log -r 0:tip`
> before: 22.48s
> after: 21.00s
> delta: -1.48s (93.4% of original)
>
> On a generaldelta conversion of mozilla-central:
>
> `hg log -r 'first(reverse(::tip), 100)' -p`
> before: 42.13s
> after: 14.02s
> delta: -28.11s (33.3% of original)
>
> `hg perfmanifest 51fa3e0d4f7b`
> before: wall 0.400207 comb 0.400000 user 0.390000 sys 0.010000 (best of 25)
> after: wall 0.107959 comb 0.110000 user 0.110000 sys 0.000000 (best of 87)
>
> The reason generaldelta is slower is because the delta chains for this
> repo are very long. Max chain is in the 50,000 range! This is because I
> used format.aggressivemergedeltas and that mode is insane at keeping
> deltas small. These long delta chains also expose another weakness in this
> patch: tail chasing cache eviction. If our delta chains are longer than
> the cache size, we won't get cache hits, even if the cache originally
> contained items relevant to us. We may want to perform an additional
> walk to find all cache entries before potentially evicting them.
>
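The tail-chasing failure mode described above is easy to demonstrate with any LRU whose capacity is smaller than the chain being walked. A minimal stand-in sketch (this `LRU` class is illustrative, not util.lrucachedict):

```python
from collections import OrderedDict

class LRU:
    """Minimal LRU mapping, standing in for Mercurial's util.lrucachedict."""
    def __init__(self, maxsize):
        self._d = OrderedDict()
        self._maxsize = maxsize
    def get(self, key):
        if key in self._d:
            self._d.move_to_end(key)   # mark as most recently used
            return self._d[key]
        return None
    def put(self, key, value):
        self._d[key] = value
        self._d.move_to_end(key)
        if len(self._d) > self._maxsize:
            self._d.popitem(last=False)  # evict least recently used

# A delta chain longer than the cache: each rev read in order evicts an
# entry we will need again on the next walk of the same chain.
cache = LRU(4)
chain = list(range(6))
hits = 0
for _ in range(2):           # walk the same chain twice
    for rev in chain:
        if cache.get(rev) is not None:
            hits += 1
        else:
            cache.put(rev, 'chunk-%d' % rev)
print(hits)  # 0 -- tail chasing: the cache never produces a hit
```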
> Other things that need to be addressed before this lands:
>
> * Convert revchunk cache to lrucachedict? We're currently storing buffers,
> returning views into them, then slicing those further. Not sure why
> this is done.
> * Kill the revchunk cache? Seems pretty worthless if we're caching its
> decompressed results as well.
> * Cache resolved fulltexts instead?
> * Make cache settings configurable.
> * Experiment with different default settings.
> * Make cache size limited by size of content not number of elements.
> * Investigate actual memory consumption.
> * Lazy initialize nodes in lrucachedict (don't preallocate because this
> is overhead).
> * Investigate potential over-fetching of chunks in revlog.revision()
> (looking at raw requests, it appears we're dropping base fulltexts
> more than we should).
>
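One of the TODOs above — limiting the cache by content size rather than element count — could look roughly like this. `ByteSizedLRU` is a hypothetical name for illustration, not Mercurial code:

```python
from collections import OrderedDict

class ByteSizedLRU:
    """Sketch of a byte-bounded LRU: evict by total cached bytes,
    not entry count. Hypothetical helper, not part of Mercurial."""
    def __init__(self, maxbytes):
        self._d = OrderedDict()
        self._maxbytes = maxbytes
        self._used = 0
    def __setitem__(self, key, value):
        if key in self._d:
            self._used -= len(self._d.pop(key))
        self._d[key] = value
        self._used += len(value)
        # Evict least recently used entries until we fit the budget.
        while self._used > self._maxbytes and self._d:
            _, old = self._d.popitem(last=False)
            self._used -= len(old)
    def __getitem__(self, key):
        value = self._d[key]          # KeyError on miss, like a dict
        self._d.move_to_end(key)      # mark as most recently used
        return value
    def __contains__(self, key):
        return key in self._d

cache = ByteSizedLRU(maxbytes=10)
cache['a'] = b'xxxx'   # 4 bytes cached
cache['b'] = b'xxxx'   # 8 bytes cached
cache['c'] = b'xxxx'   # 12 > 10: evicts 'a'
print('a' in cache, 'c' in cache)  # False True
```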
> diff --git a/mercurial/manifest.py b/mercurial/manifest.py
> --- a/mercurial/manifest.py
> +++ b/mercurial/manifest.py
> @@ -862,16 +862,18 @@ class treemanifest(object):
> for d, subm in self._dirs.iteritems():
> subp1 = m1._dirs.get(d, emptytree)._node
> subp2 = m2._dirs.get(d, emptytree)._node
> if subp1 == revlog.nullid:
> subp1, subp2 = subp2, subp1
> writesubtree(subm, subp1, subp2)
>
> class manifest(revlog.revlog):
> + DECOMPRESSEDCACHESIZE = 20000
> +
> def __init__(self, opener, dir='', dirlogcache=None):
> '''The 'dir' and 'dirlogcache' arguments are for internal use by
> manifest.manifest only. External users should create a root manifest
> log with manifest.manifest(opener) and call dirlog() on it.
> '''
> # During normal operations, we expect to deal with not more than four
> # revs at a time (such as during commit --amend). When rebasing large
> # stacks of commits, the number can go up, hence the config knob below.
> diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> --- a/mercurial/revlog.py
> +++ b/mercurial/revlog.py
> @@ -190,16 +190,18 @@ class revlog(object):
> version data. This makes retrieval of a version proportional to
> its size, or O(1) relative to the number of revisions.
>
> Both pieces of the revlog are written to in an append-only
> fashion, which means we never need to rewrite a file to insert or
> remove data, and can use some simple techniques to avoid the need
> for locking while reading.
> """
> + DECOMPRESSEDCACHESIZE = 10
> +
> def __init__(self, opener, indexfile):
> """
> create a revlog object
>
> opener is a function that abstracts the file opening operation
> and can be used to implement COW semantics or the like.
> """
> self.indexfile = indexfile
> @@ -209,16 +211,17 @@ class revlog(object):
> self._cache = None
> # 2-tuple of (rev, baserev) defining the base revision the delta chain
> # begins at for a revision.
> self._basecache = None
> # 2-tuple of (offset, data) of raw data from the revlog at an offset.
> self._chunkcache = (0, '')
> # How much data to read and cache into the raw revlog data cache.
> self._chunkcachesize = 65536
> + self._decompressedcache = util.lrucachedict(self.DECOMPRESSEDCACHESIZE)
> self._maxchainlen = None
> self._aggressivemergedeltas = False
> self.index = []
> # Mapping of partial identifiers to full nodes.
> self._pcache = {}
> # Mapping of revision integer to full node.
> self._nodecache = {nullid: nullrev}
> self._nodepos = None
> @@ -1055,38 +1058,47 @@ class revlog(object):
> """
> if not revs:
> return []
> start = self.start
> length = self.length
> inline = self._inline
> iosize = self._io.size
> buffer = util.buffer
> + decompressedcache = self._decompressedcache
>
> l = []
> ladd = l.append
>
> # Historically, the list of revs for a delta chain has been contiguous,
> # equivalent to ``range(revs[0], revs[-1])``. With generaldelta, the
> # chain can have holes. If we performed a single request for all revs
> # between base and tip, we could pull in data for several unrelated
> # revisions.
> #
> # Our strategy is to walk the revisions and obtain only the data for
> # revisions we care about.
> def grabchunks(startrev, endrev):
> try:
> offset, data = self._chunkraw(startrev, endrev, df=df)
> for rev in range(startrev, endrev + 1):
> + try:
> + ladd(decompressedcache[rev])
> + continue
> + except KeyError:
> + pass
> +
> chunkstart = start(rev)
> if inline:
> chunkstart += (rev + 1) * iosize
> chunklength = length(rev)
> revchunk = buffer(data, chunkstart - offset, chunklength)
> - ladd(decompress(revchunk))
> + decompressed = decompress(revchunk)
> + decompressedcache[rev] = decompressed
> + ladd(decompressed)
> except OverflowError:
> # issue4215 - we can't cache a run of chunks greater than
> # 2G on Windows.
> for rev in range(startrev, endrev + 1):
> ladd(self._chunk(rev, df=df))
>
> maxindex = len(revs) - 1
> lastrev = revs[0] - 1
>
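For readers skimming the diff, the grabchunks change boils down to the pattern below, sketched standalone: consult the decompressed-chunk cache first, and only slice and decompress the raw buffer on a miss. Names here are illustrative (a plain dict stands in for the LRU), not Mercurial's actual API:

```python
import zlib

def getchunks(revs, rawdata, offsets, cache):
    """Return decompressed chunks for revs, populating/consulting cache."""
    out = []
    for rev in revs:
        try:
            out.append(cache[rev])      # cache hit: skip decompression
            continue
        except KeyError:
            pass
        start, length = offsets[rev]
        chunk = zlib.decompress(rawdata[start:start + length])
        cache[rev] = chunk              # remember for later delta chains
        out.append(chunk)
    return out

# Two compressed chunks laid out back to back, like revlog data.
raw0 = zlib.compress(b'hello revlog')
raw1 = zlib.compress(b'another chunk')
rawdata = raw0 + raw1
offsets = {0: (0, len(raw0)), 1: (len(raw0), len(raw1))}
cache = {}
print(getchunks([0, 1], rawdata, offsets, cache))
print(0 in cache and 1 in cache)  # True -- second walk hits the cache
```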