[PATCH 3 of 3] revlog: use an LRU cache for delta chain bases

Tue Aug 23 02:03:17 EDT 2016

Oh, another follow-up that I almost forgot: I've pushed this to committed.

On Mon, Aug 22, 2016, 23:00 Gregory Szorc <gregory.szorc at gmail.com> wrote:

> On Mon, Aug 22, 2016 at 9:50 PM, Gregory Szorc <gregory.szorc at gmail.com>
> wrote:
>
>> # HG changeset patch
>> # User Gregory Szorc <gregory.szorc at gmail.com>
>> # Date 1471927730 25200
>> #      Mon Aug 22 21:48:50 2016 -0700
>> # Node ID bd07831e4af707bb43bbe48b5a1d5d08803c26d8
>> # Parent  c49d98dbf6319b14162c3a4126a2a1757927e982
>> revlog: use an LRU cache for delta chain bases
>>
>> Profiling using statprof revealed a hotspot during changegroup
>> application calculating delta chain bases on generaldelta repos.
>> Essentially, revlog._addrevision() was performing a lot of redundant
>> work tracing the delta chain as part of determining when the chain
>> distance was acceptable. This was most pronounced when adding
>> revisions to manifests, which can have delta chains thousands of
>> revisions long.
>>
>> There was a delta chain base cache on revlogs before, but it only
>> captured a single revision. This was acceptable before generaldelta,
>> when _addrevision would build deltas from the previous revision and
>> thus we'd pretty much guarantee a cache hit when resolving the delta
>> chain base on a subsequent _addrevision call. However, it isn't
>> suitable for generaldelta because parent revisions aren't necessarily
>> the last processed revision.
>>
>> This patch converts the delta chain base cache to an LRU dict cache.
>> The cache can hold multiple entries, so generaldelta repos have a
>> higher chance of getting a cache hit.
>>
>> The impact of this change when processing changegroup additions is
>> significant. On a generaldelta conversion of the "mozilla-unified"
>> repo (which contains heads of the main Firefox repositories in
>> chronological order - this means there are lots of transitions between
>> heads in revlog order), this change has the following impact when
>> performing an `hg unbundle` of an uncompressed bundle of the repo:
>>
>> before: 5:42 CPU time
>> after:  4:34 CPU time
>>
>> Most of this time is saved when applying the changelog and manifest
>> revlogs:
>>
>> before: 2:30 CPU time
>> after:  1:17 CPU time
>>
>> That nearly a 50% reduction in CPU time applying changesets and
>> manifests!
>>
>> Applying a gzipped bundle of the same repo (effectively simulating a
>> `hg clone` over HTTP) showed a similar speedup:
>>
>> before: 5:53 CPU time
>> after:  4:46 CPU time
>>
>> Wall time improvements were basically the same as CPU time.
>>
>> I didn't measure explicitly, but it feels like most of the time
>> is saved when processing manifests. This makes sense, as large
>> manifests tend to have very long delta chains and thus benefit the
>> most from this cache.
>>
>> So, this change effectively makes changegroup application (which is
>> used by `hg unbundle`, `hg clone`, `hg pull`, `hg unshelve`, and
>> various other commands) significantly faster when delta chains are
>> long (which can happen on repos with large numbers of files and thus
>> large manifests).
>>
>> In theory, this change can result in more memory utilization. However,
>> we're caching a dict of ints. At most we have 200 ints + Python object
>> overhead per revlog. And, the cache is really only populated when
>> performing read-heavy operations, such as adding changegroups or
>> scanning an individual revlog. For memory bloat to be an issue, we'd
>> need to scan/read several revisions from several revlogs all while
>> having active references to several revlogs. I don't think there are
>> many operations that do this, so I don't think memory bloat from the
>> cache will be an issue.
>>
>>
> To follow up on this, this change shows little to no benefit on repos
> without interleaved heads or with shorter delta chains. (And this makes
> sense.)
>
> Both the mozilla-central repo (which has some merges but only a single
> head) and the pypy repo (which has 161 heads but only 5500 files) showed no
> statistical change with this patch.
>
> The performance optimization appears to only manifest when delta chain
> base calculation can take a lot of time and that only happens if delta
> chains are really long - say >10,000. This is common on the mozilla-unified
> repo, which on the server is using aggressivemergedeltas and produces
> extremely long delta chains (35k+) unless we limit their length with
> format.maxchainlen (which we do - at 10k). However, local clients don't
> apply this delta chain limit during `hg clone` and other unbundling
> operations since Mercurial enforces no delta chain length limit by default
> - only delta size and distance.
>
> I'm curious if anyone else is running a repo with say 50,000+ files and
> interleaved heads that could corroborate this improvement.
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20160823/ae715edc/attachment.html>