[PATCH 07 of 14] sparse-revlog: rework the way we enforce chunk size limit
Boris FELD
boris.feld at octobus.net
Thu Nov 15 04:59:32 EST 2018
On 13/11/2018 13:47, Yuya Nishihara wrote:
> On Mon, 12 Nov 2018 10:55:42 +0100, Boris Feld wrote:
>> # HG changeset patch
>> # User Boris Feld <boris.feld at octobus.net>
>> # Date 1541782717 -3600
>> # Fri Nov 09 17:58:37 2018 +0100
>> # Node ID b77a6b74ef31e1b3706c1c6127a15eede0334f71
>> # Parent ddafb271512fc26de60da5dceffc1509bb023d66
>> # EXP-Topic sparse-perf
>> # Available At https://bitbucket.org/octobus/mercurial-devel/
>> # hg pull https://bitbucket.org/octobus/mercurial-devel/ -r b77a6b74ef31
>> sparse-revlog: rework the way we enforce chunk size limit
>> We move from an O(N) algorithm to an O(log(N)) algorithm.
>>
>> The previous algorithm traversed the whole delta chain, looking for the
>> exact point where it became too big. This would result in most of the delta
>> chain being traversed.
>>
>> Instead, we now use a "binary" approach, slicing the chain in two until we
>> have a chunk of the appropriate size.
>>
>> We still keep the previous algorithm for the snapshots part. There are few
>> of them, and they are large chunks of data distant from each other, so the
>> previous algorithm should work well in that case.
>>
>> As a practical example, restoring manifest revision '59547c40bc4c' of a
>> reference NetBeans repository (using sparse-revlog): the median time of the
>> `slice-sparse-chain` step of `perfrevlogrevision` improves from 1.109 ms to
>> 0.660 ms.
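(As an aside, not part of the patch: a toy cost model can illustrate the complexity difference. The two helpers below are hypothetical; they count end-offset reads needed to find the first split point, assuming a uniform per-revision size, and only approximate the real revlog access pattern.)

```python
def linear_lookups(n, per_rev_size, targetsize):
    """Old approach: walk revisions one by one until the span exceeds
    the target; one end-offset lookup per revision visited."""
    lookups = 0
    span = 0
    for _ in range(n):
        lookups += 1
        span += per_rev_size
        if span > targetsize:
            break
    return lookups

def halving_lookups(n, per_rev_size, targetsize):
    """New approach: halve the candidate end index until the span fits;
    one end-offset lookup per halving step."""
    lookups = 0
    start, end = 0, n
    while (end - start) * per_rev_size > targetsize and end - start > 1:
        lookups += 1
        end -= (end - start) // 2
    return lookups
```

For a 1024-revision chain with a target of half the total span, the linear walk reads hundreds of offsets while the halving approach needs only a handful.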
>>
>> diff --git a/mercurial/revlogutils/deltas.py b/mercurial/revlogutils/deltas.py
>> --- a/mercurial/revlogutils/deltas.py
>> +++ b/mercurial/revlogutils/deltas.py
>> @@ -176,18 +176,22 @@ def _slicechunktosize(revlog, revs, targ
>> [[3], [5]]
>> """
>> assert targetsize is None or 0 <= targetsize
>> - if targetsize is None or segmentspan(revlog, revs) <= targetsize:
>> + startdata = revlog.start(revs[0])
>> + enddata = revlog.end(revs[-1])
>> + fullspan = enddata - startdata
>> + if targetsize is None or fullspan <= targetsize:
>> yield revs
>> return
>>
>> startrevidx = 0
>> - startdata = revlog.start(revs[0])
>> endrevidx = 0
>> iterrevs = enumerate(revs)
>> next(iterrevs) # skip first rev.
>> + # first step: get snapshots out of the way
>> for idx, r in iterrevs:
>> span = revlog.end(r) - startdata
>> - if span <= targetsize:
>> + snapshot = revlog.issnapshot(r)
> This should break the doctest if this module were included in test-doctest.py.
Good catch, these tests stopped being covered when the code moved to a submodule.
>
>> + if span <= targetsize and snapshot:
>> endrevidx = idx
>> else:
>> chunk = _trimchunk(revlog, revs, startrevidx, endrevidx + 1)
>> @@ -196,6 +200,29 @@ def _slicechunktosize(revlog, revs, targ
>> startrevidx = idx
>> startdata = revlog.start(r)
>> endrevidx = idx
>> + if not snapshot:
>> + break
>> +
>> + # for the others, we use binary slicing to quickly converge toward valid
>> + # chunks (otherwise, we might end up looking for start/end of many
>> + # revisions)
>> + nbitem = len(revs)
>> + while (enddata - startdata) > targetsize:
>> + endrevidx = nbitem
>> + if nbitem - startrevidx <= 1:
>> + break # protect against individual chunk larger than limit
>> + localenddata = revlog.end(revs[endrevidx - 1])
>> + span = localenddata - startdata
>> + while (localenddata - startdata) > targetsize:
>> + if endrevidx - startrevidx <= 1:
>> + break # protect against individual chunk larger than limit
>> + endrevidx -= (endrevidx - startrevidx) // 2
>> + localenddata = revlog.end(revs[endrevidx -1])
>> + span = localenddata - startdata
> The variable 'span' isn't used.
>
>> + yield _trimchunk(revlog, revs, startrevidx, endrevidx)
>> + startrevidx = endrevidx
>> + startdata = revlog.start(revs[startrevidx])
> Do we have enough test coverage for this logic?
>
> FWIW, I'm a little confused that 'endrevidx' only gets smaller and smaller,
> since I thought this would do binary-search the split point. That turned out
> to be wrong. We look for approximate positions to split.
Indeed, we are just trying to converge quickly toward a valid chunk.
Finding a perfect slicing point requires more work and is rarely worth it.
For example, with six consecutive size-2 deltas (total size 12) and a
chunk limit of 10, there are many valid slicings: [2, 2, 2] + [2, 2, 2]
is as valid as [2, 2, 2, 2, 2] + [2].
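To make the convergence concrete, here is a toy sketch of the halving strategy over a list of per-revision sizes. This is hypothetical illustration code, not the actual `_slicechunktosize` from deltas.py (which works on revlog offsets and handles snapshots separately):

```python
def slice_to_size(sizes, targetsize):
    """Toy model of the halving strategy: shrink the candidate chunk end
    by half the remaining distance until the chunk fits the target, emit
    it, then continue with the rest of the chain."""
    # prefix[i] holds the total size of sizes[0:i], so any span is a
    # cheap subtraction instead of a per-item walk.
    prefix = [0]
    for s in sizes:
        prefix.append(prefix[-1] + s)
    chunks = []
    start = 0
    n = len(sizes)
    while start < n:
        end = n
        while prefix[end] - prefix[start] > targetsize:
            if end - start <= 1:
                break  # single item larger than the limit: emit anyway
            end -= (end - start) // 2
        chunks.append(sizes[start:end])
        start = end
    return chunks
```

On the six size-2 deltas with a limit of 10, this yields [2, 2, 2] + [2, 2, 2] rather than the equally valid [2, 2, 2, 2, 2] + [2]: the halving lands on a valid split immediately instead of searching for the largest one.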
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel