[PATCH 07 of 14] sparse-revlog: rework the way we enforce chunk size limit

Mon Nov 12 04:55:42 EST 2018

# HG changeset patch
# User Boris Feld <boris.feld at octobus.net>
# Date 1541782717 -3600
#      Fri Nov 09 17:58:37 2018 +0100
# Node ID b77a6b74ef31e1b3706c1c6127a15eede0334f71
# Parent  ddafb271512fc26de60da5dceffc1509bb023d66
# EXP-Topic sparse-perf
# Available At https://bitbucket.org/octobus/mercurial-devel/
#              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r b77a6b74ef31
sparse-revlog: rework the way we enforce chunk size limit

We move from an O(N) algorithm to an O(log(N)) algorithm.

The previous algorithm traversed the whole delta chain, looking for the
exact point where it became too big. This resulted in most of the delta
chain being traversed.

Instead, we now use a "binary" approach, slicing the chain in two until we
have a chunk of the appropriate size.
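
The halving step described above can be sketched in isolation. This is a
minimal illustration, not the code from the patch: `slice_by_halving` and the
`offsets` list are hypothetical names, with `offsets` standing in for the
byte positions that `revlog.start()` and `revlog.end()` return, and the
snapshot pass and `_trimchunk` are left out.

```python
def slice_by_halving(offsets, targetsize):
    """Yield (startidx, endidx) ranges whose byte span fits in targetsize.

    `offsets` is a hypothetical list of (start, end) byte positions, one per
    revision, sorted by position. It is an assumption made for this sketch.
    """
    nbitem = len(offsets)
    startidx = 0
    while startidx < nbitem:
        endidx = nbitem
        startdata = offsets[startidx][0]
        # Halve the candidate range until it fits, or until a single item
        # remains (an item larger than the limit is emitted on its own).
        while (offsets[endidx - 1][1] - startdata > targetsize
               and endidx - startidx > 1):
            endidx -= (endidx - startidx) // 2
        yield (startidx, endidx)
        startidx = endidx


# Eight contiguous revisions of 10 bytes each, with a 30 byte limit.
offsets = [(i * 10, i * 10 + 10) for i in range(8)]
print(list(slice_by_halving(offsets, 30)))  # [(0, 2), (2, 5), (5, 8)]
```

Note that the first range stops at two items although three would fit:
halving gives up exact packing in exchange for fewer start/end lookups per
chunk.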

We still keep the previous algorithm for the snapshot part. There are few
snapshots, and they are large pieces of data far apart from each other, so
the previous algorithm should work well in that case.

As a practical example, consider restoring manifest revision '59547c40bc4c'
for a reference NetBeans repository (using sparse-revlog). The median time of
the `slice-sparse-chain` step of `perfrevlogrevision` improves from 1.109 ms
to 0.660 ms.

diff --git a/mercurial/revlogutils/deltas.py b/mercurial/revlogutils/deltas.py
--- a/mercurial/revlogutils/deltas.py
+++ b/mercurial/revlogutils/deltas.py
@@ -176,18 +176,22 @@ def _slicechunktosize(revlog, revs, targ
     [[3], [5]]
     """
     assert targetsize is None or 0 <= targetsize
-    if targetsize is None or segmentspan(revlog, revs) <= targetsize:
+    startdata = revlog.start(revs[0])
+    enddata = revlog.end(revs[-1])
+    fullspan = enddata - startdata
+    if targetsize is None or fullspan <= targetsize:
         yield revs
         return
 
     startrevidx = 0
-    startdata = revlog.start(revs[0])
     endrevidx = 0
     iterrevs = enumerate(revs)
     next(iterrevs) # skip first rev.
+    # first step: get snapshots out of the way
     for idx, r in iterrevs:
         span = revlog.end(r) - startdata
-        if span <= targetsize:
+        snapshot = revlog.issnapshot(r)
+        if span <= targetsize and snapshot:
             endrevidx = idx
         else:
             chunk = _trimchunk(revlog, revs, startrevidx, endrevidx + 1)
@@ -196,6 +200,29 @@ def _slicechunktosize(revlog, revs, targ
             startrevidx = idx
             startdata = revlog.start(r)
             endrevidx = idx
+        if not snapshot:
+            break
+
+    # for the others, we use binary slicing to quickly converge toward valid
+    # chunks (otherwise, we might end up looking for start/end of many
+    # revisions)
+    nbitem = len(revs)
+    while (enddata - startdata) > targetsize:
+        endrevidx = nbitem
+        if nbitem - startrevidx <= 1:
+            break # protect against individual chunk larger than limit
+        localenddata = revlog.end(revs[endrevidx - 1])
+        span = localenddata - startdata
+        while (localenddata - startdata) > targetsize:
+            if endrevidx - startrevidx <= 1:
+                break # protect against individual chunk larger than limit
+            endrevidx -= (endrevidx - startrevidx) // 2
+            localenddata = revlog.end(revs[endrevidx - 1])
+            span = localenddata - startdata
+        yield _trimchunk(revlog, revs, startrevidx, endrevidx)
+        startrevidx = endrevidx
+        startdata = revlog.start(revs[startrevidx])
+
     yield _trimchunk(revlog, revs, startrevidx)
 
 def _slicechunktodensity(revlog, revs, targetdensity=0.5,