Testing very long delta chains

Tue Dec 22 23:41:47 CST 2015

On Tue, 2015-12-22 at 17:27 -0800, Gregory Szorc wrote:
> https://www.mercurial-scm.org/wiki/BigRepositories has been updated with a
> link to
> https://hg.mozilla.org/users/gszorc_mozilla.com/mozilla-central-aggressivemergedeltas,
> which is a generaldelta clone of mozilla-central with
> format.aggressivemergedeltas enabled.
> 
> The last manifest delta chain in this repo is over 45,000 entries deep and
> it makes for a good benchmark for testing revlog reading performance.
> 
> Remember: `hg clone --uncompressed` to preserve the delta chains from the
> server or your client will recompute them as part of applying the
> changegroup.

Without my threaded zlib hack:

$ hg perfmanifest 277045
! wall 0.749929 comb 0.740000 user 0.730000 sys 0.010000 (best of 13)

(25% CPU usage on a CPU with 4 threads)

With my threaded zlib hack (threads = 4):

$ hg perfmanifest 277045
! wall 0.480251 comb 1.090000 user 0.990000 sys 0.100000 (best of 20)

(50% CPU usage on a CPU with 4 threads)
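For comparing the two runs, here is a small sketch of the arithmetic. It assumes "comb" is user + sys CPU time (which matches the figures above) and that the machine exposes 4 hardware threads; the function and variable names are only illustrative.

```python
def cores_busy(wall, comb):
    """Average number of cores kept busy: CPU seconds per wall-clock second."""
    return comb / wall


# Figures copied from the two perfmanifest runs above.
base_wall, base_comb = 0.749929, 0.740000
thr_wall, thr_comb = 0.480251, 1.090000
hw_threads = 4  # assumption: the 4-thread CPU mentioned above

speedup = base_wall / thr_wall        # wall-clock improvement
cpu_overhead = thr_comb / base_comb   # extra CPU time spent to get it

print("speedup:      %.2fx" % speedup)
print("cpu overhead: %.2fx" % cpu_overhead)
print("utilization:  %.0f%% -> %.0f%%" % (
    100 * cores_busy(base_wall, base_comb) / hw_threads,
    100 * cores_busy(thr_wall, thr_comb) / hw_threads,
))
```

So the threaded run is roughly 1.5x faster in wall time while burning roughly 1.5x the CPU time, which is in line with the rough 25% and 50% utilization figures.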

Things we can do better:

- add a C decompress helper
- that works on lists of buffers
- that calls zlib directly
- that uses threads
- that uses larger buffers
- that uses a faster zlib

(For the last item, the Cloudflare fork of zlib has a faster CRC function that seems to be worth about 20%.)
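As a rough illustration of the "lists of buffers" plus "threads" items, here is a standalone Python 3 sketch. It is not the C helper proposed above and not the patch itself: decompress_many and its threads parameter are hypothetical names, it uses a thread pool from the standard library rather than hand-rolled worker threads, and chunks that do not start with "x" are simply passed through unchanged instead of going through the revlog's regular decompress().

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def decompress_many(chunks, threads=4):
    """Decompress a list of chunks, returning results in input order.

    Chunks whose first byte is b"x" (the usual start of a zlib stream)
    are inflated on a thread pool. Anything else is returned as-is,
    which is a simplification made for this sketch.
    """
    slots = [None] * len(chunks)
    work = []
    for slot, chunk in enumerate(chunks):
        if chunk[:1] == b"x":
            work.append((slot, chunk))
        else:
            slots[slot] = chunk

    def inflate(item):
        slot, chunk = item
        return slot, zlib.decompress(chunk)

    # Each result carries its slot index, so output order matches input
    # order no matter which thread finishes first.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for slot, data in pool.map(inflate, work):
            slots[slot] = data
    return slots


if __name__ == "__main__":
    raw = [bytes([i % 251]) * 100000 for i in range(64)]
    packed = [zlib.compress(r) for r in raw]
    assert decompress_many(packed) == raw
```

Whether this actually helps depends on how much of each call runs outside the interpreter lock, which is part of the argument for doing the whole loop in C.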


# HG changeset patch
# User Matt Mackall <mpm at selenic.com>
# Date 1450727921 21600
#      Mon Dec 21 13:58:41 2015 -0600
# Node ID b56bc1676b5d4a14167be2498921b57f06ddcd69
# Parent  3dea4eae4eebac11741f0c1dc5dcd9c88d8f4554
revlog: thread decompress

diff -r 3dea4eae4eeb -r b56bc1676b5d mercurial/revlog.py
--- a/mercurial/revlog.py	Mon Dec 21 14:52:18 2015 -0600
+++ b/mercurial/revlog.py	Mon Dec 21 13:58:41 2015 -0600
@@ -17,6 +17,8 @@
 import errno
 import os
 import struct
+import threading
+import Queue
 import zlib
 
 # import stuff from node for others to import from revlog
@@ -1132,14 +1134,38 @@
             # 2G on Windows
             return [self._chunk(rev, df=df) for rev in revs]
 
-        for rev in revs:
+        slots = [None] * len(revs)
+
+        work = []
+        done = Queue.Queue()
+
+        for slot, rev in enumerate(revs):
             chunkstart = start(rev)
             if inline:
                 chunkstart += (rev + 1) * iosize
             chunklength = length(rev)
-            ladd(decompress(buffer(data, chunkstart - offset, chunklength)))
+            buf = buffer(data, chunkstart - offset, chunklength)
+            if buf and buf[0] == 'x':
+                work.append((slot, buf))
+            else:
+                slots[slot] = decompress(buf)
 
-        return l
+        def worker():
+            try:
+                while True:
+                    slot, buf = work.pop()
+                    slots[slot] = _decompress(buf)
+            except:
+                done.put(1)
+
+        tcount = 4
+        for w in xrange(tcount - 1):
+            threading.Thread(target=worker).start()
+        worker()
+        for w in xrange(tcount):
+            done.get()
+
+        return slots
 
     def _chunkclear(self):
         """Clear the raw chunk cache."""

-- 
Mathematics is the supreme nostalgia of our time.