[PATCH 3 of 4] changegroup: reuse revlog file handle when generating group

Gregory Szorc gregory.szorc at gmail.com
Tue Nov 1 21:16:38 EDT 2016


# HG changeset patch
# User Gregory Szorc <gregory.szorc at gmail.com>
# Date 1476648175 25200
#      Sun Oct 16 13:02:55 2016 -0700
# Node ID 45759b1b6883c2f4b1fc0227710150ef94380927
# Parent  d631065a702fa7eb956258e2289679d5902ccff6
changegroup: reuse revlog file handle when generating group

Previously, every time we needed new data from a revlog during
changegroup generation, a new file handle would be opened,
seeked, read, and closed.

After this patch, we use the context manager just added to the revlog
class, revlog.cachefilehandle(), to cache and reuse a single file handle
while generating each revlog's deltas for the changegroup.
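To illustrate the before/after behavior, here is a minimal sketch of the handle-caching pattern. It assumes a simplified stand-in class: FakeRevlog, readchunk() and the open counter are hypothetical names invented for this example. Only the name cachefilehandle() comes from the patch, and this is not Mercurial's actual revlog code.

```python
import contextlib
import os
import tempfile


class FakeRevlog(object):
    """Hypothetical stand-in for a revlog data file (not Mercurial's code)."""

    def __init__(self, path):
        self.path = path
        self._cachedfh = None  # handle reused while cachefilehandle() is active
        self.opencount = 0     # counts real open() calls, for illustration

    def _open(self):
        self.opencount += 1
        return open(self.path, 'rb')

    @contextlib.contextmanager
    def cachefilehandle(self):
        """Keep one file handle open for the duration of the with block.

        This sketch is not reentrant: nesting it would leak the outer handle.
        """
        self._cachedfh = self._open()
        try:
            yield
        finally:
            self._cachedfh.close()
            self._cachedfh = None

    def readchunk(self, offset, length):
        """Read length bytes at offset, reusing the cached handle if any."""
        if self._cachedfh is not None:
            fh = self._cachedfh
            fh.seek(offset)
            return fh.read(length)
        # Old behavior: open, seek, read, and close for every request.
        with self._open() as fh:
            fh.seek(offset)
            return fh.read(length)


def demo():
    """Return (opens without caching, opens with caching) for 5 reads."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(b'abcdefghij')
        rl = FakeRevlog(path)
        for i in range(5):
            rl.readchunk(i, 1)
        uncached = rl.opencount
        rl.opencount = 0
        with rl.cachefilehandle():
            data = b''.join(rl.readchunk(i, 1) for i in range(5))
        assert data == b'abcde'
        return uncached, rl.opencount
    finally:
        os.remove(path)


if __name__ == '__main__':
    print(demo())  # (5, 1)
```

Five reads cost five open/close pairs without the cache and a single pair with it. The seeks and reads themselves still happen, which matches the table below: open and close counts drop while read and lseek barely change.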

When generating a v2 bundle on the mozilla-unified repo, this patch
changes system call counts as follows:

Function    Before         After         Delta
read        576,062       576,085           +23
open        274,939       269,167        -5,772
fstat       808,762       797,216       -11,546
write       952,449       952,473           +24
lseek       536,450       536,452            +2
close       272,314       266,542        -5,772
lstat        20,185        20,185             0

It's worth noting that this repo has 265,773 revlog files (.i + .d),
but only 941 of them are non-inline. That means that, for the non-inline
revlogs, we're preventing about 6 redundant opens per revlog on average
(5,772 / 941). (Most of the prevented opens are on the changelog and
manifest.)
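The per-revlog figure can be checked against the table. This short snippet recomputes the Delta column from the Before/After counts and divides the prevented opens by the number of non-inline revlogs. Like the paragraph above, it assumes all prevented opens belong to non-inline revlogs. The variable names are illustrative only.

```python
# System call counts copied from the table above: (before, after).
counts = {
    'read': (576062, 576085),
    'open': (274939, 269167),
    'fstat': (808762, 797216),
    'write': (952449, 952473),
    'lseek': (536450, 536452),
    'close': (272314, 266542),
    'lstat': (20185, 20185),
}

# Delta column: after minus before (negative means fewer calls).
deltas = {name: after - before for name, (before, after) in counts.items()}

NONINLINE_REVLOGS = 941  # from the text above
prevented_opens = -deltas['open']
avg_per_revlog = prevented_opens / float(NONINLINE_REVLOGS)

print(deltas['open'], deltas['close'])  # -5772 -5772
print('%.2f' % avg_per_revlog)          # 6.13
```

Because changelog and manifest account for most of the prevented opens, this average hides a skewed distribution. Most non-inline filelogs likely save far fewer opens.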

On my Linux machine, this change doesn't appear to improve wall time.
However, fewer system calls is fewer system calls. And when I/O is
involved, I think aiming for zero system calls is worthwhile.

diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py
--- a/mercurial/changegroup.py
+++ b/mercurial/changegroup.py
@@ -567,13 +567,15 @@ class cg1packer(object):
         # build deltas
         total = len(revs) - 1
         msgbundling = _('bundling')
-        for r in xrange(len(revs) - 1):
-            if units is not None:
-                self._progress(msgbundling, r + 1, unit=units, total=total)
-            prev, curr = revs[r], revs[r + 1]
-            linknode = lookup(revlog.node(curr))
-            for c in self.revchunk(revlog, curr, prev, linknode):
-                yield c
+
+        with revlog.cachefilehandle():
+            for r in xrange(len(revs) - 1):
+                if units is not None:
+                    self._progress(msgbundling, r + 1, unit=units, total=total)
+                prev, curr = revs[r], revs[r + 1]
+                linknode = lookup(revlog.node(curr))
+                for c in self.revchunk(revlog, curr, prev, linknode):
+                    yield c
 
         if units is not None:
             self._progress(msgbundling, None)
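In the hunk above, the with block wraps a generator body. As a result, the cached handle stays open across every yield and is released only when the generator is exhausted or closed by its consumer. An abandoned generator holds the handle until it is closed or garbage-collected. This standalone sketch shows that lifetime. The names resource(), chunks() and events are hypothetical stand-ins for revlog.cachefilehandle() and the delta loop, not Mercurial code.

```python
import contextlib

events = []  # records when the stand-in resource is acquired and released


@contextlib.contextmanager
def resource():
    """Hypothetical stand-in for revlog.cachefilehandle()."""
    events.append('open')
    try:
        yield
    finally:
        events.append('close')


def chunks(n):
    """Yield n chunks while holding the resource, like the loop in the hunk."""
    with resource():
        for i in range(n):
            yield i


def demo():
    del events[:]
    full = list(chunks(3))  # fully consumed: released after the last chunk
    gen = chunks(3)         # nothing runs until the first next()
    first = next(gen)       # resource acquired and still held here
    held = events[-1] == 'open'
    gen.close()             # GeneratorExit unwinds the with block
    return full, first, held, list(events)


if __name__ == '__main__':
    print(demo())  # ([0, 1, 2], 0, True, ['open', 'close', 'open', 'close'])
```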