[PATCH 3 of 4] changegroup: reuse revlog file handle when generating group
Gregory Szorc
gregory.szorc at gmail.com
Tue Nov 1 21:16:38 EDT 2016
# HG changeset patch
# User Gregory Szorc <gregory.szorc at gmail.com>
# Date 1476648175 25200
# Sun Oct 16 13:02:55 2016 -0700
# Node ID 45759b1b6883c2f4b1fc0227710150ef94380927
# Parent d631065a702fa7eb956258e2289679d5902ccff6
changegroup: reuse revlog file handle when generating group
Previously, every time we needed new data from a revlog during
changegroup generation, a new file handle would be opened,
seeked, read, and closed.
After this patch, we use the just-added context manager on the
revlog class to cache and reuse a file handle for the duration
of the changegroup operation.
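The `cachefilehandle()` context manager comes from the previous patch in this series and isn't shown here. The following is a minimal Python 3.7+ sketch of the idea (it uses `contextlib.nullcontext`). `ToyRevlog`, `chunk()` and `group()` are hypothetical names, and an in-memory `io.BytesIO` buffer stands in for a revlog's .d file. This is not Mercurial's actual implementation. Because the real `with` block wraps a generator loop, the sketch also shows that the cached handle stays open while the generator is suspended at `yield`, and is released once the generator is exhausted or closed early.

```python
import contextlib
import io


class ToyRevlog(object):
    """Hypothetical stand-in for a non-inline revlog (not Mercurial's class).

    Revision chunks live in an in-memory "data file"; ``opens`` counts how
    many times that file is opened, so the effect of caching is visible.
    """

    def __init__(self, chunks):
        self._data = b''.join(chunks)
        self._offsets = []  # (offset, length) of each revision's chunk
        pos = 0
        for c in chunks:
            self._offsets.append((pos, len(c)))
            pos += len(c)
        self._cachedfh = None
        self.opens = 0

    def _openfile(self):
        # Stands in for opening the .d file on disk.
        self.opens += 1
        return io.BytesIO(self._data)

    @contextlib.contextmanager
    def cachefilehandle(self):
        """Open one handle and reuse it until the with block exits."""
        if self._cachedfh is not None:
            # An outer block already holds a handle; just reuse it.
            yield
            return
        self._cachedfh = self._openfile()
        try:
            yield
        finally:
            self._cachedfh.close()
            self._cachedfh = None

    def chunk(self, rev):
        offset, length = self._offsets[rev]
        if self._cachedfh is not None:
            # Cached handle: only seek + read, no open/close.
            self._cachedfh.seek(offset)
            return self._cachedfh.read(length)
        # No cache: open, seek, read, close on every call.
        with self._openfile() as fh:
            fh.seek(offset)
            return fh.read(length)


def group(rl, revs, cache=True):
    """Generator mirroring the patched loop: the handle stays open while
    the generator is suspended at ``yield`` and is closed when it is
    exhausted or closed early."""
    ctx = rl.cachefilehandle() if cache else contextlib.nullcontext()
    with ctx:
        for rev in revs:
            yield rl.chunk(rev)
```

With `cache=False`, reading three revisions opens the data file three times. With `cache=True`, it is opened once, and the handle is dropped as soon as the generator finishes.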
When generating a v2 bundle on the mozilla-unified repo, this change
produces the following differences in system call counts:

  Function     Before      After    Delta
  read        576,062    576,085      +23
  open        274,939    269,167   -5,772
  fstat       808,762    797,216  -11,546
  write       952,449    952,473      +24
  lseek       536,450    536,452       +2
  close       272,314    266,542   -5,772
  lstat        20,185     20,185        0
It's worth noting that this repo has 265,773 revlog files (.i + .d),
but only 941 of them are non-inline. That means that, on average, we're
preventing several redundant opens per non-inline revlog. (Most of the
prevented opens are on the changelog and manifest.)
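A quick check of the arithmetic behind the per-revlog claim, assuming essentially all of the avoided opens come from the 941 non-inline revlogs. The figures are copied from the table above.

```python
# System call counts from the table above (v2 bundle of mozilla-unified).
before = {"open": 274939, "close": 272314, "fstat": 808762}
after = {"open": 269167, "close": 266542, "fstat": 797216}
noninline_revlogs = 941

# Negative delta = calls saved by reusing the file handle.
deltas = {name: after[name] - before[name] for name in before}

# Average number of avoided opens per non-inline revlog.
avoided_opens_per_revlog = -deltas["open"] / noninline_revlogs
print(deltas, round(avoided_opens_per_revlog, 1))
```

That works out to roughly six avoided opens per non-inline revlog. This is only an average: per the note above, the savings are concentrated on the changelog and manifest.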
On my Linux machine, this change appears to show no improvement in
wall time. However, fewer system calls is fewer system calls. And
when I/O is involved, I think aiming for 0 system calls is worthwhile.
diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py
--- a/mercurial/changegroup.py
+++ b/mercurial/changegroup.py
@@ -567,13 +567,15 @@ class cg1packer(object):
         # build deltas
         total = len(revs) - 1
         msgbundling = _('bundling')
-        for r in xrange(len(revs) - 1):
-            if units is not None:
-                self._progress(msgbundling, r + 1, unit=units, total=total)
-            prev, curr = revs[r], revs[r + 1]
-            linknode = lookup(revlog.node(curr))
-            for c in self.revchunk(revlog, curr, prev, linknode):
-                yield c
+
+        with revlog.cachefilehandle():
+            for r in xrange(len(revs) - 1):
+                if units is not None:
+                    self._progress(msgbundling, r + 1, unit=units, total=total)
+                prev, curr = revs[r], revs[r + 1]
+                linknode = lookup(revlog.node(curr))
+                for c in self.revchunk(revlog, curr, prev, linknode):
+                    yield c
 
         if units is not None:
             self._progress(msgbundling, None)