[Bug 5006] New: Quadratic expense in no-op pull
mercurial-bugs at selenic.com
mercurial-bugs at selenic.com
Tue Dec 15 19:29:56 UTC 2015
https://bz.mercurial-scm.org/show_bug.cgi?id=5006
Bug ID: 5006
Summary: Quadratic expense in no-op pull
Product: Mercurial
Version: default branch
Hardware: PC
OS: Linux
Status: UNCONFIRMED
Severity: feature
Priority: wish
Component: evolution
Assignee: bugzilla at selenic.com
Reporter: mpm at selenic.com
CC: mercurial-devel at selenic.com,
pierre-yves.david at ens-lyon.org
When doing a no-op http pull between two copies of the main hg repo, I get the
following behavior:
searching for changes
no changes found
(70 seconds of silence)
Of that 70 seconds, 16 seconds is network activity (transferring all the
markers again) followed by 54 seconds of CPU activity that confuses my
profiler:
http://whatever.computer/pull2.txt
That turns out to be caused by ~2000 calls to mergemarkers with a batch of ~45
markers. That in turn calls self.add() which builds a set from all known
markers. Thus, the time is quadratic in the number of markers.
Here is my set of hacks to show the problem and cache the marker set. I have no
idea if this is correct, but it makes the 54 seconds disappear:
diff -r f403693d5f7c mercurial/obsolete.py
--- a/mercurial/obsolete.py Sat Dec 12 21:36:21 2015 -0600
+++ b/mercurial/obsolete.py Tue Dec 15 13:29:16 2015 -0600
@@ -148,6 +148,7 @@
def _fm0readmarkers(data, off):
# Loop on markers
l = len(data)
+ res = []
while off + _fm0fsize <= l:
# read fixed part
cur = data[off:off + _fm0fsize]
@@ -194,8 +195,10 @@
parents = None
metadata = tuple(sorted(metadata.iteritems()))
+ res.append((pre, sucs, flags, metadata, date, parents))
- yield (pre, sucs, flags, metadata, date, parents)
+ print "count", len(res)
+ return res
def _fm0encodeonemarker(marker):
pre, sucs, flags, metadata, date, parents = marker
@@ -523,6 +526,8 @@
self.svfs = svfs
self._version = defaultformat
self._readonly = readonly
+ self._mergecount = 0
+ self._mergeknown = None
def __iter__(self):
return iter(self._all)
@@ -592,7 +597,11 @@
if self._readonly:
raise error.Abort('creating obsolete markers is not enabled on '
'this repo')
- known = set(self._all)
+
+ if not self._mergeknown:
+ self._mergeknown = set(self._all)
+ known = self._mergeknown
+
new = []
for m in markers:
if m not in known:
@@ -623,7 +632,11 @@
Returns the number of new markers added."""
version, markers = _readmarkers(data)
- return self.add(transaction, markers)
+ print "start merge", self._mergecount
+ self._mergecount += 1
+ a = self.add(transaction, markers)
+ print "end", a
+ return a
@propertycache
def _all(self):
--
You are receiving this mail because:
You are on the CC list for the bug.
More information about the Mercurial-devel
mailing list