[Bug 5006] New: Quadratic expense in no-op pull

mercurial-bugs at selenic.com mercurial-bugs at selenic.com
Tue Dec 15 19:29:56 UTC 2015


https://bz.mercurial-scm.org/show_bug.cgi?id=5006

            Bug ID: 5006
           Summary: Quadratic expense in no-op pull
           Product: Mercurial
           Version: default branch
          Hardware: PC
                OS: Linux
            Status: UNCONFIRMED
          Severity: feature
          Priority: wish
         Component: evolution
          Assignee: bugzilla at selenic.com
          Reporter: mpm at selenic.com
                CC: mercurial-devel at selenic.com,
                    pierre-yves.david at ens-lyon.org

When doing a no-op http pull between two copies of the main hg repo, I get the
following behavior:

searching for changes
no changes found
(70 seconds of silence)

Of that 70 seconds, 16 seconds is network activity (transferring all the
markers again) followed by 54 seconds of CPU activity that confuses my
profiler:

http://whatever.computer/pull2.txt

That turns out to be caused by ~2000 calls to mergemarkers with a batch of ~45
markers. That in turn calls self.add() which builds a set from all known
markers. Thus, the time is quadratic in the number of markers.

Here is my set of hacks to show the problem and cache the marker set. I have no
idea if this is correct, but it makes the 54 seconds disappear:

diff -r f403693d5f7c mercurial/obsolete.py
--- a/mercurial/obsolete.py     Sat Dec 12 21:36:21 2015 -0600
+++ b/mercurial/obsolete.py     Tue Dec 15 13:29:16 2015 -0600
@@ -148,6 +148,7 @@
 def _fm0readmarkers(data, off):
     # Loop on markers
     l = len(data)
+    res = []
     while off + _fm0fsize <= l:
         # read fixed part
         cur = data[off:off + _fm0fsize]
@@ -194,8 +195,10 @@
                 parents = None

         metadata = tuple(sorted(metadata.iteritems()))
+        res.append((pre, sucs, flags, metadata, date, parents))

-        yield (pre, sucs, flags, metadata, date, parents)
+    print "count", len(res)
+    return res

 def _fm0encodeonemarker(marker):
     pre, sucs, flags, metadata, date, parents = marker
@@ -523,6 +526,8 @@
         self.svfs = svfs
         self._version = defaultformat
         self._readonly = readonly
+        self._mergecount = 0
+        self._mergeknown = None

     def __iter__(self):
         return iter(self._all)
@@ -592,7 +597,11 @@
         if self._readonly:
             raise error.Abort('creating obsolete markers is not enabled on '
                               'this repo')
-        known = set(self._all)
+
+        if not self._mergeknown:
+            self._mergeknown = set(self._all)
+        known = self._mergeknown
+
         new = []
         for m in markers:
             if m not in known:
@@ -623,7 +632,11 @@

         Returns the number of new markers added."""
         version, markers = _readmarkers(data)
-        return self.add(transaction, markers)
+        print "start merge", self._mergecount
+        self._mergecount += 1
+        a = self.add(transaction, markers)
+        print "end", a
+        return a

     @propertycache
     def _all(self):

-- 
You are receiving this mail because:
You are on the CC list for the bug.


More information about the Mercurial-devel mailing list