[PATCH 19 of 22] obsstore: use radixlink to back markerindex

Jun Wu quark at fb.com
Sun Jun 4 19:59:31 EDT 2017


# HG changeset patch
# User Jun Wu <quark at fb.com>
# Date 1496614546 25200
#      Sun Jun 04 15:15:46 2017 -0700
# Node ID 160567b62f74e8e02fa2e9a7b2cb4076b105e528
# Parent  2c2cbefde423ea2537270cf5fecdf5f5d14b19e5
# Available At https://bitbucket.org/quark-zju/hg-draft
#              hg pull https://bitbucket.org/quark-zju/hg-draft -r 160567b62f74
obsstore: use radixlink to back markerindex

This replaces Python dict where values are set of markers to radixlink where
values are linked lists of offsets.

hg perfvolatilesets on hg-committed shows (best wall time in seconds):

             | --clear-obsstore    | no --clear-obsstore
             | before   | after    | before   | after
  --------------------------------------------------------
   bumped    | 0.791746 | 0.464715 | 0.013843 | 0.019339
   divergent | 0.823026 | 0.524332 | 0.032086 | 0.048416
   extinct   | 0.597182 | 0.409694 | 0.035425 | 0.035519
   obsolete  | 0.583754 | 0.378805 | 0.006312 | 0.006591
   suspended | 0.613364 | 0.434053 | 0.034716 | 0.035313
   unstable  | 0.579994 | 0.394650 | 0.016263 | 0.016496
   base      | 0.001387 | 0.001205 | 0.001048 | 0.001038
   immutable | 0.006587 | 0.006494 | 0.005581 | 0.005561
   served    | 0.469272 | 0.369578 | 0.012776 | 0.012895
   visible   | 0.573799 | 0.392960 | 0.009748 | 0.010171

Notice that --clear-obsolete reflects real-world usage - you have to load
the obsstore before calculating the sets, and C radixlink has lower overhead
than Python dict + set. Without --clear-obsstore, things are slower because
parsing from offsets is on-demand. But that does not reflect real-world
usage.

Real-world commands on hg-committed become faster as expected:

               | before | after
  -------------------------------
   hg id       | 0.73   | 0.55
   hg log -r . | 1.05   | 0.74

Note: pure will be slower. But I'd prefer code simplify to optimizing pure
with complex code. Most of our users should use cext version any way.

diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py
--- a/mercurial/obsolete.py
+++ b/mercurial/obsolete.py
@@ -85,4 +85,5 @@ from . import (
 
 parsers = policy.importmod(r'parsers')
+radixlink = policy.importmod(r'radixlink')
 
 _pack = struct.pack
@@ -526,13 +527,15 @@ class marker(object):
         return self._data[2]
 
-class markerindex(dict):
+class markerindex(radixlink.radixlink):
     """key is usually a node, value is a set of markers"""
 
     def __init__(self, obsstore, name, keyfunc):
         """keyfunc: rawmarker -> [key]"""
+        super(markerindex, self).__init__()
         self._keyfunc = keyfunc
         self._obsstore = weakref.proxy(obsstore)
         self.name = name
         self.sourceoftruthsize = 0
+        self._readmarker = formats[obsstore._version][0]
         self.update()
 
@@ -544,11 +547,24 @@ class markerindex(dict):
             return
         keyfunc = self._keyfunc
-        setdefault = self.setdefault
         markers, offsets = self._obsstore._readmarkers(self.sourceoftruthsize)
-        for marker in markers:
+        for i, marker in enumerate(markers):
+            offset = offsets[i]
             for k in keyfunc(marker):
-                setdefault(k, set()).add(marker)
+                self.insert(k, offset)
         self.sourceoftruthsize = len(data)
 
+    def __getitem__(self, key):
+        offsets = super(markerindex, self).__getitem__(key)
+        # convert offsets to markers
+        data = self._obsstore._data
+        read = self._readmarker
+        return set(read(data, off, off + 1)[0] for off in offsets)
+
+    def get(self, key, default=None):
+        try:
+            return self[key]
+        except KeyError:
+            return default
+
 class markerreader(object):
     """read a range of markers with proper caching"""


More information about the Mercurial-devel mailing list