[PATCH 3 of 6 V2] obscache: add a cache for 1/2 of the "obsolete" property

Pierre-Yves David pierre-yves.david at ens-lyon.org
Sat May 20 11:30:17 EDT 2017


# HG changeset patch
# User Pierre-Yves David <pierre-yves.david at octobus.net>
# Date 1495197986 -7200
#      Fri May 19 14:46:26 2017 +0200
# Node ID e3900752e4e16857c65466ceda175ce6781d519d
# Parent  41f64bdc68e3361782f19172c95db4775c44fa6c
# EXP-Topic obscache
# Available At https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
#              hg pull https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r e3900752e4e1
obscache: add a cache for 1/2 of the "obsolete" property

Knowing whether a changeset is obsolete requires two pieces of data:

1) is the changeset non-public,
2) is the changeset affected by any obsolescence markers.

The phase-related data already has a fast implementation. However, the
obsmarker-based property currently requires parsing the whole obsstore, a slow
operation.
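
As a minimal sketch of how the two pieces of data combine (the names
`obsolete_revs`, `nonpublic_revs` and `markerflags` are hypothetical and are
not part of this patch), assuming one flag byte per revision:

```python
# Hypothetical stand-ins: real code would get the non-public revisions from
# the phase data and the per-revision flags from the cache added below.
def obsolete_revs(nonpublic_revs, markerflags):
    """Return the revs that are non-public AND flagged as precursors."""
    return {r for r in nonpublic_revs if markerflags[r]}

flags = bytearray([0, 1, 1, 0])  # revs 1 and 2 are precursors of some marker
nonpublic = {2, 3}               # revs 0 and 1 are public
result = obsolete_revs(nonpublic, flags)
```

Here rev 1 has a marker but is public, and rev 3 is non-public but has no
marker, so only rev 2 ends up in the result.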

That information is monotonic (new changesets are not affected, and once they
are affected, they stay affected forever), making it easy to cache. We
introduce a new class dedicated to this information. For the sake of
simplicity, this first implementation still needs to parse the full obsstore
when updating. It will be improved later to allow lighter updates.
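
To illustrate why monotonic flags are easy to cache, here is a toy sketch (the
class `ToyObsCache` and its method names are invented for illustration; the
real class is in the diff). New revisions only append a byte, and new markers
only flip a byte from 0 to 1:

```python
class ToyObsCache(object):
    """Toy model: one byte per revision, 1 if the rev is a known precursor."""

    def __init__(self):
        self.data = bytearray()

    def addrevs(self, nodes, precursors):
        # New revisions are appended in order; each one is checked against
        # the set of nodes already known to be precursors.
        for n in nodes:
            self.data.append(int(n in precursors))

    def addmarkers(self, markers, nodetorev):
        # A marker's first field is its precursor node; markers about nodes
        # unknown locally are skipped. Flags are only ever raised.
        for m in markers:
            r = nodetorev.get(m[0])
            if r is not None:
                self.data[r] = 1

cache = ToyObsCache()
cache.addrevs([b'a', b'b', b'c'], precursors={b'b'})
cache.addmarkers([(b'c', (b'd',)), (b'zz', ())],
                 {b'a': 0, b'b': 1, b'c': 2})
```

Because no update ever resets a flag, the existing bytes never need to be
recomputed when new revisions or markers arrive.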

The next changesets will put this new cache to use.

This code comes from the evolve extension, where it matured. To keep this
changeset simple, a couple of improvements from the extension will be ported
later.
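
The `save` and `load` methods in the diff below write the cache as a
fixed-size header (the packed cache key) followed by the raw bytearray. A
standalone sketch of that layout, reusing the patch's header format string but
with made-up key values and hypothetical helper names (`serialize`,
`deserialize`):

```python
import struct

HEADERFORMAT = '>q20sQQ20s'  # same format string as in the patch

def serialize(cachekey, flags):
    """Pack the five-field key, then append one flag byte per revision."""
    return struct.pack(HEADERFORMAT, *cachekey) + bytes(flags)

def deserialize(data):
    """Split the blob back into (cachekey, flags)."""
    headersize = struct.calcsize(HEADERFORMAT)
    key = struct.unpack(HEADERFORMAT, data[:headersize])
    return key, bytearray(data[headersize:])

# Made-up values; the real key is computed by dualsourcecache.
key = (2, b'\x11' * 20, 1, 64, b'\x22' * 20)
blob = serialize(key, bytearray([0, 1, 1]))
roundtrip = deserialize(blob)
```

Since the header has a fixed size, the flag for revision `r` sits at a
predictable offset in the file.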

diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py
--- a/mercurial/obsolete.py
+++ b/mercurial/obsolete.py
@@ -1370,6 +1370,138 @@ class dualsourcecache(object):
 
         return reset, revs, markers, (obssize, obskey)
 
+class obscache(dualsourcecache):
+    """cache the "is this rev a precursor in some obsmarker" property
+
+    This is not directly holding the "is this revision obsolete" information,
+    because phase data comes into play here. However, it allows computing the
+    "obsolete" set without reading the obsstore content.
+
+    The cache uses a bytearray to store that data and simply writes it to disk
+    for storage.
+
+    Implementation note #1:
+
+      The obsstore is implementing only half of the transaction logic it
+      should. It properly records the starting point of the obsstore to allow
+      clean rollback. However, it still writes to the obsstore file directly
+      during the transaction. Instead it should be keeping data in memory and
+      writing to a '.pending' file to make the data available for hooks.
+
+      This cache is not going further than what the obsstore is doing, so it
+      does not have any '.pending' logic. When the obsstore gains proper
+      '.pending' support, adding it to this cache should not be too hard. As
+      the flag always moves from 0 to 1, we could have a second '.pending'
+      cache file to be read. If the flag is set in either of them, the value
+      is 1. For the same reason, updating the file in place should be possible.
+
+    Implementation note #2:
+
+        Storage-wise, we could have a "start rev" to avoid storing useless
+        zeros. That would be especially useful for the '.pending' overlay.
+    """
+
+    _filepath = 'cache/obscache-v01'
+    _headerformat = '>q20sQQ20s'
+
+    _cachename = 'obscache' # used for error message
+
+    def __init__(self, repo):
+        super(obscache, self).__init__()
+        self._ondiskkey = None
+        self._vfs = repo.vfs
+        self._data = bytearray()
+
+    def get(self, rev):
+        """return a true value if "rev" is a precursor in any obsmarker
+
+        IMPORTANT: make sure the cache has been updated to match the repository
+        content before using it"""
+        return self._data[rev]
+
+    def clear(self, reset=False):
+        """invalidate the in memory cache content"""
+        super(obscache, self).clear(reset=reset)
+        self._data = bytearray()
+
+    def _updatefrom(self, repo, revs, obsmarkers):
+        if revs:
+            self._updaterevs(repo, revs)
+        if obsmarkers:
+            self._updatemarkers(repo, obsmarkers)
+
+    def _updaterevs(self, repo, revs):
+        """update the cache with new revisions
+
+        Newly added changesets might be affected by obsolescence markers we
+        already have locally. So we need to have some global knowledge about
+        the markers to handle that question.
+
+        XXX performance note:
+
+        Right now this requires parsing all markers in the obsstore. We could
+        imagine using various optimisations (e.g. another cache, network
+        exchange, etc.).
+
+        A possible approach to this is to build a set of all nodes used as
+        precursors in `obsstore._obscandidate`. If markers are not loaded yet,
+        we could initialize it by doing a quick scan through the obsstore data
+        and filling a (pre-sized) set. Doing so would be much faster than
+        parsing all the obsmarkers since we would access less data, not create
+        any object besides the nodes and not have to decode any complex data.
+
+        For now we stick to the simpler approach of paying the
+        performance cost on new changesets.
+        """
+        node = repo.changelog.node
+        succs = repo.obsstore.successors
+        for r in revs:
+            val = int(node(r) in succs)
+            self._data.append(val)
+        cl = repo.changelog
+        assert len(self._data) == len(cl), (len(self._data), len(cl))
+
+    def _updatemarkers(self, repo, obsmarkers):
+        """update the cache with new markers"""
+        rev = repo.changelog.nodemap.get
+        for m in obsmarkers:
+            r = rev(m[0])
+            if r is not None:
+                self._data[r] = 1
+
+    def save(self, repo):
+        """save the data to disk
+
+        Format is pretty simple, we serialise the cache key and then drop the
+        bytearray.
+        """
+
+        # XXX we'll need some pending related logic when the obsstore gets it
+
+        if self._cachekey is None or self._cachekey == self._ondiskkey:
+            return
+
+        cachefile = repo.vfs(self._filepath, 'w', atomictemp=True)
+        headerdata = struct.pack(self._headerformat, *self._cachekey)
+        cachefile.write(headerdata)
+        cachefile.write(self._data)
+        cachefile.close()
+
+    def load(self, repo):
+        """load data from disk"""
+        assert repo.filtername is None
+
+        data = repo.vfs.tryread(self._filepath)
+        if not data:
+            self._cachekey = self.emptykey
+            self._data = bytearray()
+        else:
+            headersize = struct.calcsize(self._headerformat)
+            self._cachekey = struct.unpack(self._headerformat,
+                                           data[:headersize])
+            self._data = bytearray(data[headersize:])
+        self._ondiskkey = self._cachekey
+
 # mapping of 'set-name' -> <function to compute this set>
 cachefuncs = {}
 def cachefor(name):

