[PATCH 09 of 14] obscache: add a cache for 1/2 of the "obsolete" property

Boris Feld boris.feld at octobus.net
Sun Jul 9 13:55:21 EDT 2017


# HG changeset patch
# User Boris Feld <boris.feld at octobus.net>
# Date 1495197986 -7200
#      Fri May 19 14:46:26 2017 +0200
# Node ID 63214f4d9a766761259b650539eede424413e6a2
# Parent  774ff18cc36b72822f598b4fa5a51628513926e3
# EXP-Topic obs-cache
obscache: add a cache for 1/2 of the "obsolete" property

Knowing if a changeset is obsolete requires two pieces of data:

1) is the changeset non-public,
2) is the changeset affected by any obsolescence markers.

The phase-related data already has a fast implementation. However, the
obsmarker-based property currently requires parsing the whole obsstore, a
slow operation.

That information is monotonic (new changesets are not affected, and once they
are affected, they stay affected forever), making it easy to cache. We
introduce a new class dedicated to caching this information. This first
implementation still parses the full obsstore when updating, for the sake of
simplicity. It will be improved later to allow lighter updates.
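
The monotonic, append-only nature of the flag is what makes a flat bytearray a good fit. A toy sketch, not the actual cache code:

```python
# A flag only ever goes from 0 to 1, and new revisions start at 0, so the
# cache can be extended in place and its raw bytes flushed to disk as-is.
data = bytearray(3)        # three known revisions, none marked yet
data[1] = 1                # rev 1 becomes a precursor: flips 0 -> 1, forever

data.extend(bytearray(2))  # two new revisions appended, old entries untouched
assert list(data) == [0, 1, 0, 0, 0]
assert bytes(data) == b'\x00\x01\x00\x00\x00'  # the on-disk payload
```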

The next changesets will put this new cache to use.

This code comes from the evolve extension, where it matured. To keep this
changeset simple, a couple of improvements made in the extension will be
ported later.

diff -r 774ff18cc36b -r 63214f4d9a76 mercurial/obsolete.py
--- a/mercurial/obsolete.py	Sat Jul 08 16:26:16 2017 +0200
+++ b/mercurial/obsolete.py	Fri May 19 14:46:26 2017 +0200
@@ -75,6 +75,7 @@
 
 from .i18n import _
 from . import (
+    cache,
     error,
     node,
     obsutil,
@@ -907,6 +908,145 @@
     repo.ui.deprecwarn(movemsg, '4.3')
     return obsutil.successorssets(repo, initialnode, cache=cache)
 
+class obscache(cache.dualsourcecache):
+    """cache the "is this revision used as a precursor of any obsmarker" property
+
+    This does not directly hold the "is this revision obsolete" information,
+    because phase data comes into play here. However, it allows computing the
+    "obsolescence" set without reading the obsstore content.
+
+    The cache uses a bytearray to store the data and simply writes it to disk
+    for storage.
+
+    Implementation note #1:
+
+      The obsstore implements only half of the transaction logic it should.
+      It properly records the starting point of the obsstore to allow a clean
+      rollback. However, it still writes to the obsstore file directly during
+      the transaction. Instead, it should keep the data in memory and write
+      to a '.pending' file to make the data available for hooks.
+
+      This cache does not go further than what the obsstore does, so it does
+      not have any '.pending' logic. When the obsstore gains proper '.pending'
+      support, adding it to this cache should not be too hard. As the flag
+      only ever moves from 0 to 1, we could have a second '.pending' cache
+      file to be read. If the flag is set in either of them, the value is 1.
+      For the same reason, updating the file in place should be possible.
+
+    Implementation note #2:
+
+        Storage-wise, we could have a "start rev" to avoid storing useless
+        zeros. That would be especially useful for the '.pending' overlay.
+    """
+
+    _filepath = 'cache/obscache-v01'
+    _cachename = 'obscache' # used for error message
+
+    def __init__(self, repo):
+        super(obscache, self).__init__()
+        self._ondiskkey = None
+        self._vfs = repo.vfs
+        self._data = bytearray()
+
+    def get(self, rev):
+        """return True if "rev" is used as a precursor for any obsmarker
+
+        IMPORTANT: make sure the cache has been updated to match the repository
+        content before using it"""
+        return self._data[rev]
+
+    def clear(self, reset=False):
+        """invalidate the in memory cache content"""
+        super(obscache, self).clear(reset=reset)
+        self._data = bytearray()
+
+    def _updatefrom(self, repo, data):
+        if data[0]:
+            self._updaterevs(repo, data[0])
+        if data[1]:
+            self._updatemarkers(repo, data[1])
+
+    def _updaterevs(self, repo, revs):
+        """update the cache with new revisions
+
+        Newly added changesets might be affected by obsolescence markers we
+        already have locally. So we need to have some global knowledge about
+        the markers to answer that question.
+
+        XXX performance note:
+
+        Right now this requires parsing all markers in the obsstore. We could
+        imagine using various optimisations (e.g. another cache, network
+        exchange, etc.).
+
+        A possible approach to this is to build a set of all nodes used as
+        precursors in `obsstore._obscandidate`. If markers are not loaded yet,
+        we could initialize it by doing a quick scan through the obsstore data
+        and filling a (pre-sized) set. Doing so would be much faster than
+        parsing all the obsmarkers since we would access less data, not create
+        any object beside the nodes and not have to decode any complex data.
+
+        For now we stick to the simpler approach of paying the
+        performance cost on new changesets.
+        """
+        obsstore = repo.obsstore
+
+        if not self._data:
+            # new cache
+            self._data = bytearray(len(revs))
+        else:
+            # incremental update
+            self._data.extend(bytearray(len(revs)))
+        assert len(self._data) == len(repo.changelog)
+
+        if obsstore:
+            # an empty obsstore means we can skip the obsmarker search
+            node = repo.changelog.node
+            succs = repo.obsstore.successors
+            for r in revs:
+                if node(r) in succs:
+                    self._data[r] = 1
+
+    def _updatemarkers(self, repo, obsmarkers):
+        """update the cache with new markers"""
+        rev = repo.changelog.nodemap.get
+        for m in obsmarkers:
+            r = rev(m[0])
+            if r is not None:
+                self._data[r] = 1
+
+    def save(self, repo):
+        """save the data to disk
+
+        The format is pretty simple: we serialise the cache key and then dump
+        the bytearray.
+        """
+
+        # XXX we'll need some pending-related logic when the obsstore gets it
+
+        if self._cachekey is None or self._cachekey == self._ondiskkey:
+            return
+
+        cachefile = repo.vfs(self._filepath, 'w', atomictemp=True)
+        headerdata = self._serializecachekey()
+        cachefile.write(headerdata)
+        cachefile.write(self._data)
+        cachefile.close()
+
+    def load(self, repo):
+        """load data from disk"""
+        assert repo.filtername is None
+
+        data = repo.vfs.tryread(self._filepath)
+        if not data:
+            self._cachekey = self.emptykey
+            self._data = bytearray()
+        else:
+            headerdata = data[:self._cachekeysize]
+            self._cachekey = self._deserializecachekey(headerdata)
+            self._data = bytearray(data[self._cachekeysize:])
+        self._ondiskkey = self._cachekey
+
 # mapping of 'set-name' -> <function to compute this set>
 cachefuncs = {}
 def cachefor(name):

