[PATCH 4 of 6 frozen-repos] localrepo: ability to obtain a read-only snapshot of a repo

Gregory Szorc gregory.szorc at gmail.com
Sat Nov 21 19:14:01 CST 2015


# HG changeset patch
# User Gregory Szorc <gregory.szorc at gmail.com>
# Date 1448154001 28800
#      Sat Nov 21 17:00:01 2015 -0800
# Node ID 888c2171adffa8340406b50aae02375f7bef50f4
# Parent  f559e16232491e44fc4dac6c12872cb2e9164d27
localrepo: ability to obtain a read-only snapshot of a repo

As part of performing some performance profiling, I noticed that a lot
of time can be spent resolving repoview.changelog (the changelog must
be resolved when performing repo[x] for example). The reason is that
the changelog lookup function needs to validate that the cached set
of revisions exposed to the view is still accurate. While there are
comments in repoview.py hinting at future optimizations to this
code, I think the ideal solution is to avoid having to validate a
cache altogether.

Many repository operations are short lived and/or assume they are
operating on a consistent snapshot of the repository. However, we
don't really ensure consistency anywhere. It is possible for someone
to e.g. start a revset query (which is lazy), perform a commit, then
continue iterating over the revset. This could produce incorrect
results due to the changelog mutating over the course of the query.

This patch introduces a freezerepo() function which returns a new
repository instance which is read only and is a snapshot of the
state of the repository at the time it was passed in.

Writing to the frozen repo is prevented by overriding methods used to
perform common write operations. Not all writes are likely prevented,
but it's better than nothing. In addition, some other functions related
to mutating repo state will result in aborts when called as well.

The snapshot of the repository is currently implemented by replacing
the changelog lookup function with a resolved changelog instance.
Accessing the changelog on a frozen repo is thus a normal property
lookup and not a function call. The performance impact will be
demonstrated in subsequent patches.

There is an inline TODO regarding making unfiltered repos frozen.
I'm not sure why a few random tests are complaining about unknown
working copy parent when we freeze the returned unfiltered repo.
This is follow up work for another day.

The localrepository class also gained a helper function to produce
a frozen repo. I know we don't like more methods on this class. But
frozen repos are similar to filtered repos. I anticipate we'll be
operating on frozen repos a lot in the near future. And localrepo.py
has import cycle considerations. So I think having the convenience
method is justified.

diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -413,16 +413,24 @@ class localrepository(object):
     def filtered(self, name):
         """Return a filtered version of a repository"""
         # build a new class with the mixin and the current class
         # (possibly subclass of the repo)
         class proxycls(repoview.repoview, self.unfiltered().__class__):
             pass
         return proxycls(self, name)
 
+    def frozen(self):
+        """Obtain a read-only snapshot of the repository."""
+        # Don't create a new object if we're already frozen.
+        if getattr(self, '_frozencl', None):
+            return self
+
+        return freezerepo(self)
+
     @repofilecache('bookmarks')
     def _bookmarks(self):
         return bookmarks.bmstore(self)
 
     @repofilecache('bookmarks.current')
     def _activebookmark(self):
         return bookmarks.readactive(self)
 
@@ -1931,8 +1939,81 @@ def undoname(fn):
     assert name.startswith('journal')
     return os.path.join(base, name.replace('journal', 'undo', 1))
 
 def instance(ui, path, create):
     return localrepository(ui, util.urllocalpath(path), create)
 
 def islocal(path):
     return True
+
+def freezerepo(repo):
+    """Obtain a read-only snapshot of a repository.
+
+    Returns a new repository instance based on the existing one that
+    is read-only and frozen in time. Lookups that would normally need
+    o be validated against change are not performed. This property makes
+    frozen repository instances suitable for read-only queries where
+    performance is important.
+
+    Any active repo filter, if any, will be preserved in the returned
+    instance.
+    """
+    msg = _('%s should not be called on frozen repos')
+
+    class frozenrepo(repo.__class__):
+        def __init__(self, orig):
+            # We need to call the constructor for repoview if we inherit
+            # from one.
+            if isinstance(orig, repoview.repoview):
+                repoview.repoview.__init__(self, orig, orig.filtername)
+
+            # Stash away a reference to the original repo so we can proxy.
+            object.__setattr__(self, '_unfrozenrepo', orig)
+
+            # changelog object lookup is typically dynamic. This is not
+            # needed on frozen repos. So resolve the changelog once and
+            # cache it.
+            object.__setattr__(self, '_frozencl', orig.changelog)
+
+        @property
+        def changelog(self):
+            return self._frozencl
+
+        # We need to override functions that return repos to returned frozen
+        # repos as well.
+        def unfiltered(self):
+            # TODO this should return a frozen repo. However, a few tests are
+            # failing due to unknown parent of working directory state.
+            return self._unfrozenrepo.unfiltered()
+
+        def filtered(self, name):
+            current = getattr(self, 'filtername', None)
+            if current == name:
+                return self
+
+            return freezerepo(self._unfrozenrepo.unfiltered().filtered(name))
+
+        # Proxy all operations to original repo.
+        def __getattr__(self, attr):
+            return getattr(self._unfrozenrepo, attr)
+
+        def __setattr__(self, attr, value):
+            return setattr(self._unfrozenrepo, attr, value)
+
+        def __delattr__(self, attr):
+            return delattr(self._unfrozenrepo, attr)
+
+        # Catch calls to functions that should never be called on frozen
+        # repo instances.
+        def lock(self, *args, **kwargs):
+            raise error.Abort(msg % 'lock()')
+
+        def transaction(self, *args, **kwargs):
+            raise error.Abort(msg % 'transaction()')
+
+        def commit(self, *args, **kwargs):
+            raise error.Abort(msg % 'commit()')
+
+        def destroying(self):
+            raise error.Abort(msg % 'destroying()')
+
+    return frozenrepo(repo)


More information about the Mercurial-devel mailing list