[PATCH 3 of 3 RFC] localrepository: add caches to transaction (issue4255)

Durham Goode durham at fb.com
Wed May 28 12:40:07 CDT 2014


On 5/27/14, 11:00 PM, "Gregory Szorc" <gregory.szorc at gmail.com> wrote:

># HG changeset patch
># User Gregory Szorc <gregory.szorc at gmail.com>
># Date 1401253147 25200
>#      Tue May 27 21:59:07 2014 -0700
># Node ID 12c2a7ae1d443e6ea89ea6f8f1032bf2178af15d
># Parent  02939296b8052833d4c8bfa6e2e2f1371fe5c517
>localrepository: add caches to transaction (issue4255)
>
>localrepository.transaction() now creates a chained transaction bound
>to the .hg opener in addition to the .hg/store opener. This new
>transaction creates backups of most files under .hg/cache.
>
>With this change, transaction rollbacks will restore caches, thus
>preventing cache invalidation and recomputation. This will result
>in considerable performance wins on repositories where cache
>generation is slow.
>
>On a repository with 12,421 heads and manifests with near 100,000
>entries, branch caches can take ~17 minutes to generate
>(259s for base, 68s for immutable, 723s for served). On a server
>that had hooks causing branch cache updates, transaction rollbacks
>resulted in branch cache invalidation and an expensive computation.
>This patch makes the problem go away.
>
>THIS PATCH CURRENTLY CAUSES A REGRESSION IN test-fncache.t.
>THIS PATCH SHOULD ALSO HAVE EXPLICIT TEST COVERAGE THAT CACHES ARE
>RESTORED.
>
>diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
>--- a/mercurial/localrepo.py
>+++ b/mercurial/localrepo.py
>@@ -848,8 +848,16 @@ class localrepository(object):
> 
>     def wwritedata(self, filename, data):
>         return self._filter(self._decodefilterpats, filename, data)
> 
>+    def _cachetransactionblacklist(self):
>+        """Set of cache files that are not safe to include in
>transactions.
>+
>+        Files in .hg/cache/ are included in transactions by default. To
>+        prevent that from happening, add an entry to this set.
>+        """
>+        return set(['storehash'])
>+
>     def transaction(self, desc, report=None):
>         tr = self._transref and self._transref() or None
>         if tr and tr.running():
>             return tr.nest()
>@@ -870,8 +878,27 @@ class localrepository(object):
>                                      "journal",
>                                      aftertrans(renames),
>                                      self.store.createmode,
>                                      onclose)
>+
>+        # Add caches to a chained transaction.
>+        if self.ui.configbool("format", "usestore", True):
>+            try:
>+                def noreport(msg):
>+                    pass
>+                tr2 = transaction.transaction(noreport, self.opener,
>+                                              "journal",
>+                 
>createmode=self.store.createmode)
>+                blacklist = self._cachetransactionblacklist()
>+                for filename, mode in self.vfs.readdir("cache"):
>+                    if filename not in blacklist:
>+                        tr2.addbackup("cache/%s" % filename,
>hardlink=False)
>+                tr.chaintransaction(tr2)
>+
>+            except OSError, e:
>+                if e.errno != errno.ENOENT:
>+                    raise
>+
>         self._transref = weakref.ref(tr)
>         return tr
> 
>     def _journalfiles(self):


I’m not sure I like the chained transaction concept.  How is the chain
represented on disk (i.e. if the user ctrl+c’s, how does rollback handle
it)? What happens if the _abort for one transaction fails or the process
is killed before all the chained transactions are aborted?  What if the
same file is added to two transactions (perhaps by a poorly written
extension)? These caches can be written outside the lock right (like
during read only operations)? So what happens if someone modifies a cache
outside the lock while another process is in the middle of transacting it?

I like how elegantly this addresses your problem, but I think the
transaction model is already fragile and difficult enough to reason about,
and I think adding multiple simultaneous transactions will make it even
harder to reason about transaction behavior in the future.  If we want the
caches to be transacted, I think we need to refactor the existing
transaction to support them, make sure they are never written outside of a
transaction, and make sure they are never written during read only
operations.

I have a local commit where I’ve started doing this for the phase cache,
but that’s one of the easier ones since it is mostly modified inside locks
already.



More information about the Mercurial-devel mailing list