[PATCH] transaction: move changelog finalizer to be before bookmark finalizer

FUJIWARA Katsunori foozy at lares.dti.ne.jp
Sat Mar 19 17:43:24 EDT 2016


At Sat, 19 Mar 2016 10:04:23 -0700,
Durham Goode wrote:
> 
> On 3/18/16 6:07 PM, Pierre-Yves David wrote:
> >
> > On 03/16/2016 11:31 PM, Durham Goode wrote:
> >> # HG changeset patch
> >> # User Durham Goode <durham at fb.com>
> >> # Date 1458196059 25200
> >> #      Wed Mar 16 23:27:39 2016 -0700
> >> # Node ID cdf4a94fdc46d27196f5411fc7a4008690834fba
> >> # Parent  ed75909c4c670a7d9db4a2bef9817a0d5f0b4d9c
> >> transaction: move changelog finalizer to be before bookmark finalizer
> >>
> >> Previously, transaction close would run the file generators before 
> >> running the
> >> finalizers (see the list below for what is in each). Since file 
> >> generators
> >> contain the bookmarks and the dirstate, this meant we made the 
> >> dirstate and
> >> bookmarks visible to external readers before we actually wrote the 
> >> commits into
> >> the changelog, which could result in missing bookmarks and missing 
> >> working copy
> >> parents (especially on servers with high commit throughput, since 
> >> pulls might
> >> fail to see certain bookmarks in this situation).
> >>
> >> By moving the changelog writing to be before the bookmark/dirstate 
> >> writing, we
> >> ensure the commits are present before they are referenced.
> >>
> >> For reference, file generators currently consist of: bookmarks, 
> >> dirstate, and
> >> phases. Finalizers currently consist of: changelog, revbranchcache, 
> >> and fncache.
> >> All of the former reference the latter, so therefore the latter 
> >> should be
> >> written first.
> >>
> >> Technically there's still plenty of race conditions (can the order of 
> >> finalizers
> >> affect how external readers see the repo?), but this is a step 
> >> forward at least.
> >
> > Can we get a more complete analyse on the race condition we have here 
> > and which ones we are trading for which others one?
> >
> > For example, this changes could result in phase root being visible 
> > after the changesets, making public being seen public while they are not.
> >
> > I would like to have a clear idea of where we are walking there and 
> > what tradeoff we are performing
> K, given that file generators are "bookmarks, dirstate, and phases" and 
> finalizers are "revlogs, revbranchcache, fncache", the old race 
> conditions are:
> 
> - bookmarks written before revlogs - causes bookmarks to appear missing 
> during read
> - dirstate written before revlogs - causes missing working copy parent
> - phases written before revlogs - should be benign, since any phase on a 
> commit that is missing only affects commits that are also missing
> - bookmarks/dirstate/phases written before revbranch cache - no issues
> - bookmarks/dirstate/phases written before fncache - no issues
> 
> and the new race conditions are:
> 
> - revlogs written before bookmarks - may see commits where the bookmark 
> hasn't moved on top of them yet; odd but not an invalid state
> - revlogs written before dirstate - no issues
> - revlogs written before phases - some commits may show up with the 
> wrong phase (potentially public)
> - revbranch cache written before bookmarks/dirstate/phases - no issues
> - fncache written before bookmarks/dirstate/phases - no issues
> 
> Race conditions that existed before (within the two categories):
> - fncache before revlog - fncache could point to file that doesn't exist
> - revlog before fncache - fncache could be missing a file
> 
> 
> So in summary, my change removes the invalid bookmarks and invalid 
> dirstate case, and replaces it with an out-of-date phase case.  The out 
> of date phase case could potentially be bad, since if we send public 
> commits to the client, it may be hard to make them draft later. On the 
> flip side, most central-server setups are publishing servers, so this 
> wouldn't actually be an issue for them. This could be fixed by moving 
> the phase writes to be before the revlogs (or by redoing the entire 
> storage system to be more widely transactional).

Is there any reason why all file generators should be executed at once ?

If not, how about categorizing file generators into two groups
("prior" to finalizing and other) by changes like below ?

====================
diff --git a/mercurial/phases.py b/mercurial/phases.py
--- a/mercurial/phases.py
+++ b/mercurial/phases.py
@@ -268,7 +268,7 @@ class phasecache(object):
         self.invalidate()
         self.dirty = True

-        tr.addfilegenerator('phase', ('phaseroots',), self._write)
+        tr.addfilegenerator('phase', ('phaseroots',), self._write, prior=True)
         tr.hookargs['phases_moved'] = '1'

     def advanceboundary(self, repo, tr, targetphase, nodes):
diff --git a/mercurial/transaction.py b/mercurial/transaction.py
--- a/mercurial/transaction.py
+++ b/mercurial/transaction.py
@@ -249,7 +249,7 @@ class transaction(object):

     @active
     def addfilegenerator(self, genid, filenames, genfunc, order=0,
-                         location=''):
+                         location='', prior=False):
         """add a function to generates some files at transaction commit

         The `genfunc` argument is a function capable of generating proper
@@ -274,14 +274,20 @@ class transaction(object):
         """
         # For now, we are unable to do proper backup and restore of custom vfs
         # but for bookmarks that are handled outside this mechanism.
-        self._filegenerators[genid] = (order, filenames, genfunc, location)
+        self._filegenerators[genid] = (order, filenames, genfunc, location,
+                                       prior)

-    def _generatefiles(self, suffix=''):
+    def _generatefiles(self, generateprior, suffix=''):
         # write files registered for generation
         any = False
         for entry in sorted(self._filegenerators.values()):
             any = True
-            order, filenames, genfunc, location = entry
+            order, filenames, genfunc, location, prior = entry
+
+            # for generation at closing, check prior of generator
+            if not suffix and generateprior != prior:
+                continue
+
             vfs = self._vfsmap[location]
             files = []
             try:
@@ -407,10 +413,11 @@ class transaction(object):
         '''commit the transaction'''
         if self.count == 1:
             self.validator(self)  # will raise exception if needed
-            self._generatefiles()
+            self._generatefiles(True)
             categories = sorted(self._finalizecallback)
             for cat in categories:
                 self._finalizecallback[cat](self)
+            self._generatefiles(False)

         self.count -= 1
         if self.count != 0:
====================

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp


More information about the Mercurial-devel mailing list