[PATCH 3 of 6] dirstate: read from `.pending` file under HG_PENDING mode

FUJIWARA Katsunori foozy at lares.dti.ne.jp
Fri May 22 11:34:56 CDT 2015


At Thu, 21 May 2015 16:59:47 -0500,
Pierre-Yves David wrote:
> 
> On 05/19/2015 11:42 AM, FUJIWARA Katsunori wrote:
> > # HG changeset patch
> > # User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
> > # Date 1432051569 -32400
> > #      Wed May 20 01:06:09 2015 +0900
> > # Node ID e830ef506b9aaa01c5406d100a385961c4c0dff0
> > # Parent  2d2eff9ee053025e8e0267b8d6c6d07326f607ee
> > dirstate: read from `.pending` file under HG_PENDING mode
> >
> > True/False value of `_pendingmode` means whether `dirstate.pending` is
> > used to initialize own `_map` and so on. When it is None, neither
> > `dirstate` nor `dirstate.pending` is read in yet.
> >
> > This is used to keep consistent view between `_pl()` and `_read()`.
> >
> > Once `_pendingmode` is determined by reading one of `dirstate` or
> > `dirstate.pending` in, `_pendingmode` is kept even if `invalidate()`
> > is invoked. This should be reasonable, because:
> >
> > - effective `invalidate()` invocation should occur only in wlock scope, and
> > - wlock can't be gotten under HG_PENDING mode
> >
> > For review-ability, this patch focuses only on reading `.pending` file
> > in. Then, `dirstate.write()` can write changes out even under
> > HG_PENDING mode. But it is still safe enough, because there is no code
> > path writing `.pending` file out yet (= `_pendingmode` never be True).
> >
> > `_trypending()` is defined as a normal function to factor similar code
> > path (in bookmarks and phases) out in the future easily.
> 
> This series is poking in the right direction, but it is failing to 
> acknowledge the following list of principle and will not result in a 
> working behavior.
> 
> 1) '.pending' files are made necessary by the existence of a transaction,
> 2) HG_PENDING (and .pending files usage) must -only- occured in the 
> context of a transaction.
> 3) The transaction object is the one handle the '.pending' files and 
> associated non-pending file. This includes:
> 
> 3.1) The transaction must purge any existing '.pending' when it open/close
> 3.2) generation of .pending files is done through the transaction
> 3.3) writing of the final files is also done through the transaction
> 
> *) We now have a developer warning feature, we should use it more.
> 
> Let me details this more.

Thank you for detailed explanation !

And, perhaps, should I post the patch below not as the first one of
next series but as the last one of this series ?

I put this into next series, because this seemed to increase
review-ability of client code paths (= making in-memory changes
visible to external process) of these functions in next series.

    ====================
    diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
    --- a/mercurial/dirstate.py
    +++ b/mercurial/dirstate.py
    @@ -632,6 +632,35 @@ class dirstate(object):
                 return
             self._writedirstate(self._filename)
    
    +    def delayupdate(self, tr):
    +        '''Delay visibility of dirstate to other readers
    +        '''
    +        category = 'dirstate-%i' % id(self)
    +
    +        tr.addpending(category, self._writepending)
    +
    +        # pending file should be fixed up regardless of outcome of transaction
    +        tr.addfinalize(category, self._fixuppending)
    +        tr.addabort(category, self._fixuppending)
    +
    +    def makependingvisible(self, repo):
    +        '''Make pending disrtate changes visible to external process
    +
    +        This is an utility to treat pending changes regardless of
    +        transaction activity.
    +
    +        This returns the value to be set to 'HG_PENDING' environment
    +        variable for external process.
    +        '''
    +        tr = repo.currenttransaction()
    +        if tr:
    +            self.delayupdate(tr)
    +            if tr.writepending():
    +                return repo.root # set 'HG_PENDING' only in this case
    +        else:
    +            self.write()
    +        return ""
    +
         def _removepending(self):
             try:
                 self._opener.unlink(self._pendingfilename)
    ====================

This assumes that:

  - `dirstate.delayupdate()` will be used in certain transaction scope
    (e.g. for `pretxncommit` hook), like `changelog.delayupdate()`

  - `dirstate.makependingvisible()` will be used in other code paths
    (e.g. for commit editor, `precommit` hook and so on), where
    transaction activity is determined only at runtime.

I think that using these functions can satisfy rules you described
above. How about this ?


I chose `addpending()` instead of `addfilegenerator()` to write
dirstate pending file out, because assumption below of the latter
seems not suitable for dirstate characteristic:

  - changes will be written out at successful `transaction.close()`
  - pending file will be discarded at transaction failure

As described in patch #4 of this series, dirstate changes should be
written out (at releasing wlock) regardless of outcome of transaction
in almost all cases.

Then, I chose registering `dirstate._fixuppending` via `addfinalize()`
and `addabort()` in `dirstate.delayupdate()` to certainly cleanup
(remove or rename) pending file at the end of transaction.

This can also delay writing dirstate changes out until releasing
wlock, even though it may be a little violation of (3.3).

For example, rebasing revisions causes one transaction for each
revisions to be rebased in a wlock scope. In this case, it is enough
to write dirstate changes out only at releasing wlock, if no external
(hook) process is invoked while rebasing.


`dirstate.write()` and `dirstate.invalidate()` may be invoked even in
transaction scope (e.g. via `dirstateguard` in import, amend and so
on).

This is reason why `_removepending()` and `_fixuppending()` are
invoked in them.


BTW:

> Developer Warning:
> -----------------
> 
> To work your code (will) make multiple assumption:
> 
[snip]
> - write are always done in a dirstate guard context,

Even though `dirstateguard` is typical `dirstate.write()` caller, I
don't assume above. But are there something to indicate or suggest so ?
I may overllook them :-<



> 1) Transaction make '.pending' necessary.
> -----------------------------------------
> 
> The goal of transaction is to prevent any external viewer to see its 
> content before it is committed. So anything happening during the 
> transaction is actually written to disk only at `tr.close()` time.
> 
> To lets hooks and tools see the content of the transaction, the 
> 'pending' mechanism is used. An environment variable let sub-invocation 
> of hg know that it needs to looks for extra data in different file.
> 
> 
> 
> 2) pending happen during transaction only
> -----------------------------------------
> 
> In the general case, we do not use the same 'pending' mechanism for 
> dirstate related changes for the following reason:
> 
> - dirstate is related to the working copy. Most external viewer like 
> clone/pull/push do not care about the working copy so we are not at risk 
> of uncommited data escaping the repository.
> 
> - dirstate is related to the content of working copy (parents, but also 
> file content). We cannot have an atomic update of all the file in the 
> working copy so we'll have transient invalid state during the update 
> anyway, therefor we are not even trying to ensure we have an atomic 
> transaction for external viewer.
> 
> However, there is a case where we need dirstate to use a '.pending' 
> mechanism: When the dirstate change makes it refer to content contained 
> in a not-yet committed transaction.
> 
> - we do not want to make such dirstate visible until the transaction 
> content is visible.
> 
> - sub-invocation of mercurial in hook/tool need to be able to see this 
> content anyway.
> 
> To simplify things. I'm assuming:
> 
>    "dirstate refer to content in the transaction."
> 
> is the same as:
> 
>    "dirstate change happen during a transaction."
> 
> 
> So, if your dirstate change happens inside a transaction, it should be 
> collaborating with it. Your series have to make use of 
> the`repo.currenttransaction()` method at least once somewhere.
> 
> 3) The transaction object is the one handling file
> --------------------------------------------------
> 
> To simplify entry points and ensure consistency, the transaction is 
> handling all files involved in the transaction from end to end. So when 
> dirstate collaborate with the transaction, it has to let the transaction 
> handle the dirstate file
> 
> 
> related to point 3.2: We are generating `pending` file on demand, 
> because if no subprocess is to be call, we do not need to generated 
> them. Every subcall that can need pending file will use call 
> `tr.pending()` to trigger this generation. This is a single entry point 
> that the code know and use. The function will also handle the case where 
> there is no pending changes and not special logic is needed.
> 
> To get the transaction able to include your pending file, you have to 
> register a "file generator" with `tr.addfilegenerator` the documentation 
> is fairly extensive so I won't repeat it here. (if document is unclear 
> we have to patch it).
> 
> I expect a call like:
> 
>    tr.addfilegenerator('dirstate', ('dirstate', self._write,
>                        location='plain')
> 
> Where self._write is a method accepting an (already open) file object 
> and writing dirstate content in it.
> 
> addfilegenerator can be called multiple time, each call replacing the other.
> 
> related to point 3.3: You have to let the transaction handle the final 
> write too. The reason you have dirstate being handled by the transaction 
> logic (if present) is that you never want to write the dirstate to disk 
> (except pending) until the transaction data are visible too. And the 
> dirstate/guard object have no way to know when this will happen. The 
> transaction object is the one who know about that.
> 
> Lets looks at the following case:
> 
>    with tr:
>        commit = newcommit()
>        with ds:
>             ds.setparent()
>             ds.close() # confirm the dirstate change
>        othercrap()
>        tr.close() # commit the transaction
> 
> In this case the "ds.close()" is not the right time to write the file 
> down, it is too early. It has to be written down at 'tr.close()' time.
> By chance, the transaction will automatically take care of that since 
> you used `addfilegenerator`.
> 
> summary: your .close() must not touch disk in the in-transaction case. 
> Will handle it.
> 
> The final writing of the dirstate file will not be a simple move of the 
> '.pending' file. This is necessary because the pending are lazily 
> generated. Lets looks at the following scenario:
> 
>    with tr:
>        commit = newcommit()
>        with ds:
>             ds.setparent()
>             ds.close() # confirm the move
>        tr.pending()
>        with ds:
>             ds.setparent()
>             ds.close() # confirm the move
> 
>        tr.close() # commit the transaction
> 
> In this case, the .pending file is outdated compared to the in-memory 
> state. So we need to regenerate the file anyway.
> 
> summary: transaction won't use simple rename.
> 
> related to point 3.1: You also have to be very careful of not leaving 
> any '.pending' file around. Lets look at the following scenario.
> 
>   $ hg commit # write dirstate.pending, leave it behind
>   $ hg up 42 # does not need to write a pending file
>   $ hg pull # transaction call hook on pending change, dirstate is read
>             # from .pending becaues it exists
> 
> So we have to make sure we do not leave the file behind. By chance, the 
> transaction will take care of that for you. It keep track of all 
> temporary file it generated and make sure they are cleaned up when the 
> transaction is closed or rollbacked.
> 
> (We should probably aggressively nuke all known '.pending; type file at 
> transaction open time but I'm not sure it is done yet.)
> 
> other nice thing: the transaction logic will also backup your file (and 
> restore in rollback case) your dirstate file as soon as your register a 
> generator for it.
> 
> Developer Warning:
> -----------------
> 
> To work your code (will) make multiple assumption:
> 
> - wlock is alway taken when we touch the dirstate,
> - write are always done in a dirstate guard context,
> - we do not do any direct writing if a transaction exists,
> 
> It is easy to check this is true, but we crashing in these case is 
> suboptimal since it will affect human user of unsafe, but already 
> existing code. By chance we now have a "developer warning concept" (that 
> you get from `devel.all=true` config) and the default behavior for tests 
> is now to run them. So adding code to detect and issue warning in this 
> case will let the code base be cleaned up over time.
> 
> -- 
> Pierre-Yves David
> 


----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp


More information about the Mercurial-devel mailing list