[PATCH 3 of 6] dirstate: read from `.pending` file under HG_PENDING mode
FUJIWARA Katsunori
foozy at lares.dti.ne.jp
Fri May 22 11:34:56 CDT 2015
At Thu, 21 May 2015 16:59:47 -0500,
Pierre-Yves David wrote:
>
> On 05/19/2015 11:42 AM, FUJIWARA Katsunori wrote:
> > # HG changeset patch
> > # User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
> > # Date 1432051569 -32400
> > # Wed May 20 01:06:09 2015 +0900
> > # Node ID e830ef506b9aaa01c5406d100a385961c4c0dff0
> > # Parent 2d2eff9ee053025e8e0267b8d6c6d07326f607ee
> > dirstate: read from `.pending` file under HG_PENDING mode
> >
> > True/False value of `_pendingmode` means whether `dirstate.pending` is
> > used to initialize own `_map` and so on. When it is None, neither
> > `dirstate` nor `dirstate.pending` is read in yet.
> >
> > This is used to keep consistent view between `_pl()` and `_read()`.
> >
> > Once `_pendingmode` is determined by reading one of `dirstate` or
> > `dirstate.pending` in, `_pendingmode` is kept even if `invalidate()`
> > is invoked. This should be reasonable, because:
> >
> > - effective `invalidate()` invocation should occur only in wlock scope, and
> > - wlock can't be gotten under HG_PENDING mode
> >
> > For review-ability, this patch focuses only on reading `.pending` file
> > in. Then, `dirstate.write()` can write changes out even under
> > HG_PENDING mode. But it is still safe enough, because there is no code
> > path writing `.pending` file out yet (= `_pendingmode` never be True).
> >
> > `_trypending()` is defined as a normal function to factor similar code
> > path (in bookmarks and phases) out in the future easily.
>
> This series is poking in the right direction, but it is failing to
> acknowledge the following list of principle and will not result in a
> working behavior.
>
> 1) '.pending' files are made necessary by the existence of a transaction,
> 2) HG_PENDING (and .pending files usage) must -only- occured in the
> context of a transaction.
> 3) The transaction object is the one handle the '.pending' files and
> associated non-pending file. This includes:
>
> 3.1) The transaction must purge any existing '.pending' when it open/close
> 3.2) generation of .pending files is done through the transaction
> 3.3) writing of the final files is also done through the transaction
>
> *) We now have a developer warning feature, we should use it more.
>
> Let me details this more.
Thank you for detailed explanation !
And, perhaps, should I post the patch below not as the first one of
next series but as the last one of this series ?
I put this into next series, because this seemed to increase
review-ability of client code paths (= making in-memory changes
visible to external process) of these functions in next series.
====================
diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
--- a/mercurial/dirstate.py
+++ b/mercurial/dirstate.py
@@ -632,6 +632,35 @@ class dirstate(object):
return
self._writedirstate(self._filename)
+ def delayupdate(self, tr):
+ '''Delay visibility of dirstate to other readers
+ '''
+ category = 'dirstate-%i' % id(self)
+
+ tr.addpending(category, self._writepending)
+
+ # pending file should be fixed up regardless of outcome of transaction
+ tr.addfinalize(category, self._fixuppending)
+ tr.addabort(category, self._fixuppending)
+
+ def makependingvisible(self, repo):
+ '''Make pending disrtate changes visible to external process
+
+ This is an utility to treat pending changes regardless of
+ transaction activity.
+
+ This returns the value to be set to 'HG_PENDING' environment
+ variable for external process.
+ '''
+ tr = repo.currenttransaction()
+ if tr:
+ self.delayupdate(tr)
+ if tr.writepending():
+ return repo.root # set 'HG_PENDING' only in this case
+ else:
+ self.write()
+ return ""
+
def _removepending(self):
try:
self._opener.unlink(self._pendingfilename)
====================
This assumes that:
- `dirstate.delayupdate()` will be used in certain transaction scope
(e.g. for `pretxncommit` hook), like `changelog.delayupdate()`
- `dirstate.makependingvisible()` will be used in other code paths
(e.g. for commit editor, `precommit` hook and so on), where
transaction activity is determined only at runtime.
I think that using these functions can satisfy rules you described
above. How about this ?
I chose `addpending()` instead of `addfilegenerator()` to write
dirstate pending file out, because assumption below of the latter
seems not suitable for dirstate characteristic:
- changes will be written out at successful `transaction.close()`
- pending file will be discarded at transaction failure
As described in patch #4 of this series, dirstate changes should be
written out (at releasing wlock) regardless of outcome of transaction
in almost all cases.
Then, I chose registering `dirstate._fixuppending` via `addfinalize()`
and `addabort()` in `dirstate.delayupdate()` to certainly cleanup
(remove or rename) pending file at the end of transaction.
This can also delay writing dirstate changes out until releasing
wlock, even though it may be a little violation of (3.3).
For example, rebasing revisions causes one transaction for each
revisions to be rebased in a wlock scope. In this case, it is enough
to write dirstate changes out only at releasing wlock, if no external
(hook) process is invoked while rebasing.
`dirstate.write()` and `dirstate.invalidate()` may be invoked even in
transaction scope (e.g. via `dirstateguard` in import, amend and so
on).
This is reason why `_removepending()` and `_fixuppending()` are
invoked in them.
BTW:
> Developer Warning:
> -----------------
>
> To work your code (will) make multiple assumption:
>
[snip]
> - write are always done in a dirstate guard context,
Even though `dirstateguard` is typical `dirstate.write()` caller, I
don't assume above. But are there something to indicate or suggest so ?
I may overllook them :-<
> 1) Transaction make '.pending' necessary.
> -----------------------------------------
>
> The goal of transaction is to prevent any external viewer to see its
> content before it is committed. So anything happening during the
> transaction is actually written to disk only at `tr.close()` time.
>
> To lets hooks and tools see the content of the transaction, the
> 'pending' mechanism is used. An environment variable let sub-invocation
> of hg know that it needs to looks for extra data in different file.
>
>
>
> 2) pending happen during transaction only
> -----------------------------------------
>
> In the general case, we do not use the same 'pending' mechanism for
> dirstate related changes for the following reason:
>
> - dirstate is related to the working copy. Most external viewer like
> clone/pull/push do not care about the working copy so we are not at risk
> of uncommited data escaping the repository.
>
> - dirstate is related to the content of working copy (parents, but also
> file content). We cannot have an atomic update of all the file in the
> working copy so we'll have transient invalid state during the update
> anyway, therefor we are not even trying to ensure we have an atomic
> transaction for external viewer.
>
> However, there is a case where we need dirstate to use a '.pending'
> mechanism: When the dirstate change makes it refer to content contained
> in a not-yet committed transaction.
>
> - we do not want to make such dirstate visible until the transaction
> content is visible.
>
> - sub-invocation of mercurial in hook/tool need to be able to see this
> content anyway.
>
> To simplify things. I'm assuming:
>
> "dirstate refer to content in the transaction."
>
> is the same as:
>
> "dirstate change happen during a transaction."
>
>
> So, if your dirstate change happens inside a transaction, it should be
> collaborating with it. Your series have to make use of
> the`repo.currenttransaction()` method at least once somewhere.
>
> 3) The transaction object is the one handling file
> --------------------------------------------------
>
> To simplify entry points and ensure consistency, the transaction is
> handling all files involved in the transaction from end to end. So when
> dirstate collaborate with the transaction, it has to let the transaction
> handle the dirstate file
>
>
> related to point 3.2: We are generating `pending` file on demand,
> because if no subprocess is to be call, we do not need to generated
> them. Every subcall that can need pending file will use call
> `tr.pending()` to trigger this generation. This is a single entry point
> that the code know and use. The function will also handle the case where
> there is no pending changes and not special logic is needed.
>
> To get the transaction able to include your pending file, you have to
> register a "file generator" with `tr.addfilegenerator` the documentation
> is fairly extensive so I won't repeat it here. (if document is unclear
> we have to patch it).
>
> I expect a call like:
>
> tr.addfilegenerator('dirstate', ('dirstate', self._write,
> location='plain')
>
> Where self._write is a method accepting an (already open) file object
> and writing dirstate content in it.
>
> addfilegenerator can be called multiple time, each call replacing the other.
>
> related to point 3.3: You have to let the transaction handle the final
> write too. The reason you have dirstate being handled by the transaction
> logic (if present) is that you never want to write the dirstate to disk
> (except pending) until the transaction data are visible too. And the
> dirstate/guard object have no way to know when this will happen. The
> transaction object is the one who know about that.
>
> Lets looks at the following case:
>
> with tr:
> commit = newcommit()
> with ds:
> ds.setparent()
> ds.close() # confirm the dirstate change
> othercrap()
> tr.close() # commit the transaction
>
> In this case the "ds.close()" is not the right time to write the file
> down, it is too early. It has to be written down at 'tr.close()' time.
> By chance, the transaction will automatically take care of that since
> you used `addfilegenerator`.
>
> summary: your .close() must not touch disk in the in-transaction case.
> Will handle it.
>
> The final writing of the dirstate file will not be a simple move of the
> '.pending' file. This is necessary because the pending are lazily
> generated. Lets looks at the following scenario:
>
> with tr:
> commit = newcommit()
> with ds:
> ds.setparent()
> ds.close() # confirm the move
> tr.pending()
> with ds:
> ds.setparent()
> ds.close() # confirm the move
>
> tr.close() # commit the transaction
>
> In this case, the .pending file is outdated compared to the in-memory
> state. So we need to regenerate the file anyway.
>
> summary: transaction won't use simple rename.
>
> related to point 3.1: You also have to be very careful of not leaving
> any '.pending' file around. Lets look at the following scenario.
>
> $ hg commit # write dirstate.pending, leave it behind
> $ hg up 42 # does not need to write a pending file
> $ hg pull # transaction call hook on pending change, dirstate is read
> # from .pending becaues it exists
>
> So we have to make sure we do not leave the file behind. By chance, the
> transaction will take care of that for you. It keep track of all
> temporary file it generated and make sure they are cleaned up when the
> transaction is closed or rollbacked.
>
> (We should probably aggressively nuke all known '.pending; type file at
> transaction open time but I'm not sure it is done yet.)
>
> other nice thing: the transaction logic will also backup your file (and
> restore in rollback case) your dirstate file as soon as your register a
> generator for it.
>
> Developer Warning:
> -----------------
>
> To work your code (will) make multiple assumption:
>
> - wlock is alway taken when we touch the dirstate,
> - write are always done in a dirstate guard context,
> - we do not do any direct writing if a transaction exists,
>
> It is easy to check this is true, but we crashing in these case is
> suboptimal since it will affect human user of unsafe, but already
> existing code. By chance we now have a "developer warning concept" (that
> you get from `devel.all=true` config) and the default behavior for tests
> is now to run them. So adding code to detect and issue warning in this
> case will let the code base be cleaned up over time.
>
> --
> Pierre-Yves David
>
----------------------------------------------------------------------
[FUJIWARA Katsunori] foozy at lares.dti.ne.jp
More information about the Mercurial-devel
mailing list