[PATCH 3 of 6] dirstate: read from `.pending` file under HG_PENDING mode

Pierre-Yves David pierre-yves.david at ens-lyon.org
Thu May 21 16:59:47 CDT 2015



On 05/19/2015 11:42 AM, FUJIWARA Katsunori wrote:
> # HG changeset patch
> # User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
> # Date 1432051569 -32400
> #      Wed May 20 01:06:09 2015 +0900
> # Node ID e830ef506b9aaa01c5406d100a385961c4c0dff0
> # Parent  2d2eff9ee053025e8e0267b8d6c6d07326f607ee
> dirstate: read from `.pending` file under HG_PENDING mode
>
> True/False value of `_pendingmode` means whether `dirstate.pending` is
> used to initialize own `_map` and so on. When it is None, neither
> `dirstate` nor `dirstate.pending` is read in yet.
>
> This is used to keep consistent view between `_pl()` and `_read()`.
>
> Once `_pendingmode` is determined by reading one of `dirstate` or
> `dirstate.pending` in, `_pendingmode` is kept even if `invalidate()`
> is invoked. This should be reasonable, because:
>
> - effective `invalidate()` invocation should occur only in wlock scope, and
> - wlock can't be gotten under HG_PENDING mode
>
> For review-ability, this patch focuses only on reading `.pending` file
> in. Then, `dirstate.write()` can write changes out even under
> HG_PENDING mode. But it is still safe enough, because there is no code
> path writing `.pending` file out yet (= `_pendingmode` never be True).
>
> `_trypending()` is defined as a normal function to factor similar code
> path (in bookmarks and phases) out in the future easily.

This series is poking in the right direction, but it is failing to 
acknowledge the following list of principle and will not result in a 
working behavior.

1) '.pending' files are made necessary by the existence of a transaction,
2) HG_PENDING (and .pending files usage) must -only- occured in the 
context of a transaction.
3) The transaction object is the one handle the '.pending' files and 
associated non-pending file. This includes:

3.1) The transaction must purge any existing '.pending' when it open/close
3.2) generation of .pending files is done through the transaction
3.3) writing of the final files is also done through the transaction

*) We now have a developer warning feature, we should use it more.

Let me details this more.


1) Transaction make '.pending' necessary.
-----------------------------------------

The goal of transaction is to prevent any external viewer to see its 
content before it is committed. So anything happening during the 
transaction is actually written to disk only at `tr.close()` time.

To lets hooks and tools see the content of the transaction, the 
'pending' mechanism is used. An environment variable let sub-invocation 
of hg know that it needs to looks for extra data in different file.



2) pending happen during transaction only
-----------------------------------------

In the general case, we do not use the same 'pending' mechanism for 
dirstate related changes for the following reason:

- dirstate is related to the working copy. Most external viewer like 
clone/pull/push do not care about the working copy so we are not at risk 
of uncommited data escaping the repository.

- dirstate is related to the content of working copy (parents, but also 
file content). We cannot have an atomic update of all the file in the 
working copy so we'll have transient invalid state during the update 
anyway, therefor we are not even trying to ensure we have an atomic 
transaction for external viewer.

However, there is a case where we need dirstate to use a '.pending' 
mechanism: When the dirstate change makes it refer to content contained 
in a not-yet committed transaction.

- we do not want to make such dirstate visible until the transaction 
content is visible.

- sub-invocation of mercurial in hook/tool need to be able to see this 
content anyway.

To simplify things. I'm assuming:

   "dirstate refer to content in the transaction."

is the same as:

   "dirstate change happen during a transaction."


So, if your dirstate change happens inside a transaction, it should be 
collaborating with it. Your series have to make use of 
the`repo.currenttransaction()` method at least once somewhere.

3) The transaction object is the one handling file
--------------------------------------------------

To simplify entry points and ensure consistency, the transaction is 
handling all files involved in the transaction from end to end. So when 
dirstate collaborate with the transaction, it has to let the transaction 
handle the dirstate file


related to point 3.2: We are generating `pending` file on demand, 
because if no subprocess is to be call, we do not need to generated 
them. Every subcall that can need pending file will use call 
`tr.pending()` to trigger this generation. This is a single entry point 
that the code know and use. The function will also handle the case where 
there is no pending changes and not special logic is needed.

To get the transaction able to include your pending file, you have to 
register a "file generator" with `tr.addfilegenerator` the documentation 
is fairly extensive so I won't repeat it here. (if document is unclear 
we have to patch it).

I expect a call like:

   tr.addfilegenerator('dirstate', ('dirstate', self._write,
                       location='plain')

Where self._write is a method accepting an (already open) file object 
and writing dirstate content in it.

addfilegenerator can be called multiple time, each call replacing the other.

related to point 3.3: You have to let the transaction handle the final 
write too. The reason you have dirstate being handled by the transaction 
logic (if present) is that you never want to write the dirstate to disk 
(except pending) until the transaction data are visible too. And the 
dirstate/guard object have no way to know when this will happen. The 
transaction object is the one who know about that.

Lets looks at the following case:

   with tr:
       commit = newcommit()
       with ds:
            ds.setparent()
            ds.close() # confirm the dirstate change
       othercrap()
       tr.close() # commit the transaction

In this case the "ds.close()" is not the right time to write the file 
down, it is too early. It has to be written down at 'tr.close()' time.
By chance, the transaction will automatically take care of that since 
you used `addfilegenerator`.

summary: your .close() must not touch disk in the in-transaction case. 
Will handle it.

The final writing of the dirstate file will not be a simple move of the 
'.pending' file. This is necessary because the pending are lazily 
generated. Lets looks at the following scenario:

   with tr:
       commit = newcommit()
       with ds:
            ds.setparent()
            ds.close() # confirm the move
       tr.pending()
       with ds:
            ds.setparent()
            ds.close() # confirm the move

       tr.close() # commit the transaction

In this case, the .pending file is outdated compared to the in-memory 
state. So we need to regenerate the file anyway.

summary: transaction won't use simple rename.

related to point 3.1: You also have to be very careful of not leaving 
any '.pending' file around. Lets look at the following scenario.

  $ hg commit # write dirstate.pending, leave it behind
  $ hg up 42 # does not need to write a pending file
  $ hg pull # transaction call hook on pending change, dirstate is read
            # from .pending becaues it exists

So we have to make sure we do not leave the file behind. By chance, the 
transaction will take care of that for you. It keep track of all 
temporary file it generated and make sure they are cleaned up when the 
transaction is closed or rollbacked.

(We should probably aggressively nuke all known '.pending; type file at 
transaction open time but I'm not sure it is done yet.)

other nice thing: the transaction logic will also backup your file (and 
restore in rollback case) your dirstate file as soon as your register a 
generator for it.

Developer Warning:
-----------------

To work your code (will) make multiple assumption:

- wlock is alway taken when we touch the dirstate,
- write are always done in a dirstate guard context,
- we do not do any direct writing if a transaction exists,

It is easy to check this is true, but we crashing in these case is 
suboptimal since it will affect human user of unsafe, but already 
existing code. By chance we now have a "developer warning concept" (that 
you get from `devel.all=true` config) and the default behavior for tests 
is now to run them. So adding code to detect and issue warning in this 
case will let the code base be cleaned up over time.

-- 
Pierre-Yves David


More information about the Mercurial-devel mailing list