[PATCH 3 of 6] dirstate: read from `.pending` file under HG_PENDING mode
Pierre-Yves David
pierre-yves.david at ens-lyon.org
Thu May 21 16:59:47 CDT 2015
On 05/19/2015 11:42 AM, FUJIWARA Katsunori wrote:
> # HG changeset patch
> # User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
> # Date 1432051569 -32400
> # Wed May 20 01:06:09 2015 +0900
> # Node ID e830ef506b9aaa01c5406d100a385961c4c0dff0
> # Parent 2d2eff9ee053025e8e0267b8d6c6d07326f607ee
> dirstate: read from `.pending` file under HG_PENDING mode
>
> True/False value of `_pendingmode` means whether `dirstate.pending` is
> used to initialize own `_map` and so on. When it is None, neither
> `dirstate` nor `dirstate.pending` is read in yet.
>
> This is used to keep consistent view between `_pl()` and `_read()`.
>
> Once `_pendingmode` is determined by reading one of `dirstate` or
> `dirstate.pending` in, `_pendingmode` is kept even if `invalidate()`
> is invoked. This should be reasonable, because:
>
> - effective `invalidate()` invocation should occur only in wlock scope, and
> - wlock can't be gotten under HG_PENDING mode
>
> For review-ability, this patch focuses only on reading `.pending` file
> in. Then, `dirstate.write()` can write changes out even under
> HG_PENDING mode. But it is still safe enough, because there is no code
> path writing `.pending` file out yet (= `_pendingmode` never be True).
>
> `_trypending()` is defined as a normal function to factor similar code
> path (in bookmarks and phases) out in the future easily.
This series is poking in the right direction, but it is failing to
acknowledge the following list of principle and will not result in a
working behavior.
1) '.pending' files are made necessary by the existence of a transaction,
2) HG_PENDING (and .pending files usage) must -only- occured in the
context of a transaction.
3) The transaction object is the one handle the '.pending' files and
associated non-pending file. This includes:
3.1) The transaction must purge any existing '.pending' when it open/close
3.2) generation of .pending files is done through the transaction
3.3) writing of the final files is also done through the transaction
*) We now have a developer warning feature, we should use it more.
Let me details this more.
1) Transaction make '.pending' necessary.
-----------------------------------------
The goal of transaction is to prevent any external viewer to see its
content before it is committed. So anything happening during the
transaction is actually written to disk only at `tr.close()` time.
To lets hooks and tools see the content of the transaction, the
'pending' mechanism is used. An environment variable let sub-invocation
of hg know that it needs to looks for extra data in different file.
2) pending happen during transaction only
-----------------------------------------
In the general case, we do not use the same 'pending' mechanism for
dirstate related changes for the following reason:
- dirstate is related to the working copy. Most external viewer like
clone/pull/push do not care about the working copy so we are not at risk
of uncommited data escaping the repository.
- dirstate is related to the content of working copy (parents, but also
file content). We cannot have an atomic update of all the file in the
working copy so we'll have transient invalid state during the update
anyway, therefor we are not even trying to ensure we have an atomic
transaction for external viewer.
However, there is a case where we need dirstate to use a '.pending'
mechanism: When the dirstate change makes it refer to content contained
in a not-yet committed transaction.
- we do not want to make such dirstate visible until the transaction
content is visible.
- sub-invocation of mercurial in hook/tool need to be able to see this
content anyway.
To simplify things. I'm assuming:
"dirstate refer to content in the transaction."
is the same as:
"dirstate change happen during a transaction."
So, if your dirstate change happens inside a transaction, it should be
collaborating with it. Your series have to make use of
the`repo.currenttransaction()` method at least once somewhere.
3) The transaction object is the one handling file
--------------------------------------------------
To simplify entry points and ensure consistency, the transaction is
handling all files involved in the transaction from end to end. So when
dirstate collaborate with the transaction, it has to let the transaction
handle the dirstate file
related to point 3.2: We are generating `pending` file on demand,
because if no subprocess is to be call, we do not need to generated
them. Every subcall that can need pending file will use call
`tr.pending()` to trigger this generation. This is a single entry point
that the code know and use. The function will also handle the case where
there is no pending changes and not special logic is needed.
To get the transaction able to include your pending file, you have to
register a "file generator" with `tr.addfilegenerator` the documentation
is fairly extensive so I won't repeat it here. (if document is unclear
we have to patch it).
I expect a call like:
tr.addfilegenerator('dirstate', ('dirstate', self._write,
location='plain')
Where self._write is a method accepting an (already open) file object
and writing dirstate content in it.
addfilegenerator can be called multiple time, each call replacing the other.
related to point 3.3: You have to let the transaction handle the final
write too. The reason you have dirstate being handled by the transaction
logic (if present) is that you never want to write the dirstate to disk
(except pending) until the transaction data are visible too. And the
dirstate/guard object have no way to know when this will happen. The
transaction object is the one who know about that.
Lets looks at the following case:
with tr:
commit = newcommit()
with ds:
ds.setparent()
ds.close() # confirm the dirstate change
othercrap()
tr.close() # commit the transaction
In this case the "ds.close()" is not the right time to write the file
down, it is too early. It has to be written down at 'tr.close()' time.
By chance, the transaction will automatically take care of that since
you used `addfilegenerator`.
summary: your .close() must not touch disk in the in-transaction case.
Will handle it.
The final writing of the dirstate file will not be a simple move of the
'.pending' file. This is necessary because the pending are lazily
generated. Lets looks at the following scenario:
with tr:
commit = newcommit()
with ds:
ds.setparent()
ds.close() # confirm the move
tr.pending()
with ds:
ds.setparent()
ds.close() # confirm the move
tr.close() # commit the transaction
In this case, the .pending file is outdated compared to the in-memory
state. So we need to regenerate the file anyway.
summary: transaction won't use simple rename.
related to point 3.1: You also have to be very careful of not leaving
any '.pending' file around. Lets look at the following scenario.
$ hg commit # write dirstate.pending, leave it behind
$ hg up 42 # does not need to write a pending file
$ hg pull # transaction call hook on pending change, dirstate is read
# from .pending becaues it exists
So we have to make sure we do not leave the file behind. By chance, the
transaction will take care of that for you. It keep track of all
temporary file it generated and make sure they are cleaned up when the
transaction is closed or rollbacked.
(We should probably aggressively nuke all known '.pending; type file at
transaction open time but I'm not sure it is done yet.)
other nice thing: the transaction logic will also backup your file (and
restore in rollback case) your dirstate file as soon as your register a
generator for it.
Developer Warning:
-----------------
To work your code (will) make multiple assumption:
- wlock is alway taken when we touch the dirstate,
- write are always done in a dirstate guard context,
- we do not do any direct writing if a transaction exists,
It is easy to check this is true, but we crashing in these case is
suboptimal since it will affect human user of unsafe, but already
existing code. By chance we now have a "developer warning concept" (that
you get from `devel.all=true` config) and the default behavior for tests
is now to run them. So adding code to detect and issue warning in this
case will let the code base be cleaned up over time.
--
Pierre-Yves David
More information about the Mercurial-devel
mailing list