[PATCH 1 of 9 V3] cmdutil: add the class to restore dirstate at unexpected failure easily

Fri May 8 21:00:02 CDT 2015

On 05/06/2015 08:15 PM, FUJIWARA Katsunori wrote:
> # HG changeset patch
> # User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
> # Date 1430968030 -32400
> #      Thu May 07 12:07:10 2015 +0900
> # Node ID 449a46109f0c96bb8d82d4edf2c8855341558c2f
> # Parent  8174d27576a3fff28fb6f952a4c89d5c0b900214
> cmdutil: add the class to restore dirstate at unexpected failure easily
>
> Before this patch, after "dirstate.write()" execution, there is no way
> to restore dirstate to the original status before "dirstate.write()".
>
> In some code paths, "dirstate.invalidate()" is used as a kind of
> "restore .hg/dirstate to the original status". But it just avoids
> writing changes in memory out, and doesn't actually restore
> ".hg/dirstate" file. "dirstate.write()" prevents it from working as
> expected.
>
> To fix the issue that recent (in memory) dirstate isn't visible to
> external process (e.g. "precommit" hooks), "dirstate.write()" should
> be invoked before invocation of external process. But at the same
> time, ".hg/dirstate" should be restored to the status before
> "dirstate.write()" at unexpected failure in some cases.
>
> This patch adds the class "dirstateguard" to easily restore
> ".hg/dirstate" at unexpected failure. Typical usecase of it is:
>
>      # (1) build dirstate up
>      ....
>
>      # (2) write dirstate out, and backup ".hg/dirstate"
>      dsguard = dirstateguard(repo, 'scopename')
>      try:
>          # (3) execute somethig to do:
>          #     this may imply making some additional changes on dirstate
>          ....
>
>          # (4) unlink backup-ed dirstate file at the end of dsguard scope
>          dsguard.close()
>      finally:
>          # (5) if execution is aborted before "dsguard.close()",
>          #     ".hg/dirstate" is restored from the backup
>          dsguard.release()

This "invocation" before external process, sounded really like the 
"pending mechanism" used in transaction. Things are a bit different 
since the dirstate also refers to content of the working copy, that have 
been already flushed to disk. But as far as I understand, the dirstate 
refers to parents that may be only contained in the the transaction.

The real question here is: What should external writer see during the 
process.

If it make more sense for external writer to be able to have a peek at 
the dirstate, we should write it.
If external writer cannot understand what is in the dirstate data, it 
should not be written to disk until the transaction is completed, and 
using a `.pending` mechanism instead.

> For this kind of issue, "extending transaction" approach (in
> https://titanpad.com/mercurial32-sprint) seems not to be suitable,
> because:
>
>    - transaction nesting occurs in some cases (e.g. "shelve => rebase"), and
>
>    - "dirstate" may be already modified since the beginning of OUTER
>      transaction scope, then
>
>    - dirstate should be backed up into the file other than
>      "dirstate.journal" at the beginning of INNER transaction scope, but
>
>    - such alternative backup files are useless for transaction itself,
>      and increases complication of its implementation

Some part about the transaction confuses me.

  - Yes, you are right, the dirstate changes may have a wider life spawn 
than the transaction.

  - Therefor doing backup there is not suitable (and is maybe never 
useful) (your are still right there)

  - However, for any changes started inside a transaction, we should 
never write data in .hg/dirstate until the transaction is actually 
committed (because it likely refers to unwritten data in the 
transaction). This means that hooks/editors/etc would need to access a 
.hg/distate.pending in that case.

  - The transaction API allow for both "wait until transaction is closed 
to flush that file" logic and "please write pending data for hooks" 
logic. It looks like the dirstate handling should collaborate with the 
transaction if one exist when dirstate manipulation are done. I'm not 
sure how much it happen.

Another things that concerns me is "re-entrance", you code use a 
distinct object for each scope, using its own backup and everything.
This seems simple enough, provide nesting and work probably well. 
However your backup is based on the "name" provided in the code. This 
mean that two nested calls with the same "name" will overwrite each 
other. We should throw some unique identifier in there. `id(self)` would do.

This re-entrance business transition perfectly to my last question: It 
seems like this superseed the beginparentchange/endparentchange system 
previously in place. However, there is still multiple call to this 
system. Could we, in theory replace them will with the new system? If 
not, why?

Also, this old system was living on the dirstate object directly. would 
it make sense to move the guardian business there?

Summary of questions/topics
===========================

- Do we need a pending mechanism ?
- It seems like it would make sense to attach the logic to the 
transaction if one exists. Though?
- Possible "name" collision and necessity to make identifier unique.
- What's the fate of beginparentchange

Beside the question aboves (and some typo I can fix in-flight), the 
series looks like an improvement to me. I'll take it once the above 
point are clarified.

> +class dirstateguard(object):
> +    '''Restore dirstate at unexpected failure.
> +
> +    At the construction, this class does:
> +
> +    - write current ``repo.dirstate`` out, and
> +    - save ``.hg/dirstate`` into the backup file
> +
> +    This restores ``.hg/dirstate`` from backup file, if ``release()``
> +    is invoked before ``close()``.
> +
> +    This just removes the backup file at ``close()`` before ``release()``.
> +    '''
> +
> +    def __init__(self, repo, name):
> +        repo.dirstate.write()
> +        self.repo = repo
> +        self.name = 'dirstate.backup.%s' % name

this "name" is actually a filename, would we rename that into 'filename'?
(could be a follow up)

> +        repo.vfs.write(self.name, repo.vfs.tryread('dirstate'))
> +        self.active = True
> +        self.closed = False

also: Could we have all the attributes marked private as private
(could be a follow up)

-- 
Pierre-Yves David