[PATCH 1 of 9 V3] cmdutil: add the class to restore dirstate at unexpected failure easily

FUJIWARA Katsunori foozy at lares.dti.ne.jp
Mon May 11 10:49:56 CDT 2015


At Fri, 08 May 2015 19:00:02 -0700,
Pierre-Yves David wrote:
> 
> 
> 
> On 05/06/2015 08:15 PM, FUJIWARA Katsunori wrote:
> > # HG changeset patch
> > # User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
> > # Date 1430968030 -32400
> > #      Thu May 07 12:07:10 2015 +0900
> > # Node ID 449a46109f0c96bb8d82d4edf2c8855341558c2f
> > # Parent  8174d27576a3fff28fb6f952a4c89d5c0b900214
> > cmdutil: add the class to restore dirstate at unexpected failure easily
> >
> > Before this patch, after "dirstate.write()" execution, there is no way
> > to restore dirstate to the original status before "dirstate.write()".
> >
> > In some code paths, "dirstate.invalidate()" is used as a kind of
> > "restore .hg/dirstate to the original status". But it just avoids
> > writing changes in memory out, and doesn't actually restore
> > ".hg/dirstate" file. "dirstate.write()" prevents it from working as
> > expected.
> >
> > To fix the issue that recent (in memory) dirstate isn't visible to
> > external process (e.g. "precommit" hooks), "dirstate.write()" should
> > be invoked before invocation of external process. But at the same
> > time, ".hg/dirstate" should be restored to the status before
> > "dirstate.write()" at unexpected failure in some cases.
> >
> > This patch adds the class "dirstateguard" to easily restore
> > ".hg/dirstate" at unexpected failure. Typical usecase of it is:
> >
> >      # (1) build dirstate up
> >      ....
> >
> >      # (2) write dirstate out, and backup ".hg/dirstate"
> >      dsguard = dirstateguard(repo, 'scopename')
> >      try:
> >          # (3) execute somethig to do:
> >          #     this may imply making some additional changes on dirstate
> >          ....
> >
> >          # (4) unlink backup-ed dirstate file at the end of dsguard scope
> >          dsguard.close()
> >      finally:
> >          # (5) if execution is aborted before "dsguard.close()",
> >          #     ".hg/dirstate" is restored from the backup
> >          dsguard.release()
> 
> This "invocation" before external process, sounded really like the 
> "pending mechanism" used in transaction. Things are a bit different 
> since the dirstate also refers to content of the working copy, that have 
> been already flushed to disk. But as far as I understand, the dirstate 
> refers to parents that may be only contained in the the transaction.
> 
> The real question here is: What should external writer see during the 
> process.
> 
> If it make more sense for external writer to be able to have a peek at 
> the dirstate, we should write it.
> If external writer cannot understand what is in the dirstate data, it 
> should not be written to disk until the transaction is completed, and 
> using a `.pending` mechanism instead.
>
>
> > For this kind of issue, "extending transaction" approach (in
> > https://titanpad.com/mercurial32-sprint) seems not to be suitable,
> > because:
> >
> >    - transaction nesting occurs in some cases (e.g. "shelve => rebase"), and
> >
> >    - "dirstate" may be already modified since the beginning of OUTER
> >      transaction scope, then
> >
> >    - dirstate should be backed up into the file other than
> >      "dirstate.journal" at the beginning of INNER transaction scope, but
> >
> >    - such alternative backup files are useless for transaction itself,
> >      and increases complication of its implementation
> 
> Some part about the transaction confuses me.
> 
>   - Yes, you are right, the dirstate changes may have a wider life spawn 
> than the transaction.
> 
>   - Therefor doing backup there is not suitable (and is maybe never 
> useful) (your are still right there)
> 
>   - However, for any changes started inside a transaction, we should 
> never write data in .hg/dirstate until the transaction is actually 
> committed (because it likely refers to unwritten data in the 
> transaction). This means that hooks/editors/etc would need to access a 
> .hg/distate.pending in that case.
> 
>   - The transaction API allow for both "wait until transaction is closed 
> to flush that file" logic and "please write pending data for hooks" 
> logic. It looks like the dirstate handling should collaborate with the 
> transaction if one exist when dirstate manipulation are done. I'm not 
> sure how much it happen.

Committing via "repo.commit()" may cause invoking editor process out
of transaction scope, but flushing dirstate is still needed: for
example, "hg commit -A" changes status of each unknown files before
"repo.commit()".

`.pending` mechanism of transaction itself can't be used directly in
such cases.

But on the other hand, editor process may be invoked in transaction
scope, and (maybe) pending changelog and so on should become visible
for external process: for example, "hg import -e" with multiple
patches should cause such situation (BTW, rebasing multiple revisions
creates a transaction for each revisions to be rebased).

So, I'll try to follow the style of `.pending` mechanism of
transaction for dirstate.



> Another things that concerns me is "re-entrance", you code use a 
> distinct object for each scope, using its own backup and everything.
> This seems simple enough, provide nesting and work probably well. 
> However your backup is based on the "name" provided in the code. This 
> mean that two nested calls with the same "name" will overwrite each 
> other. We should throw some unique identifier in there. `id(self)` would do.

I don't have any reasons to insist using "name" for identifier. I'll
revise around it. BTW, what about using both "name" and `id(self)` for
trouble shooting/debugging ?


> This re-entrance business transition perfectly to my last question: It 
> seems like this superseed the beginparentchange/endparentchange system 
> previously in place. However, there is still multiple call to this 
> system. Could we, in theory replace them will with the new system? If 
> not, why?

begin/end-parentchange system seems handy for small (a few lines)
scope changes, because it doesn't need writing current dirstate out
and backing it up.

For example, current "heavy" dirstateguard doesn't seem suitable for
"committablectx.markcommitted()".

    def markcommitted(self, node):
        self._repo.dirstate.beginparentchange()
        for f in self.modified() + self.added():
            self._repo.dirstate.normal(f)
        for f in self.removed():
            self._repo.dirstate.drop(f)
        self._repo.dirstate.setparents(node)
        self._repo.dirstate.endparentchange()

For such lightweight use, we may have to add the flag to
enable/disable "writing current dirstate out and backing it up" to
dirstateguard: when backup is disabled, "dirstate.write()" may have to
be blocked for safety.

On the other hand, begin/end-parentchange system doesn't seem suitable
for non-small scope or complicated changes, because it is difficult to
ensure that "dirstate.write()" isn't implied in such case (like ones
in mq, commands.import and cmdutil.tryimportone replaced by this
series).

begin/end-parentchange in this condition should be replaced by
dirstateguard or so, as soon as possible.


> Also, this old system was living on the dirstate object directly. would 
> it make sense to move the guardian business there?

IMHO, yes, even though some more preparations may be needed to
complete moving.


> Summary of questions/topics
> ===========================
> 
> - Do we need a pending mechanism ?
> - It seems like it would make sense to attach the logic to the 
> transaction if one exists. Though?
> - Possible "name" collision and necessity to make identifier unique.
> - What's the fate of beginparentchange
> 
> Beside the question aboves (and some typo I can fix in-flight), the 
> series looks like an improvement to me. I'll take it once the above 
> point are clarified.
> 
> > +class dirstateguard(object):
> > +    '''Restore dirstate at unexpected failure.
> > +
> > +    At the construction, this class does:
> > +
> > +    - write current ``repo.dirstate`` out, and
> > +    - save ``.hg/dirstate`` into the backup file
> > +
> > +    This restores ``.hg/dirstate`` from backup file, if ``release()``
> > +    is invoked before ``close()``.
> > +
> > +    This just removes the backup file at ``close()`` before ``release()``.
> > +    '''
> > +
> > +    def __init__(self, repo, name):
> > +        repo.dirstate.write()
> > +        self.repo = repo
> > +        self.name = 'dirstate.backup.%s' % name
> 
> this "name" is actually a filename, would we rename that into 'filename'?
> (could be a follow up)

I'll revise so.

> > +        repo.vfs.write(self.name, repo.vfs.tryread('dirstate'))
> > +        self.active = True
> > +        self.closed = False
> 
> also: Could we have all the attributes marked private as private
> (could be a follow up)

I'll revise so.

 
> -- 
> Pierre-Yves David
> 

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp


More information about the Mercurial-devel mailing list