[PATCH 1 of 9 V3] cmdutil: add the class to restore dirstate at unexpected failure easily
pierre-yves.david at ens-lyon.org
Fri May 8 21:00:02 CDT 2015
On 05/06/2015 08:15 PM, FUJIWARA Katsunori wrote:
> # HG changeset patch
> # User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
> # Date 1430968030 -32400
> # Thu May 07 12:07:10 2015 +0900
> # Node ID 449a46109f0c96bb8d82d4edf2c8855341558c2f
> # Parent 8174d27576a3fff28fb6f952a4c89d5c0b900214
> cmdutil: add the class to restore dirstate at unexpected failure easily
> Before this patch, after "dirstate.write()" execution, there is no way
> to restore dirstate to the original status before "dirstate.write()".
> In some code paths, "dirstate.invalidate()" is used as a kind of
> "restore .hg/dirstate to the original status". But it just avoids
> writing changes in memory out, and doesn't actually restore
> ".hg/dirstate" file. "dirstate.write()" prevents it from working as
> To fix the issue that recent (in memory) dirstate isn't visible to
> external process (e.g. "precommit" hooks), "dirstate.write()" should
> be invoked before invocation of external process. But at the same
> time, ".hg/dirstate" should be restored to the status before
> "dirstate.write()" at unexpected failure in some cases.
> This patch adds the class "dirstateguard" to easily restore
> ".hg/dirstate" at unexpected failure. Typical usecase of it is:
> # (1) build dirstate up
> # (2) write dirstate out, and backup ".hg/dirstate"
> dsguard = dirstateguard(repo, 'scopename')
> # (3) execute somethig to do:
> # this may imply making some additional changes on dirstate
> # (4) unlink backup-ed dirstate file at the end of dsguard scope
> # (5) if execution is aborted before "dsguard.close()",
> # ".hg/dirstate" is restored from the backup
This "invocation" before external process, sounded really like the
"pending mechanism" used in transaction. Things are a bit different
since the dirstate also refers to content of the working copy, that have
been already flushed to disk. But as far as I understand, the dirstate
refers to parents that may be only contained in the the transaction.
The real question here is: What should external writer see during the
If it make more sense for external writer to be able to have a peek at
the dirstate, we should write it.
If external writer cannot understand what is in the dirstate data, it
should not be written to disk until the transaction is completed, and
using a `.pending` mechanism instead.
> For this kind of issue, "extending transaction" approach (in
> https://titanpad.com/mercurial32-sprint) seems not to be suitable,
> - transaction nesting occurs in some cases (e.g. "shelve => rebase"), and
> - "dirstate" may be already modified since the beginning of OUTER
> transaction scope, then
> - dirstate should be backed up into the file other than
> "dirstate.journal" at the beginning of INNER transaction scope, but
> - such alternative backup files are useless for transaction itself,
> and increases complication of its implementation
Some part about the transaction confuses me.
- Yes, you are right, the dirstate changes may have a wider life spawn
than the transaction.
- Therefor doing backup there is not suitable (and is maybe never
useful) (your are still right there)
- However, for any changes started inside a transaction, we should
never write data in .hg/dirstate until the transaction is actually
committed (because it likely refers to unwritten data in the
transaction). This means that hooks/editors/etc would need to access a
.hg/distate.pending in that case.
- The transaction API allow for both "wait until transaction is closed
to flush that file" logic and "please write pending data for hooks"
logic. It looks like the dirstate handling should collaborate with the
transaction if one exist when dirstate manipulation are done. I'm not
sure how much it happen.
Another things that concerns me is "re-entrance", you code use a
distinct object for each scope, using its own backup and everything.
This seems simple enough, provide nesting and work probably well.
However your backup is based on the "name" provided in the code. This
mean that two nested calls with the same "name" will overwrite each
other. We should throw some unique identifier in there. `id(self)` would do.
This re-entrance business transition perfectly to my last question: It
seems like this superseed the beginparentchange/endparentchange system
previously in place. However, there is still multiple call to this
system. Could we, in theory replace them will with the new system? If
Also, this old system was living on the dirstate object directly. would
it make sense to move the guardian business there?
Summary of questions/topics
- Do we need a pending mechanism ?
- It seems like it would make sense to attach the logic to the
transaction if one exists. Though?
- Possible "name" collision and necessity to make identifier unique.
- What's the fate of beginparentchange
Beside the question aboves (and some typo I can fix in-flight), the
series looks like an improvement to me. I'll take it once the above
point are clarified.
> +class dirstateguard(object):
> + '''Restore dirstate at unexpected failure.
> + At the construction, this class does:
> + - write current ``repo.dirstate`` out, and
> + - save ``.hg/dirstate`` into the backup file
> + This restores ``.hg/dirstate`` from backup file, if ``release()``
> + is invoked before ``close()``.
> + This just removes the backup file at ``close()`` before ``release()``.
> + '''
> + def __init__(self, repo, name):
> + repo.dirstate.write()
> + self.repo = repo
> + self.name = 'dirstate.backup.%s' % name
this "name" is actually a filename, would we rename that into 'filename'?
(could be a follow up)
> + repo.vfs.write(self.name, repo.vfs.tryread('dirstate'))
> + self.active = True
> + self.closed = False
also: Could we have all the attributes marked private as private
(could be a follow up)
More information about the Mercurial-devel