[PATCH 1 of 2] strip: add a delayedstrip method that works in a transaction

Martin von Zweigbergk martinvonz at google.com
Tue Jun 27 00:33:50 EDT 2017

On Mon, Jun 26, 2017 at 5:33 PM, Jun Wu <quark at fb.com> wrote:
> Excerpts from Martin von Zweigbergk's message of 2017-06-26 17:08:03 -0700:
>> >> > Otherwise a warning will
>> >> > +    be printed that some nodes are not stripped.
>> >>
>> >> Maybe this should be a ProgrammingError?
>> >
>> > No. This is not an error intentionally. It makes the API easier to use and
>> > more flexible. For example,
>> >
>> >   1. When rebaseset has a "hole", rebase won't need special handling before
>> >      sending the rebaseset to delayedstrip.
>> I thought we decided that that that shouldn't happen. Aborting in that
>> case instead of making the rebase complete successfully seems fine to
>> me (maybe even better, I'm not sure).
> I think our conclusion is "it is rare", and "it's a best effort to support".

Yes, agreed. Sorry, I mean to clarify that, but I was typing it out
quickly before leaving work.

> If we abort, what do you think about the "hg rebase --continue" case in
> issue5606? Should we *enforce* the user to turn on evolve to continue?
> I don't think that is a good idea.

Yes, either turn on evolve or abort the rebase (aborting the rebase
was what I was thinking while writing the paragraph above, but
continuing with evolve on should also be a solution). This is such a
corner case that I don't care much.

> Also, I'd like to be able to reuse the cleanup API in "absorb". If we raise
> ProgrammingError, it will break a valid use-case of a Mozilla "hg absorb"
> developer [1].
> [1]: https://bitbucket.org/facebook/hg-experimental/issues/6/hg-absorb-merges-diverged-commits

I spent a few minutes, but I couldn't understand how it would break.

>> >   2. Even if you know that nodelist covers all descendants at the time you
>> >      call "delayedstrip", because it's delayed, somebody else could still
>> >      change the repo, and at the end of transaction, the original nodelist
>> >      may have surprising new children and shouldn't be stripped.
>> That's the case I would like to catch by using ProgrammingError.
> If it's ProgrammingError, which programmer made errors? The commit one or
> the strip one, or the one putting them together?
> For the commit one or the rebase one, they should not be aware of what could
> happen in the rest of the same transaction - they are just making reusable
> sub-routines. So it can only be the one putting them together.
> Suppose I write the code:
>     with repo.transaction():
>         rebase(....)
>         edit-repo(....)
>         edit-repo2(....)
> In order to avoid ProgrammingError, I must know the implementation details
> of "rebase", "edit-repo", "edit-repo2" to do the check, or access their
> internal states. That seems like a layer violation to me. And is probably
> also fragile - even if I get all the new and stripped nodes from all
> subroutines, writing correct code to avoid the ProgrammingError is still
> non-trivial. Definitely not friendly to new developers.

I'm not sure, but it's probably the person writing the "hg
rebase-and-commit" command made the mistake. To me, just looking at
the result from the user's perspective makes it clearly wrong. If the
rebasing leaves both an old version and a new version in the repo, I
would definitely say that there's a bug in the command.

While writing this, I saw your other reply about users also producing
commits on top. That's an interesting case, and I'm glad we have a
test case for it. However, I don't see how that case will actually
trigger the ProgrammingError I'm suggesting. The ProgrammingError
would happen in the postclose hook, which happens after the
transaction close, but still before the repo lock is released, right?
If that's the case, I don't see how the user could create a commit in
the meantime.

>> > I think supporting "2" is important. There is a test case for that (commit
>> > after delayedstrip).
>> >
>> >> > +    Always do a backup.
>> >>
>> >> Why? Is that more necessary for posthook strips than other strips?
>> >
>> > All the code needing to migrate to this code path does a backup.
>> Like I said above, I think I'd prefer delayedstrip() to subsume
>> strip() and take its name. So you will add the backup flag then
>> instead? Relatedly, stripcallback's backup option is currently
>> ignored.
>> >
>> >> >The last non-None "topic" will be used as the backup
>> >> > +    topic name. The default backup topic name is "backup".
>> >>
>> >> Why not use a separate backup for each call to delayedstrip()?
>> >
>> > For example, given the following simple DAG:
>> >
>> >     B D
>> >     | |
>> >     A C
>> >
>> > And somebody calls:
>> >
>> >     delayedstrip(nodelist=['A', 'D'], topic='AD')
>> >     delayedstrip(nodelist=['B', 'C'], topic='BC')
>> >
>> > What do you think is the best behavior with that?
>> Good point. The topic could also be the concatenation of the topics
>> (maybe joined by a '+' or something). Picking the last seemed
>> arbitrary. Anyway, not very important.
> Practically, the current design allows people to reuse functions that calls
> "delayedstrip" and change its topic name. For example:
>     with transaction('split'):
>         ...
>         rebase(...) # calls delayedstrip(topic='rebase')
>         delayedstrip(topic='split')
> I think it's cleaner if backup name is "x-split.hg" instead of
> "x-rebase+split.hg". But I don't have strong feeling.
> Handling backup=True/False will make the API more complex than desired. I
> don't really have a clean way to deal with that. Take the above example, it
> could also be:
>     delayedstrip(nodelist=['A', 'D'], backup=True)
>     delayedstrip(nodelist=['B', 'C'], backup=False)

I saw your other email about finding a way to make that work. Even if
you didn't, the case seems so obscure that I wouldn't mind losing
support for it.

Let me know if it's not actually as obscure as it seems; i.e. I'd be
curious to hear of a useful command that might produce that.

> Therefore I don't think "delayedstrip" should replace "strip", the word
> "delayed" is there for a reason. But I agree it's nicer if "strip" does the
> orphan check. So the change will look like:
>     def strip(....):
>         roots = safestriproots(nodelist)
>         _strip(roots)
>     def _strip(....):
>         # the original "strip"
>> [...]
>> I agree that it makes sense to run it last *or* first (otherwise some
>> hooks would see the commits and some wouldn't). I was wondering if
>> first would make as much sense. It's unclear to me. Update: I saw you
>> answered this :-)

More information about the Mercurial-devel mailing list