[PATCH 1 of 2] strip: add a delayedstrip method that works in a transaction

Mon Jun 26 20:08:03 EDT 2017

On Mon, Jun 26, 2017 at 4:33 PM, Jun Wu <quark at fb.com> wrote:
> Excerpts from Martin von Zweigbergk's message of 2017-06-26 09:26:56 -0700:
>> [...]
>> > +def delayedstrip(ui, repo, nodelist, topic=None):
>> > +    """like strip, but works inside transaction and won't strip irreverent revs
>>
>> I think you meant s/irreverent/irrelevant/ :-) But to make it clearer,
>> maybe "descendants" is even better (assuming that's what it is about)?
>
> Oops. I will fix the spelling as long as update the docstring.
>
> The language is a bit tricker to write, it's not stripping descendants and
> it won't remove ancestors of those descendants. If nodelist contains A and B
> where B is a descendant of A, it's still possible to remove B, or A+B, or
> nothing.
>
> Maybe something like:
>
>   won't strip anything that cause nodes outside nodelist to disappear
>
>> > +
>> > +    nodelist must explicitly contain all descendants.
>>
>> That seems like a safer API in general. Would it be a lot of work to
>> move this into the normal strip() method?
>
> That is also what I have been considering. It seems to me that "strip()" is
> a lower level function and we might want a low level function any way.
>
> Maybe rename "strip" to "_strip" and make "strip" do the extra safe check?
> I'll do that if it's simple.

That's what I had been thinking. I just assumed you wanted to migrate
all users first and that that's the only reason you didn't do it right
away.

>
>> > Otherwise a warning will
>> > +    be printed that some nodes are not stripped.
>>
>> Maybe this should be a ProgrammingError?
>
> No. This is not an error intentionally. It makes the API easier to use and
> more flexible. For example,
>
>   1. When rebaseset has a "hole", rebase won't need special handling before
>      sending the rebaseset to delayedstrip.

I thought we decided that that that shouldn't happen. Aborting in that
case instead of making the rebase complete successfully seems fine to
me (maybe even better, I'm not sure).

>   2. Even if you know that nodelist covers all descendants at the time you
>      call "delayedstrip", because it's delayed, somebody else could still
>      change the repo, and at the end of transaction, the original nodelist
>      may have surprising new children and shouldn't be stripped.

That's the case I would like to catch by using ProgrammingError.

>
> I think supporting "2" is important. There is a test case for that (commit
> after delayedstrip).
>
>> > +    Always do a backup.
>>
>> Why? Is that more necessary for posthook strips than other strips?
>
> All the code needing to migrate to this code path does a backup.

Like I said above, I think I'd prefer delayedstrip() to subsume
strip() and take its name. So you will add the backup flag then
instead? Relatedly, stripcallback's backup option is currently
ignored.

>
>> >The last non-None "topic" will be used as the backup
>> > +    topic name. The default backup topic name is "backup".
>>
>> Why not use a separate backup for each call to delayedstrip()?
>
> For example, given the following simple DAG:
>
>     B D
>     | |
>     A C
>
> And somebody calls:
>
>     delayedstrip(nodelist=['A', 'D'], topic='AD')
>     delayedstrip(nodelist=['B', 'C'], topic='BC')
>
> What do you think is the best behavior with that?

Good point. The topic could also be the concatenation of the topics
(maybe joined by a '+' or something). Picking the last seemed
arbitrary. Anyway, not very important.

>
>> > +    # transaction postclose callbacks are called in alphabet order.
>> > +    # use '\xff' as prefix so we are likely to be called last.
>>
>> Why do we want them to run a late as possible? Do we think other hooks
>> would want the to-be-stripped changesets to still be there when they
>> run?
>
> Just to be safe in case some other postclose callbacks do crazy things (like
> os.system("hg commit -m X")). They shouldn't but there is no easy way to
> detect what other postclose callbacks will do, so I'd prefer safer by
> default.

I agree that it makes sense to run it last *or* first (otherwise some
hooks would see the commits and some wouldn't). I was wondering if
first would make as much sense. It's unclear to me. Update: I saw you
answered this :-)