[PATCH 2 of 3] bundle: add config option to include phases

Mon Jun 19 13:54:26 EDT 2017

On Mon, Jun 19, 2017 at 9:06 AM, Pierre-Yves David
<pierre-yves.david at ens-lyon.org> wrote:
>
>
> On 06/18/2017 07:12 AM, Martin von Zweigbergk wrote:
>>
>> On Sat, Jun 17, 2017 at 4:46 PM, Pierre-Yves David
>> <pierre-yves.david at ens-lyon.org> wrote:
>>>
>>>
>>>
>>> On 06/17/2017 07:19 AM, Martin von Zweigbergk via Mercurial-devel wrote:
>>>>
>>>>
>>>> # HG changeset patch
>>>> # User Martin von Zweigbergk <martinvonz at google.com>
>>>> # Date 1497460519 25200
>>>> #      Wed Jun 14 10:15:19 2017 -0700
>>>> # Node ID 792544200a5a7e21bcc8646604d85b62db277d79
>>>> # Parent  123eb8b3b913585cc0a29c14c232ab6fc78978b3
>>>> bundle: add config option to include phases
>>>>
>>>> This adds an experimental.bundle-phases config option to include phase
>>>> information in bundles. As with the recently added support for
>>>> bundling obsmarkers, the support for bundling phases is hidden behind
>>>> the config option until we decide to make a bundle format v3 that
>>>> includes phases (and obsmarkers and ...).
>>>>
>>>> We cannot use the same listkeys bundle part as is used on the wire,
>>>> because we want different semantics. Specifically, we never want the
>>>> unbundled phases to override the phase of an existing changeset. This
>>>> is different from exchange where a phase advancement on the remote
>>>> will propagate to the local repo. For the non-exchange usecase, we
>>>> instead want to be able to bundle a draft changeset, change its parent
>>>> to secret and not have the unbundling reset the parent's phase back to
>>>> draft. For this reason, we use a new bundle part instead. The new
>>>> bundle part contains the phase roots that are internal to the set of
>>>> bundled revisions. It also contains the phase for each root of the set
>>>> of bundled revisions.
>>>
>>>
>>>
>>> `hg bundle` primary goal and usage is to allow exchange of data between
>>> repositories (according usage[1] and doc[2][3]). So moving away from the
>>> standard phase semantic seems problematic. I think we should stick to
>>> that
>>> usual behavior of "move to lowest phase seen".
>>
>>
>> I did note while working on this that it this was the first bundle
>> part that was not designed for exchange. So I did notice that it was
>> unusual in that way, but I didn't see anything problematic about it.
>> But let's put that discussion on hold until we have consensus on the
>> right user-visible behavior for this. If the consensus is that
>> unbundling should be allowed to affect the phase of changesets already
>> in the repo, I'll happily switch to using the listkeys part.
>
>
> Now that you point it out, I'm not sure why we are skipping push/pull usage
> for the new parts? We currently use pushkey-over-bundle2 to exchange phases,
> the pushkey protocol is antic and not very adequate. This new part should be
> a good opportunity to fix that.

As I said in the commit message, I wanted different semantics for the
phases part. However, I think I'm convinced by your argument about
making exchange via on-disk bundles match exchange via the wire
protocol. Once I've changed the semantics of that, I agree that we
should be able to reuse more of the code between these two code paths.
I'd still like to get the the new part in first, because this has been
blocking a feature for us internally at Google for a long time.

>
> For example: one of the problem with pushkey is that it list all remote
> phase roots on pull. Not just the relevant one. In my Mercurial repository
> this means sending about 32K of useless data over the wire with each pull.
>
>>> In addition this semantic will provide challenging consistency. The
>>> unbundle
>>> behavior can see large changes implied by small "unrelated" details. Here
>>> is
>>> a quick example (I have others):
>>>
>>>  * bundle X contains changeset (A,B,C) all public.
>>>  * Alice repo contains (A, B, C) all draft
>>>  * Bob   repo contains (A, B) all draft
>>>
>>> With the proposed semantic running `hg unbundle X`:
>>>
>>>  * in Alice repository: no changes,
>>> * in Bob repository: C is added (as public), all of A,B,C is marked
>>> public.
>>
>>
>> No, the proposed semantics is to leave the phase on existing
>> changesets unchanged ("Specifically, we never want the unbundled
>> phases to override the phase of an existing changeset."). This is
>> covered by the test called "Phase of parent of stripped node doesn't
>> get decreased to accomodate child; instead, the child adapts by
>> increasing its phase".
>
>
> Ha, sorry for processing that part a bit too fast. That is more consistent
> in itself :-).
>
> However this still violates the core behavior and definition of phases. If a
> "public" changeset is sent in a bundle, increasing its phase to "draft" on
> unbundling is illegal[1]. Public phase is an important security feature, we
> do not want to be less consistent in enforcing it.
>
> [1]
> https://www.mercurial-scm.org/repo/hg/file/a3a36bcf122e/mercurial/help/phases.txt#l16
>
> This change in phase behavior also introduces inconsistency with other
> bundle related data: "obsolescence markers". The bundle contains markers
> relevant to the bundled changesets, and these get added to the repository in
> all cases (changesets known or not).
>
> Going further down that road, the exchange of obsolescence markers without
> propagation of the phases information will produce bad result in the case of
> obsmarkers affecting (should-be) public changesets.
>
>
> Use-cases-wise, I just had one Yesterday. Boris and I were working in a
> plane (wifi banned) and we exchanged changesets with bundle over an USB
> dongle. Changeset and obsmarkers information came along, but we had to
> synchronize phases by hand (annoying and error prone).

Ah, that last paragraph contains a great point about making the
exchanges behave similarly between on-disk and on-wire bundles. I
agree that should be a stronger argument than preserving phases within
a repo across "bundle -> update phase -> unbundle". I'll change that
for v2.

>
>
>>> I'm not sure what is the actual usecase for "bundle; backward phase
>>> change;
>>> unbundle" do you have anything in mind? The "draft" → "secret" movement
>>> is
>>> not natural and mercurial does not provide clean ways to do it (requires
>>> --force) one should not expect it to be preserved.
>>
>>
>> I have no actual usecase in mind. It's more a consequence of the
>> principle stated above about not affecting the phase of a changeset
>> already in the repo.
>>
>>>
>>> I could keep going for a while, but Martin requested a short feedback ;-)
>>
>>
>> Thanks!
>>
>>>
>>> [1] default to compare with another repositories:
>>>
>>>
>>> https://www.mercurial-scm.org/repo/hg/file/29558247b00e/mercurial/commands.py#l1247
>>> [2]
>>>
>>> https://www.mercurial-scm.org/repo/hg/file/29558247b00e/mercurial/commands.py#l1250
>>> [3]
>>>
>>> https://www.mercurial-scm.org/repo/hg/file/29558247b00e/mercurial/commands.py#l1263
>>>
>>>
>>>> For now, phases are only included by "hg bundle", and not by
>>>> e.g. strip and rebase.
>>>>
>>>> diff --git a/mercurial/bundle2.py b/mercurial/bundle2.py
>>>> --- a/mercurial/bundle2.py
>>>> +++ b/mercurial/bundle2.py
>>>> @@ -158,6 +158,7 @@
>>>>      changegroup,
>>>>      error,
>>>>      obsolete,
>>>> +    phases,
>>>>      pushkey,
>>>>      pycompat,
>>>>      tags,
>>>> @@ -165,6 +166,11 @@
>>>>      util,
>>>>  )
>>>>
>>>> +from .node import (
>>>> +    bin,
>>>> +    hex,
>>>> +)
>>>> +
>>>>  urlerr = util.urlerr
>>>>  urlreq = util.urlreq
>>>>
>>>> @@ -1386,6 +1392,13 @@
>>>>          obsmarkers = repo.obsstore.relevantmarkers(outgoing.missing)
>>>>          buildobsmarkerspart(bundler, obsmarkers)
>>>>
>>>> +    if opts.get('phases', False):
>>>> +        keys = []
>>>> +        rootphases = phases.subsetphases(repo, outgoing.missing)
>>>> +        for root, phase in rootphases.iteritems():
>>>> +            keys.append((hex(root), '%d' % phase))
>>>> +        bundler.newpart('phases', data=pushkey.encodekeys(keys))
>>>
>>>
>>>
>>> Small note: I would prefer to not see the pushkey encoding leaks here.
>>> the
>>> pushkey encoding is text based and requires escaping as soon as the data
>>> becomes a bit complicated. Since the rest of bundle2 is using binary data
>>> I
>>> would prefer to see binary used here too. Bundle parts are exposed over
>>> the
>>> wires so we have a strong BC on them. As soon as release a version using
>>> pushey, we'll have to maintain it forever.
>>
>>
>> As you noted above, these bundles will not actually be sent over the
>> wire. However, they can of course be stored and shared outside of the
>> hg wire protocol, so compatibility is still required once this moves
>> out of experimental.
>
>
> Yes, it is going on disk and is readable by other mercurial process so it
> get the same compatibility constraint that the other ones. (It seems like we
> agree here).
>
> If you make your current series available somewhere, I'm ready to offer you
> a small patch switching the encoding to binary to fold in.

Thanks, I'll let you know if I don't change it myself.

>
>> Speaking of that, I guess I should add "version" part parameter in
>> case we ever need to change the format.
>
>
> You don't really needs too. The lack of "version" parameters indicate the
> first version. If we needs to make a new version, we can add a mandatory
> parameters that older client will choke on.

Good point.

>
>>> (note: having a generic utility to encode/decode mapping in bundle part
>>> would be useful to more than phase)
>>>
>>>  […]
>
>
> --
> Pierre-Yves David