[PATCH 2 of 3 v2] bundle: add config option to include phases

Thu Jun 22 13:52:31 EDT 2017

On Thu, Jun 22, 2017 at 8:51 AM, Pierre-Yves David
<pierre-yves.david at ens-lyon.org> wrote:
>
>
> On 06/20/2017 07:00 PM, Martin von Zweigbergk via Mercurial-devel wrote:
>>
>> # HG changeset patch
>> # User Martin von Zweigbergk <martinvonz at google.com>
>> # Date 1497939455 25200
>> #      Mon Jun 19 23:17:35 2017 -0700
>> # Node ID 04fa2d783f067765f9d8cf0d1b02fae1819c232e
>> # Parent  0f77eccc602140bd4f8066a196b8ae53c0f50ef9
>> bundle: add config option to include phases
>>
>> This adds an experimental.bundle-phases config option to include phase
>> information in bundles. As with the recently added support for
>> bundling obsmarkers, the support for bundling phases is hidden behind
>> the config option until we decide to make a bundle format v3 that
>> includes phases (and obsmarkers and ...).
>
>
> TL;DR; I've found a bug in the handling non-bundled descendant of the
> bundled changeset. The computation around this part also seems more heavy
> than they should be. With a worrisome use of 'descendant' call.
> I recommend dropping this series and looking at the approach summarized at
> the end of this email.
>
> At the risk of being a bit pedantic: "bundle format v3" should be
> "bundlespec v3" to be less confusing. There will be no change of the binary
> bundle format, just change to what we decide the include in this content.
>
> (not sure if it is worth fixing)

Sure, will do.

>
>
>> We could perhaps use the listkeys format for this, but that's
>> considered obsolete according to Pierre-Yves. Instead, we introduce a
>> new "phases" bundle part. The new part contains the phase roots that
>> are internal to the set of bundled revisions. It also contains the
>> phase for each root of the set of bundled revisions.
>>
>> For now, phases are only included by "hg bundle", and not by
>> e.g. strip and rebase.
>>
>> diff --git a/mercurial/bundle2.py b/mercurial/bundle2.py
>> --- a/mercurial/bundle2.py
>> +++ b/mercurial/bundle2.py
>> @@ -158,6 +158,7 @@
>>       changegroup,
>>       error,
>>       obsolete,
>> +    phases,
>>       pushkey,
>>       pycompat,
>>       tags,
>> @@ -178,6 +179,8 @@
>>   _fpayloadsize = '>i'
>>   _fpartparamcount = '>BB'
>>   +_fphasesentry = '>i20s'
>> +
>>   preferedchunksize = 4096
>>     _parttypeforbidden = re.compile('[^a-zA-Z0-9_:-]')
>> @@ -1387,6 +1390,13 @@
>>           obsmarkers = repo.obsstore.relevantmarkers(outgoing.missing)
>>           buildobsmarkerspart(bundler, obsmarkers)
>>   +    if opts.get('phases', False):
>> +        rootphases = phases.subsetphases(repo, outgoing.missing)
>> +        phasedata = []
>> +        for root, phase in rootphases.iteritems():
>> +            phasedata.append(_pack(_fphasesentry, phase, root))
>> +        bundler.newpart('phases', data=''.join(phasedata))
>> +
>
>
> In an "open" set of revision, phases heads is likely to provide a clear
> information and be simpler to apply. I think I've found of bug related to
> the usage of roots here, more on it at the 'phases.updatephases' level.

Yeah, I was also thinking that phase heads would be much easier to
deal with. I don't know why I never took the leap and tried it. Now I
have and it did turn out to be much easier. Thanks! (But I wish you
had said that on v1 of the series.)

>
> I recommend having a more verbose part name too (eg: 'phase-roots' or
> 'phase-heads')

Will do.

>
>>   def addparttagsfnodescache(repo, bundler, outgoing):
>>       # we include the tags fnode cache for the bundle changeset
>>       # (as an optional parts)
>> @@ -1721,6 +1731,25 @@
>>                   kwargs[key] = inpart.params[key]
>>           raise error.PushkeyFailed(partid=str(inpart.id), **kwargs)
>>   + at parthandler('phases')
>> +def handlephases(op, inpart):
>> +    """apply phases from bundle part to repo"""
>> +    phaseroots = {}
>> +    entrysize = struct.calcsize(_fphasesentry)
>> +    while True:
>> +        entry = inpart.read(entrysize)
>> +        if len(entry) < entrysize:
>> +            if entry:
>> +                op.ui.debug('ignoring incomplete phase entry%s\n' %
>> entry)
>
>
> Incomplete data are malformed input, why not just abort in this case? There
> are no reason a normal peer would send you malformed input.

True. Will do.

>
>
>> +            break
>> +        phase, node = struct.unpack(_fphasesentry, entry)
>> +        phaseroots[node] = phase
>> +    addednodes = []
>> +    for entry in op.records['changegroup']:
>> +        addednodes.extend(entry['addednodes'])
>> +    phases.updatephases(op.repo.unfiltered(), op.gettransaction(),
>> phaseroots,
>> +                        addednodes)
>> +
>>   @parthandler('reply:pushkey', ('return', 'in-reply-to'))
>>   def handlepushkeyreply(op, inpart):
>>       """retrieve the result of a pushkey request"""
>> diff --git a/mercurial/commands.py b/mercurial/commands.py
>> --- a/mercurial/commands.py
>> +++ b/mercurial/commands.py
>> @@ -1351,6 +1351,8 @@
>>       contentopts = {'cg.version': cgversion}
>>       if repo.ui.configbool('experimental', 'evolution.bundle-obsmarker',
>> False):
>>           contentopts['obsolescence'] = True
>> +    if repo.ui.configbool('experimental', 'bundle-phases', False):
>> +        contentopts['phases'] = True
>>       bundle2.writenewbundle(ui, repo, 'bundle', fname, bversion,
>> outgoing,
>>                              contentopts, compression=bcompression,
>>                              compopts=compopts)
>> diff --git a/mercurial/phases.py b/mercurial/phases.py
>> --- a/mercurial/phases.py
>> +++ b/mercurial/phases.py
>> @@ -430,6 +430,41 @@
>>           else:
>>               return False
>>   +def subsetphases(repo, subset):
>> +    """Finds the phase roots for a subset of a history
>> +
>> +    Returns a dict with a root as key and phase as value. The keys will
>> +    be the phases root within the nodes subset, plus the roots of the
>> subset.
>> +    """
>> +    phases = {}
>> +    for phase in trackedphases:
>> +        for root in repo._phasecache.phaseroots[phase]:
>> +            if root in subset:
>> +                phases[root] = phase
>
>
> Overall, sending heads (instead of roots) would makes various things
> simpler.
>
> Given the behavior of phases, processing "heads" on the destination would be
> a simple advanceboundary' call without further computation.

We would still need to retract the boundary to secret phase, because
changegroup.apply() currently sets the new changesets to draft by
default (I'm working on changing that, but I'm not done yet). If we
don't retract the secret boundary, we would not be able to restore
commits into secret phase.

> Computing the heads on the emitting side will also be more efficient since
> we already have will set of non-public revision pre-computed.

I'm doing something else that seems good enough. We can always improve
it after it necessary.

> Right now the receiving end has to make a lot of 'descendant' call that are
> inefficient.
>
> This server side computation also requires more information and is error
> prone (cf, bug pointed in updatephases)
>
>> +    for ctx in repo.set('roots(%ln)', subset):
>> +        phases[ctx.node()] = ctx.phase()
>
>
> note: the set of root in phasecache.phaseroots is already minimized so you
> should not need to run this extra ''roots(%ln)' call here.
>
>> +    return phases
>> +
>> +def updatephases(repo, tr, phaseroots, addednodes):
>> +    """Updates the repo with the given phase roots (node->phase dict)"""
>> +    cl = repo.changelog
>> +
>> +    # First make all the added revisions secret because
>> changegroup.apply()
>> +    # currently sets the phase to draft.
>> +    addedroots = [cl.node(rev) for rev in repo.revs('roots(%ln)',
>> addednodes)]
>> +    retractboundary(repo, tr, secret, addedroots)
>
>
> I think the above introduce a bug in case were bundle content as other
> descendant (secret being wrongfully turned to draft).. I'm not seeing the
> case tested so I'll assume it is buggy for now :-p.
>
> Practical example:
>
>   initial state:
>   A(public) - B(draft) - C(draft) - D (secret)
>
>   $ hg bundle --base A --rev B /tmp/babar.hg
>   bundle content:
>     phase-roots:
>       B draft
>
>   $ hg unbundle /tmp/babar.hg
>   algorithm step:
>     makes all added not secret → make B secret → make C secret
>         state: A(public) - B(secret) - C(secret) - D (secret)
>         (no public root, skip)
>         draft root (no secret roots)
>            compute('B:: - ()::') == compute('B::') = B,C,D
>         state: A(public) - B(draft) - C(draft) - D (draft)
>
> I recommend implementing this test case.
>
>> +    # Now advance phase boundaries of all but secret phase
>> +    nodesbyphase = [[] for i in allphases]
>> +    for node, phase in phaseroots.iteritems():
>> +        nodesbyphase[phase].append(node)
>> +    publicheads = repo.revs('heads((%ln::) - (%ln::))',
>> nodesbyphase[public],
>> +                             nodesbyphase[draft] + nodesbyphase[secret])
>> +    advanceboundary(repo, tr, public, [cl.node(rev) for rev in
>> publicheads])
>> +    draftheads = repo.revs('heads((%ln::) - (%ln::))',
>> nodesbyphase[draft],
>> +                             nodesbyphase[secret])
>> +    advanceboundary(repo, tr, draft, [cl.node(rev) for rev in
>> draftheads])
>> +
>>   def analyzeremotephases(repo, subset, roots):
>>       """Compute phases heads and root in a subset of node from root dict
>>   diff --git a/tests/test-bundle-phases.t b/tests/test-bundle-phases.t
>> new file mode 100644
>> --- /dev/null
>> +++ b/tests/test-bundle-phases.t
>> @@ -0,0 +1,148 @@
>> +  $ cat >> $HGRCPATH <<EOF
>> +  > usegeneraldelta=yes
>
>
> Why do we have to explicitly request experimental support here?

To enable bundle2. Let me know if there's a better way.

>
>> +  > [experimental]
>> +  > bundle-phases=yes
>> +  > [extensions]
>> +  > strip=
>> +  > drawdag=$TESTDIR/drawdag.py
>> +  > EOF
>> +
>> +Set up repo with linear history
>
>
> I recommend introducing some branching in the test (and merge) as this will
> impact some logic.

Will do.

>
> (I can elaborate if requested)
>
>
> So I would advocates for:
>
> - head-based-part (type: 'phase-heads')
> - computing the head on the emitting side (using 'immutable' branchmap +
> phase sets)
> - simple receiving bundle2 part only calling 'advanceboundary'
> - adding the missing test case (for the current bug)
> - adding test with branching and merging.
>
> (note: this would make patch 1 unnecessary)
>
>
>> +  $ hg init linear
>> +  $ cd linear
>> +  $ hg debugdrawdag <<'EOF'
>> +  > E
>> +  > |
>> +  > D
>> +  > |
>> +  > C
>> +  > |
>> +  > B
>> +  > |
>> +  > A
>> +  > EOF
>> +  $ hg phase --public A
>> +  $ hg phase --force --secret D
>> +  $ hg log -G -T '{desc} {phase}\n'
>> +  o  E secret
>> +  |
>> +  o  D secret
>> +  |
>> +  o  C draft
>> +  |
>> +  o  B draft
>> +  |
>> +  o  A public
>> +
>> +Phases are restored when unbundling
>> +  $ hg bundle --base B -r E bundle
>> +  3 changesets found
>> +  $ hg strip --no-backup C
>> +  $ hg unbundle -q bundle
>> +  $ rm bundle
>> +  $ hg log -G -T '{desc} {phase}\n'
>> +  o  E secret
>> +  |
>> +  o  D secret
>> +  |
>> +  o  C draft
>> +  |
>> +  o  B draft
>> +  |
>> +  o  A public
>> +
>> +Root revision's phase is preserved
>> +  $ hg bundle -a bundle
>> +  5 changesets found
>> +  $ hg strip --no-backup A
>> +  $ hg unbundle -q bundle
>> +  $ rm bundle
>> +  $ hg log -G -T '{desc} {phase}\n'
>> +  o  E secret
>> +  |
>> +  o  D secret
>> +  |
>> +  o  C draft
>> +  |
>> +  o  B draft
>> +  |
>> +  o  A public
>> +
>> +Direct transition from public to root can be restored
>> +  $ hg phase --public C
>> +  $ hg bundle -a bundle
>> +  5 changesets found
>> +  $ hg strip --no-backup A
>> +  $ hg unbundle -q bundle
>> +  $ rm bundle
>> +  $ hg log -G -T '{desc} {phase}\n'
>> +  o  E secret
>> +  |
>> +  o  D secret
>> +  |
>> +  o  C public
>> +  |
>> +  o  B public
>> +  |
>> +  o  A public
>> +
>> +Revisions within bundle preserve their phase even if parent changes its
>> phase
>> +  $ hg phase --draft --force B
>> +  $ hg bundle --base B -r E bundle
>> +  3 changesets found
>> +  $ hg strip --no-backup C
>> +  $ hg phase --public B
>> +  $ hg unbundle -q bundle
>> +  $ rm bundle
>> +  $ hg log -G -T '{desc} {phase}\n'
>> +  o  E secret
>> +  |
>> +  o  D secret
>> +  |
>> +  o  C draft
>> +  |
>> +  o  B public
>> +  |
>> +  o  A public
>> +
>> +Phase of ancestors of stripped node get advanced to accommodate child
>> +  $ hg bundle --base B -r E bundle
>> +  3 changesets found
>> +  $ hg strip --no-backup C
>> +  $ hg phase --force --secret B
>> +  $ hg unbundle -q bundle
>> +  $ rm bundle
>> +  $ hg log -G -T '{desc} {phase}\n'
>> +  o  E secret
>> +  |
>> +  o  D secret
>> +  |
>> +  o  C draft
>> +  |
>> +  o  B draft
>> +  |
>> +  o  A public
>> +
>> +Unbundling advances phases of changesets even if they were already in the
>> repo.
>> +To test that, create a bundle of everything in draft phase and then
>> unbundle
>> +to see that secret becomes draft, but public remains public.
>> +  $ hg phase --draft --force A
>> +  $ hg phase --draft E
>> +  $ hg bundle -a bundle
>> +  5 changesets found
>> +  $ hg phase --public A
>> +  $ hg phase --secret --force E
>> +  $ hg unbundle -q bundle
>> +  $ rm bundle
>> +  $ hg log -G -T '{desc} {phase}\n'
>> +  o  E draft
>> +  |
>> +  o  D draft
>> +  |
>> +  o  C draft
>> +  |
>> +  o  B draft
>> +  |
>> +  o  A public
>> +
>> +  $ cd ..
>
>
> --
> Pierre-Yves David