[PATCH 2 of 3 v2] bundle: add config option to include phases

Pierre-Yves David pierre-yves.david at ens-lyon.org
Thu Jun 22 11:51:02 EDT 2017



On 06/20/2017 07:00 PM, Martin von Zweigbergk via Mercurial-devel wrote:
> # HG changeset patch
> # User Martin von Zweigbergk <martinvonz at google.com>
> # Date 1497939455 25200
> #      Mon Jun 19 23:17:35 2017 -0700
> # Node ID 04fa2d783f067765f9d8cf0d1b02fae1819c232e
> # Parent  0f77eccc602140bd4f8066a196b8ae53c0f50ef9
> bundle: add config option to include phases
> 
> This adds an experimental.bundle-phases config option to include phase
> information in bundles. As with the recently added support for
> bundling obsmarkers, the support for bundling phases is hidden behind
> the config option until we decide to make a bundle format v3 that
> includes phases (and obsmarkers and ...).

TL;DR; I've found a bug in the handling non-bundled descendant of the 
bundled changeset. The computation around this part also seems more 
heavy than they should be. With a worrisome use of 'descendant' call.
I recommend dropping this series and looking at the approach summarized 
at the end of this email.

At the risk of being a bit pedantic: "bundle format v3" should be 
"bundlespec v3" to be less confusing. There will be no change of the 
binary bundle format, just change to what we decide the include in this 
content.

(not sure if it is worth fixing)

> We could perhaps use the listkeys format for this, but that's
> considered obsolete according to Pierre-Yves. Instead, we introduce a
> new "phases" bundle part. The new part contains the phase roots that
> are internal to the set of bundled revisions. It also contains the
> phase for each root of the set of bundled revisions.
> 
> For now, phases are only included by "hg bundle", and not by
> e.g. strip and rebase.
> 
> diff --git a/mercurial/bundle2.py b/mercurial/bundle2.py
> --- a/mercurial/bundle2.py
> +++ b/mercurial/bundle2.py
> @@ -158,6 +158,7 @@
>       changegroup,
>       error,
>       obsolete,
> +    phases,
>       pushkey,
>       pycompat,
>       tags,
> @@ -178,6 +179,8 @@
>   _fpayloadsize = '>i'
>   _fpartparamcount = '>BB'
>   
> +_fphasesentry = '>i20s'
> +
>   preferedchunksize = 4096
>   
>   _parttypeforbidden = re.compile('[^a-zA-Z0-9_:-]')
> @@ -1387,6 +1390,13 @@
>           obsmarkers = repo.obsstore.relevantmarkers(outgoing.missing)
>           buildobsmarkerspart(bundler, obsmarkers)
>   
> +    if opts.get('phases', False):
> +        rootphases = phases.subsetphases(repo, outgoing.missing)
> +        phasedata = []
> +        for root, phase in rootphases.iteritems():
> +            phasedata.append(_pack(_fphasesentry, phase, root))
> +        bundler.newpart('phases', data=''.join(phasedata))
> +

In an "open" set of revision, phases heads is likely to provide a clear 
information and be simpler to apply. I think I've found of bug related 
to the usage of roots here, more on it at the 'phases.updatephases' level.

I recommend having a more verbose part name too (eg: 'phase-roots' or 
'phase-heads')

>   def addparttagsfnodescache(repo, bundler, outgoing):
>       # we include the tags fnode cache for the bundle changeset
>       # (as an optional parts)
> @@ -1721,6 +1731,25 @@
>                   kwargs[key] = inpart.params[key]
>           raise error.PushkeyFailed(partid=str(inpart.id), **kwargs)
>   
> + at parthandler('phases')
> +def handlephases(op, inpart):
> +    """apply phases from bundle part to repo"""
> +    phaseroots = {}
> +    entrysize = struct.calcsize(_fphasesentry)
> +    while True:
> +        entry = inpart.read(entrysize)
> +        if len(entry) < entrysize:
> +            if entry:
> +                op.ui.debug('ignoring incomplete phase entry%s\n' % entry)

Incomplete data are malformed input, why not just abort in this case? 
There are no reason a normal peer would send you malformed input.

> +            break
> +        phase, node = struct.unpack(_fphasesentry, entry)
> +        phaseroots[node] = phase
> +    addednodes = []
> +    for entry in op.records['changegroup']:
> +        addednodes.extend(entry['addednodes'])
> +    phases.updatephases(op.repo.unfiltered(), op.gettransaction(), phaseroots,
> +                        addednodes)
> +
>   @parthandler('reply:pushkey', ('return', 'in-reply-to'))
>   def handlepushkeyreply(op, inpart):
>       """retrieve the result of a pushkey request"""
> diff --git a/mercurial/commands.py b/mercurial/commands.py
> --- a/mercurial/commands.py
> +++ b/mercurial/commands.py
> @@ -1351,6 +1351,8 @@
>       contentopts = {'cg.version': cgversion}
>       if repo.ui.configbool('experimental', 'evolution.bundle-obsmarker', False):
>           contentopts['obsolescence'] = True
> +    if repo.ui.configbool('experimental', 'bundle-phases', False):
> +        contentopts['phases'] = True
>       bundle2.writenewbundle(ui, repo, 'bundle', fname, bversion, outgoing,
>                              contentopts, compression=bcompression,
>                              compopts=compopts)
> diff --git a/mercurial/phases.py b/mercurial/phases.py
> --- a/mercurial/phases.py
> +++ b/mercurial/phases.py
> @@ -430,6 +430,41 @@
>           else:
>               return False
>   
> +def subsetphases(repo, subset):
> +    """Finds the phase roots for a subset of a history
> +
> +    Returns a dict with a root as key and phase as value. The keys will
> +    be the phases root within the nodes subset, plus the roots of the subset.
> +    """
> +    phases = {}
> +    for phase in trackedphases:
> +        for root in repo._phasecache.phaseroots[phase]:
> +            if root in subset:
> +                phases[root] = phase

Overall, sending heads (instead of roots) would makes various things 
simpler.

Given the behavior of phases, processing "heads" on the destination 
would be a simple advanceboundary' call without further computation.
Computing the heads on the emitting side will also be more efficient 
since we already have will set of non-public revision pre-computed.
Right now the receiving end has to make a lot of 'descendant' call that 
are inefficient.

This server side computation also requires more information and is error 
prone (cf, bug pointed in updatephases)

> +    for ctx in repo.set('roots(%ln)', subset):
> +        phases[ctx.node()] = ctx.phase()

note: the set of root in phasecache.phaseroots is already minimized so 
you should not need to run this extra ''roots(%ln)' call here.

> +    return phases
> +
> +def updatephases(repo, tr, phaseroots, addednodes):
> +    """Updates the repo with the given phase roots (node->phase dict)"""
> +    cl = repo.changelog
> +
> +    # First make all the added revisions secret because changegroup.apply()
> +    # currently sets the phase to draft.
> +    addedroots = [cl.node(rev) for rev in repo.revs('roots(%ln)', addednodes)]
> +    retractboundary(repo, tr, secret, addedroots)

I think the above introduce a bug in case were bundle content as other 
descendant (secret being wrongfully turned to draft).. I'm not seeing 
the case tested so I'll assume it is buggy for now :-p.

Practical example:

   initial state:
   A(public) - B(draft) - C(draft) - D (secret)

   $ hg bundle --base A --rev B /tmp/babar.hg
   bundle content:
     phase-roots:
       B draft

   $ hg unbundle /tmp/babar.hg
   algorithm step:
     makes all added not secret → make B secret → make C secret
         state: A(public) - B(secret) - C(secret) - D (secret)
         (no public root, skip)
         draft root (no secret roots)
            compute('B:: - ()::') == compute('B::') = B,C,D
         state: A(public) - B(draft) - C(draft) - D (draft)

I recommend implementing this test case.

> +    # Now advance phase boundaries of all but secret phase
> +    nodesbyphase = [[] for i in allphases]
> +    for node, phase in phaseroots.iteritems():
> +        nodesbyphase[phase].append(node)
> +    publicheads = repo.revs('heads((%ln::) - (%ln::))', nodesbyphase[public],
> +                             nodesbyphase[draft] + nodesbyphase[secret])
> +    advanceboundary(repo, tr, public, [cl.node(rev) for rev in publicheads])
> +    draftheads = repo.revs('heads((%ln::) - (%ln::))', nodesbyphase[draft],
> +                             nodesbyphase[secret])
> +    advanceboundary(repo, tr, draft, [cl.node(rev) for rev in draftheads])
> +
>   def analyzeremotephases(repo, subset, roots):
>       """Compute phases heads and root in a subset of node from root dict
>   
> diff --git a/tests/test-bundle-phases.t b/tests/test-bundle-phases.t
> new file mode 100644
> --- /dev/null
> +++ b/tests/test-bundle-phases.t
> @@ -0,0 +1,148 @@
> +  $ cat >> $HGRCPATH <<EOF
> +  > usegeneraldelta=yes

Why do we have to explicitly request experimental support here?

> +  > [experimental]
> +  > bundle-phases=yes
> +  > [extensions]
> +  > strip=
> +  > drawdag=$TESTDIR/drawdag.py
> +  > EOF
> +
> +Set up repo with linear history

I recommend introducing some branching in the test (and merge) as this 
will impact some logic.

(I can elaborate if requested)


So I would advocates for:

- head-based-part (type: 'phase-heads')
- computing the head on the emitting side (using 'immutable' branchmap + 
phase sets)
- simple receiving bundle2 part only calling 'advanceboundary'
- adding the missing test case (for the current bug)
- adding test with branching and merging.

(note: this would make patch 1 unnecessary)

> +  $ hg init linear
> +  $ cd linear
> +  $ hg debugdrawdag <<'EOF'
> +  > E
> +  > |
> +  > D
> +  > |
> +  > C
> +  > |
> +  > B
> +  > |
> +  > A
> +  > EOF
> +  $ hg phase --public A
> +  $ hg phase --force --secret D
> +  $ hg log -G -T '{desc} {phase}\n'
> +  o  E secret
> +  |
> +  o  D secret
> +  |
> +  o  C draft
> +  |
> +  o  B draft
> +  |
> +  o  A public
> +
> +Phases are restored when unbundling
> +  $ hg bundle --base B -r E bundle
> +  3 changesets found
> +  $ hg strip --no-backup C
> +  $ hg unbundle -q bundle
> +  $ rm bundle
> +  $ hg log -G -T '{desc} {phase}\n'
> +  o  E secret
> +  |
> +  o  D secret
> +  |
> +  o  C draft
> +  |
> +  o  B draft
> +  |
> +  o  A public
> +
> +Root revision's phase is preserved
> +  $ hg bundle -a bundle
> +  5 changesets found
> +  $ hg strip --no-backup A
> +  $ hg unbundle -q bundle
> +  $ rm bundle
> +  $ hg log -G -T '{desc} {phase}\n'
> +  o  E secret
> +  |
> +  o  D secret
> +  |
> +  o  C draft
> +  |
> +  o  B draft
> +  |
> +  o  A public
> +
> +Direct transition from public to root can be restored
> +  $ hg phase --public C
> +  $ hg bundle -a bundle
> +  5 changesets found
> +  $ hg strip --no-backup A
> +  $ hg unbundle -q bundle
> +  $ rm bundle
> +  $ hg log -G -T '{desc} {phase}\n'
> +  o  E secret
> +  |
> +  o  D secret
> +  |
> +  o  C public
> +  |
> +  o  B public
> +  |
> +  o  A public
> +
> +Revisions within bundle preserve their phase even if parent changes its phase
> +  $ hg phase --draft --force B
> +  $ hg bundle --base B -r E bundle
> +  3 changesets found
> +  $ hg strip --no-backup C
> +  $ hg phase --public B
> +  $ hg unbundle -q bundle
> +  $ rm bundle
> +  $ hg log -G -T '{desc} {phase}\n'
> +  o  E secret
> +  |
> +  o  D secret
> +  |
> +  o  C draft
> +  |
> +  o  B public
> +  |
> +  o  A public
> +
> +Phase of ancestors of stripped node get advanced to accommodate child
> +  $ hg bundle --base B -r E bundle
> +  3 changesets found
> +  $ hg strip --no-backup C
> +  $ hg phase --force --secret B
> +  $ hg unbundle -q bundle
> +  $ rm bundle
> +  $ hg log -G -T '{desc} {phase}\n'
> +  o  E secret
> +  |
> +  o  D secret
> +  |
> +  o  C draft
> +  |
> +  o  B draft
> +  |
> +  o  A public
> +
> +Unbundling advances phases of changesets even if they were already in the repo.
> +To test that, create a bundle of everything in draft phase and then unbundle
> +to see that secret becomes draft, but public remains public.
> +  $ hg phase --draft --force A
> +  $ hg phase --draft E
> +  $ hg bundle -a bundle
> +  5 changesets found
> +  $ hg phase --public A
> +  $ hg phase --secret --force E
> +  $ hg unbundle -q bundle
> +  $ rm bundle
> +  $ hg log -G -T '{desc} {phase}\n'
> +  o  E draft
> +  |
> +  o  D draft
> +  |
> +  o  C draft
> +  |
> +  o  B draft
> +  |
> +  o  A public
> +
> +  $ cd ..

-- 
Pierre-Yves David


More information about the Mercurial-devel mailing list