[PATCH 1 of 5 V1 REBASED] obsolete: compute successors set

Fri Dec 7 18:18:50 CST 2012

On Fri, Dec 07, 2012 at 11:19:19AM -0600, Kevin Bullock wrote:
> On Nov 30, 2012, at 3:12 PM, Pierre-Yves David wrote:
> 
> > # HG changeset patch
> > # User Pierre-Yves David <pierre-yves.david at logilab.fr>
> > # Date 1352486233 -3600
> > # Node ID d58f0a5976a0e663e2afff58ccd4883eb4d9b99b
> > # Parent  a39fe76c4c65c73d8824f7e3092dbbc049235422
> > obsolete: compute successors set
> > 
> > Successors set are an important part of obsolescence. It is necessary to detect
> > and solve divergence situation. This changeset add a core function to compute
> > them, a debug command to audit them and solid test on the concept.
> > 
> > Check function docstring for details about the concept.
> > 
> > diff --git a/mercurial/commands.py b/mercurial/commands.py
> > --- a/mercurial/commands.py
> > +++ b/mercurial/commands.py
> > @@ -2457,10 +2457,62 @@ def debugsub(ui, repo, rev=None):
> >     for k, v in sorted(ctx.substate.items()):
> >         ui.write(('path %s\n') % k)
> >         ui.write((' source   %s\n') % v[0])
> >         ui.write((' revision %s\n') % v[1])
> > 
> > + at command('debugsuccessorssets',
> 
> Adding this command is begging to be a separate patch.

This command is necessary to test the function. Patrick would advice to keep
them together. I tend to split such case but for this one, I was inclined to
keep the test with the function for readability purpose. The command are pretty
simple and the test really worst being in the same changeset.

I can split it if you insist.

> 
> > +    [],
> > +    _('[REV]'))
> > +def debugsuccessorssets(ui, repo, *revs):
> > +    """show set of successors for revision
> > +
> > +    Successors set of changeset A are a consistent group of revision that
>       The successors set             is
> 
> Also, I think what you mean by "consistent" here is actually "coherent", but I'm not sure.

I mean <french>cohérent</french>. I believe that consistent was a better
translation but I'm not a ntive speaker.

> 
> > +    succeed to A. Successors set contains non-obsolete changeset only.
>        succeed A.     It contains
> 
> > +
> > +    In most case a changeset A have zero (changeset pruned) or a single
> > +    successors set that contains a single successors (changeset A replacement
> > +    by A')
> > +
> > +    But splitted changeset will result with successors set containing more than
> > +    a single element. Divergent rewritting will result in multiple successor
> > +    set.
> 
> -ENOPARSE. Did you mean something like:
> 
>  In most cases, a changeset will have zero or a single successor. A split changeset will have a successors set containing more than one element; divergent rewriting will result in more than one successors set.

Nope. does this help?

In most case a changeset `A` have a single successors set containing a single
successors (changeset `A` replaced by `A'`). Second most common case are
changeset made obsolete with no successors at all. Such changesets have no
successors sets at all and are called `pruned`.

Less common case are:
- Splitted changesets: They have a successors set containing more than
  a single successors.
- Divergent rewritting: They have multiple successor set.

> Also, what do you mean by "changeset pruned"?

See above.

> 
> > +
> > +    result is displayed as follows::
> > +
> > +        <rev1>
> > +            <successors-1A>
> > +        <rev2>
> > +            <successors-2A>
> > +            <successors-2B1> <successors-2B1> <successors-2B1>
>                                  successors-2B2   successors-2B3   ?
> 
> > +
> > +    here rev2 have two possible successors sets. One hold three elements.
> Wait, why do we have two *possible* successors sets? Why don't we know for sure?
> 
> Ah, now I see: s/possible/divergent/. Put more clearly:
> 
>  Here rev2 has two divergent successors sets, one of which splits the changeset into three successor changesets.

Meh, divergent comes from the fact we have multiple successors sets

Here rev2 have two possible successors sets. The first hold one element, the
second hold three elements. This situation is called "divergence".

Anyway, this is a *debug* function. I expect people using it to know changeset
evolution basic.

> > +
> > +    add --debug if you want full size node id.
> > +    """
> > +    cache = {}
> > +    s = str
> > +    if ui.debug:
> > +        def s(ctx):
> > +            return ctx.hex()
> > +    for rev in scmutil.revrange(repo, revs):
> > +        ctx = repo[rev]
> > +        if ui.debug():
> > +            ui.write('%s\n'% ctx.hex())
> > +            s = hex
> > +        else:
> > +            ui.write('%s\n'% ctx)
> > +            s = short
> > +        for ss in obsolete.successorssets(repo, ctx.node(), cache):
> > +            if ss:
> > +                ui.write('    ')
> > +                ui.write(s(ss[0]))
> > +                for n in ss[1:]:
> > +                    ui.write(' ')
> > +                    ui.write(s(n))
> > +            ui.write('\n')
> > +    pass
> > +
> > @command('debugwalk', walkopts, _('[OPTION]... [FILE]...'))
> > def debugwalk(ui, repo, *pats, **opts):
> >     """show how files match on given patterns"""
> >     m = scmutil.match(repo[None], pats, opts)
> >     items = list(repo.walk(m))
> > diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py
> > --- a/mercurial/obsolete.py
> > +++ b/mercurial/obsolete.py
> > @@ -400,10 +400,129 @@ def allsuccessors(obsstore, nodes, ignor
> >             for suc in mark[1]:
> >                 if suc not in seen:
> >                     seen.add(suc)
> >                     remaining.add(suc)
> > 
> > +nodemod = node
> 
> I think our usual practice would dictate adding a comment here saying why the alias is needed. Even better would be to `from mercurial import node as nodemod`.

slipped in. I'll use the "node as nodemap version" in a prio renaming function.

> > +def successorssets(repo, initialnode, cache=None):
> > +    """Return all set of successors of initial nodes
> > +
> > +    Successors set of changeset A are a consistent group of revision that
> > +    succeed to A. Successors set contains non-obsolete changeset only.
> > +
> > +    In most case a changeset A have zero (changeset pruned) or a single
> > +    successors set that contains a single successors (changeset A replacement
> > +    by A')
> > +
> > +    But splitted changeset will result with successors set containing more than
> 
> Past tense (both simple and participle) of 'split' is just 'split'.
> 
> > +    a single element. Divergent rewritting will result in multiple successor
> > +    set.
> > +
> > +    They are returned as a list of tuple containing all valid successors set.
> > +
> > +    Final successors unknown locally are considered plain prune (obsoleted
> > +    without successors).
> > +
> > +    The `cache` parameter is a dictionary that may contains precomputed
> > +    successors set. It is meant to reuse the computation of previous call to
> > +    successorssets when multiple call are made at the same time."""
> 
> ...except that we throw (most of) it away at the end of the method? Is the caller supposed to maintain their own dictionary to shove return values from successorssets() into? That seems like a lot of unnecessary data structure mangling.

Ho no we don't. Look at patch 3. It is a non optional argument for caller that
do multiple calls to this function. And our main user call that on every draft
changeset on your repo.

> > +    # prec -> markers mapping
> > +    markersfor = repo.obsstore.successors
> 
> This would be more self-explanatory if named something like 'allsuccessors' or 'succmarkers', or even just 'successors'. That would let you eliminate some of the prodigious comments in the code below.

`allsuccessors` and `successors` are wrong name.

succmarkers would be a good alternative.

> > +    # Stack of node need to know the last successors set
> > +    toproceed = [initialnode]
> > +    # set version of toproceed for fast loop detection
> > +    stackedset = set(toproceed)
> 
> Why do we need both the stack and the set?

because we need fast contains check -> set
because we need FILO ordering -> list

> > +    if cache is None:
> > +        cache = {}
> > +    while toproceed:
> > +        # work on the last node of the stack
> > +        node = toproceed[-1]
> > +        if node in cache:
> > +            # We already have a value for it.
> > +            # Keep working on something else.
> 
> Unnecessary comment.
> 
> > +            stackedset.remove(toproceed.pop())
> > +        elif node not in markersfor:
> > +            # The node is not obsolete.
> > +            # This mean it is its own last successors.
> > +            if node in repo:
> > +                # We have a valid last successors.
> > +                cache[node] = [(node,)]
> > +            else:
> > +                # final obsolete version is unknown locally.
> > +                # Do not count that as a valid successors
> > +                cache[node] = []
> > +        else:
> > +            # <lss> stand for Last Successors Sets
> > +            # it contains the list of all last successors for the current node.
> > +            lss = []
> > +            for mark in markersfor[node]:
> > +                # <mlss> stand for Marker Last Successors Sets
> > +                # it contains the list of last successors set introduced by
> > +                # this marker.
> > +                mlss = [[]]
> > +                # iterate over possible multiple successors
> > +                for suc in mark[1]:
> > +                    if suc not in cache:
> > +                        # We do not know the last successors of that yet.
> > +                        if suc in stackedset:
> > +                            # Loop detected!
> > +                            #
> > +                            # we won't be able to ever compute a proper last
> > +                            # successors the naive and simple approve is to
> > +                            # consider it killed
> > +                            cache[suc] = []
> > +                        else:
> > +                            # Add the successor to the stack and break the next
> > +                            # iteration will work on this successors and the
> > +                            # algorithm will eventually process the current
> > +                            # node again.
> > +                            toproceed.append(suc)
> > +                            stackedset.add(suc)
> > +                            break
> 
> Here's about as far as I got. Most of these comments could be significantly condensed or eliminated. Perhaps it would be better to describe the general approach in a comment at the beginning of the outer loop.

Those comment comes from review and explanation of this function to various
colleague. The function is not trivial and I'm -1 for removing those comment
here.

-- 
Pierre-Yves David