[PATCH 1 of 5 V1 REBASED] obsolete: compute successors set

Fri Dec 7 15:02:25 CST 2012

I've got nothing to add to Kevin's review for this round.

On Dec 7, 2012, at 11:19 AM, Kevin Bullock <kbullock+mercurial at ringworld.org> wrote:

> On Nov 30, 2012, at 3:12 PM, Pierre-Yves David wrote:
> 
>> # HG changeset patch
>> # User Pierre-Yves David <pierre-yves.david at logilab.fr>
>> # Date 1352486233 -3600
>> # Node ID d58f0a5976a0e663e2afff58ccd4883eb4d9b99b
>> # Parent  a39fe76c4c65c73d8824f7e3092dbbc049235422
>> obsolete: compute successors set
>> 
>> Successors set are an important part of obsolescence. It is necessary to detect
>> and solve divergence situation. This changeset add a core function to compute
>> them, a debug command to audit them and solid test on the concept.
>> 
>> Check function docstring for details about the concept.
>> 
>> diff --git a/mercurial/commands.py b/mercurial/commands.py
>> --- a/mercurial/commands.py
>> +++ b/mercurial/commands.py
>> @@ -2457,10 +2457,62 @@ def debugsub(ui, repo, rev=None):
>>    for k, v in sorted(ctx.substate.items()):
>>        ui.write(('path %s\n') % k)
>>        ui.write((' source   %s\n') % v[0])
>>        ui.write((' revision %s\n') % v[1])
>> 
>> + at command('debugsuccessorssets',
> 
> Adding this command is begging to be a separate patch.
> 
>> +    [],
>> +    _('[REV]'))
>> +def debugsuccessorssets(ui, repo, *revs):
>> +    """show set of successors for revision
>> +
>> +    Successors set of changeset A are a consistent group of revision that
>      The successors set             is
> 
> Also, I think what you mean by "consistent" here is actually "coherent", but I'm not sure.
> 
>> +    succeed to A. Successors set contains non-obsolete changeset only.
>       succeed A.     It contains
> 
>> +
>> +    In most case a changeset A have zero (changeset pruned) or a single
>> +    successors set that contains a single successors (changeset A replacement
>> +    by A')
>> +
>> +    But splitted changeset will result with successors set containing more than
>> +    a single element. Divergent rewritting will result in multiple successor
>> +    set.
> 
> -ENOPARSE. Did you mean something like:
> 
> In most cases, a changeset will have zero or a single successor. A split changeset will have a successors set containing more than one element; divergent rewriting will result in more than one successors set.
> 
> Also, what do you mean by "changeset pruned"?
> 
>> +
>> +    result is displayed as follows::
>> +
>> +        <rev1>
>> +            <successors-1A>
>> +        <rev2>
>> +            <successors-2A>
>> +            <successors-2B1> <successors-2B1> <successors-2B1>
>                                 successors-2B2   successors-2B3   ?
> 
>> +
>> +    here rev2 have two possible successors sets. One hold three elements.
> Wait, why do we have two *possible* successors sets? Why don't we know for sure?
> 
> Ah, now I see: s/possible/divergent/. Put more clearly:
> 
> Here rev2 has two divergent successors sets, one of which splits the changeset into three successor changesets.
> 
>> +
>> +    add --debug if you want full size node id.
>> +    """
>> +    cache = {}
>> +    s = str
>> +    if ui.debug:
>> +        def s(ctx):
>> +            return ctx.hex()
>> +    for rev in scmutil.revrange(repo, revs):
>> +        ctx = repo[rev]
>> +        if ui.debug():
>> +            ui.write('%s\n'% ctx.hex())
>> +            s = hex
>> +        else:
>> +            ui.write('%s\n'% ctx)
>> +            s = short
>> +        for ss in obsolete.successorssets(repo, ctx.node(), cache):
>> +            if ss:
>> +                ui.write('    ')
>> +                ui.write(s(ss[0]))
>> +                for n in ss[1:]:
>> +                    ui.write(' ')
>> +                    ui.write(s(n))
>> +            ui.write('\n')
>> +    pass
>> +
>> @command('debugwalk', walkopts, _('[OPTION]... [FILE]...'))
>> def debugwalk(ui, repo, *pats, **opts):
>>    """show how files match on given patterns"""
>>    m = scmutil.match(repo[None], pats, opts)
>>    items = list(repo.walk(m))
>> diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py
>> --- a/mercurial/obsolete.py
>> +++ b/mercurial/obsolete.py
>> @@ -400,10 +400,129 @@ def allsuccessors(obsstore, nodes, ignor
>>            for suc in mark[1]:
>>                if suc not in seen:
>>                    seen.add(suc)
>>                    remaining.add(suc)
>> 
>> +nodemod = node
> 
> I think our usual practice would dictate adding a comment here saying why the alias is needed. Even better would be to `from mercurial import node as nodemod`.
> 
>> +def successorssets(repo, initialnode, cache=None):
>> +    """Return all set of successors of initial nodes
>> +
>> +    Successors set of changeset A are a consistent group of revision that
>> +    succeed to A. Successors set contains non-obsolete changeset only.
>> +
>> +    In most case a changeset A have zero (changeset pruned) or a single
>> +    successors set that contains a single successors (changeset A replacement
>> +    by A')
>> +
>> +    But splitted changeset will result with successors set containing more than
> 
> Past tense (both simple and participle) of 'split' is just 'split'.
> 
>> +    a single element. Divergent rewritting will result in multiple successor
>> +    set.
>> +
>> +    They are returned as a list of tuple containing all valid successors set.
>> +
>> +    Final successors unknown locally are considered plain prune (obsoleted
>> +    without successors).
>> +
>> +    The `cache` parameter is a dictionary that may contains precomputed
>> +    successors set. It is meant to reuse the computation of previous call to
>> +    successorssets when multiple call are made at the same time."""
> 
> ...except that we throw (most of) it away at the end of the method? Is the caller supposed to maintain their own dictionary to shove return values from successorssets() into? That seems like a lot of unnecessary data structure mangling.
> 
>> +
>> +    # prec -> markers mapping
>> +    markersfor = repo.obsstore.successors
> 
> This would be more self-explanatory if named something like 'allsuccessors' or 'succmarkers', or even just 'successors'. That would let you eliminate some of the prodigious comments in the code below.
> 
>> +
>> +    # Stack of node need to know the last successors set
>> +    toproceed = [initialnode]
>> +    # set version of toproceed for fast loop detection
>> +    stackedset = set(toproceed)
> 
> Why do we need both the stack and the set?
> 
>> +    if cache is None:
>> +        cache = {}
>> +    while toproceed:
>> +        # work on the last node of the stack
>> +        node = toproceed[-1]
>> +        if node in cache:
>> +            # We already have a value for it.
>> +            # Keep working on something else.
> 
> Unnecessary comment.
> 
>> +            stackedset.remove(toproceed.pop())
>> +        elif node not in markersfor:
>> +            # The node is not obsolete.
>> +            # This mean it is its own last successors.
>> +            if node in repo:
>> +                # We have a valid last successors.
>> +                cache[node] = [(node,)]
>> +            else:
>> +                # final obsolete version is unknown locally.
>> +                # Do not count that as a valid successors
>> +                cache[node] = []
>> +        else:
>> +            # <lss> stand for Last Successors Sets
>> +            # it contains the list of all last successors for the current node.
>> +            lss = []
>> +            for mark in markersfor[node]:
>> +                # <mlss> stand for Marker Last Successors Sets
>> +                # it contains the list of last successors set introduced by
>> +                # this marker.
>> +                mlss = [[]]
>> +                # iterate over possible multiple successors
>> +                for suc in mark[1]:
>> +                    if suc not in cache:
>> +                        # We do not know the last successors of that yet.
>> +                        if suc in stackedset:
>> +                            # Loop detected!
>> +                            #
>> +                            # we won't be able to ever compute a proper last
>> +                            # successors the naive and simple approve is to
>> +                            # consider it killed
>> +                            cache[suc] = []
>> +                        else:
>> +                            # Add the successor to the stack and break the next
>> +                            # iteration will work on this successors and the
>> +                            # algorithm will eventually process the current
>> +                            # node again.
>> +                            toproceed.append(suc)
>> +                            stackedset.add(suc)
>> +                            break
> 
> Here's about as far as I got. Most of these comments could be significantly condensed or eliminated. Perhaps it would be better to describe the general approach in a comment at the beginning of the outer loop.
> 
> pacem in terris / мир / शान्ति / ‎‫سَلاَم‬ / 平和
> Kevin R. Bullock
> 
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel