[PATCH 1 of 5 V1 REBASED] obsolete: compute successors set

Sat Dec 8 20:36:33 CST 2012

On 7 Dec 2012, at 6:18 PM, Pierre-Yves David wrote:
> On Fri, Dec 07, 2012 at 11:19:19AM -0600, Kevin Bullock wrote:
>> On Nov 30, 2012, at 3:12 PM, Pierre-Yves David wrote:
>> 
>>> # HG changeset patch
>>> # User Pierre-Yves David <pierre-yves.david at logilab.fr>
>>> # Date 1352486233 -3600
>>> # Node ID d58f0a5976a0e663e2afff58ccd4883eb4d9b99b
>>> # Parent  a39fe76c4c65c73d8824f7e3092dbbc049235422
>>> obsolete: compute successors set
>>> 
>>> Successors set are an important part of obsolescence. It is necessary to detect
>>> and solve divergence situation. This changeset add a core function to compute
>>> them, a debug command to audit them and solid test on the concept.
>>> 
>>> Check function docstring for details about the concept.
>>> 
>>> diff --git a/mercurial/commands.py b/mercurial/commands.py
>>> --- a/mercurial/commands.py
>>> +++ b/mercurial/commands.py
>>> @@ -2457,10 +2457,62 @@ def debugsub(ui, repo, rev=None):
>>>    for k, v in sorted(ctx.substate.items()):
>>>        ui.write(('path %s\n') % k)
>>>        ui.write((' source   %s\n') % v[0])
>>>        ui.write((' revision %s\n') % v[1])
>>> 
>>> + at command('debugsuccessorssets',
>> 
>> Adding this command is begging to be a separate patch.
> 
> This command is necessary to test the function. Patrick would advice to keep
> them together. I tend to split such case but for this one, I was inclined to
> keep the test with the function for readability purpose. The command are pretty
> simple and the test really worst being in the same changeset.
> 
> I can split it if you insist.

I don't feel that strongly about it, but on closer inspection, the function is written rather confusedly (stray 'pass' at the end of the body, repeated 'if ui.debug' checks and corresponding reassignments of s).

>>> +    [],
>>> +    _('[REV]'))
>>> +def debugsuccessorssets(ui, repo, *revs):
>>> +    """show set of successors for revision
>>> +
>>> +    Successors set of changeset A are a consistent group of revision that
>>      The successors set             is
>> 
>> Also, I think what you mean by "consistent" here is actually "coherent", but I'm not sure.
> 
> I mean <french>cohérent</french>. I believe that consistent was a better
> translation but I'm not a ntive speaker.

"consistent" implies some external entity to which the changesets are somehow related, whereas "coherent" indicates that they're part of a unified whole. I think the latter is better, if I understand the context correctly (which I'm still not sure of).

>>> +    In most case a changeset A have zero (changeset pruned) or a single
>>> +    successors set that contains a single successors (changeset A replacement
>>> +    by A')
>>> +
>>> +    But splitted changeset will result with successors set containing more than
>>> +    a single element. Divergent rewritting will result in multiple successor
>>> +    set.
>> 
>> -ENOPARSE. Did you mean something like:
>> 
>> In most cases, a changeset will have zero or a single successor. A split changeset will have a successors set containing more than one element; divergent rewriting will result in more than one successors set.
> 
> Nope. does this help?
> 
> In most case a changeset `A` have a single successors set containing a single
> successors (changeset `A` replaced by `A'`). Second most common case are
> changeset made obsolete with no successors at all. Such changesets have no
> successors sets at all and are called `pruned`.
> 
> Less common case are:
> - Splitted changesets: They have a successors set containing more than
>  a single successors.
> - Divergent rewritting: They have multiple successor set.
> 
>> Also, what do you mean by "changeset pruned"?
> 
> See above.

Good, that all makes sense now (modulo some remaining grammatical issues).

>>> +
>>> +    result is displayed as follows::
>>> +
>>> +        <rev1>
>>> +            <successors-1A>
>>> +        <rev2>
>>> +            <successors-2A>
>>> +            <successors-2B1> <successors-2B1> <successors-2B1>
>>                                 successors-2B2   successors-2B3   ?
>> 
>>> +
>>> +    here rev2 have two possible successors sets. One hold three elements.
>> Wait, why do we have two *possible* successors sets? Why don't we know for sure?
>> 
>> Ah, now I see: s/possible/divergent/. Put more clearly:
>> 
>> Here rev2 has two divergent successors sets, one of which splits the changeset into three successor changesets.
> 
> Meh, divergent comes from the fact we have multiple successors sets
> 
> Here rev2 have two possible successors sets. The first hold one element, the
> second hold three elements. This situation is called "divergence".

IIUC the difference in the number of elements in the two sets is irrelevant to them being divergent. So there are actually two separate situations embodied in this example, a divergence and a split changeset. They should both be delineated clearly.

> Anyway, this is a *debug* function. I expect people using it to know changeset
> evolution basic.

Yes, but the concepts involved are sufficiently complex that we really need to keep things clear and to the point, even in a debug function.

>>> +    The `cache` parameter is a dictionary that may contains precomputed
>>> +    successors set. It is meant to reuse the computation of previous call to
>>> +    successorssets when multiple call are made at the same time."""
>> 
>> ...except that we throw (most of) it away at the end of the method? Is the caller supposed to maintain their own dictionary to shove return values from successorssets() into? That seems like a lot of unnecessary data structure mangling.
> 
> Ho no we don't. Look at patch 3. It is a non optional argument for caller that
> do multiple calls to this function. And our main user call that on every draft
> changeset on your repo.

But you don't return the whole dict, only the entry for the starting node...

>>> +    # Stack of node need to know the last successors set
>>> +    toproceed = [initialnode]
>>> +    # set version of toproceed for fast loop detection
>>> +    stackedset = set(toproceed)
>> 
>> Why do we need both the stack and the set?
> 
> because we need fast contains check -> set
> because we need FILO ordering -> list

It wasn't clear to me whether the set actually gets updated in concert with the stack.

>>> +    if cache is None:
>>> +        cache = {}
>>> +    while toproceed:
>>> +        # work on the last node of the stack
>>> +        node = toproceed[-1]
>>> +        if node in cache:
>>> +            # We already have a value for it.
>>> +            # Keep working on something else.
>> 
>> Unnecessary comment.
>> 
>>> +            stackedset.remove(toproceed.pop())
>>> +        elif node not in markersfor:
>>> +            # The node is not obsolete.
>>> +            # This mean it is its own last successors.
>>> +            if node in repo:
>>> +                # We have a valid last successors.
>>> +                cache[node] = [(node,)]
>>> +            else:
>>> +                # final obsolete version is unknown locally.
>>> +                # Do not count that as a valid successors
>>> +                cache[node] = []
>>> +        else:
>>> +            # <lss> stand for Last Successors Sets
>>> +            # it contains the list of all last successors for the current node.
>>> +            lss = []
>>> +            for mark in markersfor[node]:
>>> +                # <mlss> stand for Marker Last Successors Sets
>>> +                # it contains the list of last successors set introduced by
>>> +                # this marker.
>>> +                mlss = [[]]
>>> +                # iterate over possible multiple successors
>>> +                for suc in mark[1]:
>>> +                    if suc not in cache:
>>> +                        # We do not know the last successors of that yet.
>>> +                        if suc in stackedset:
>>> +                            # Loop detected!
>>> +                            #
>>> +                            # we won't be able to ever compute a proper last
>>> +                            # successors the naive and simple approve is to
>>> +                            # consider it killed
>>> +                            cache[suc] = []
>>> +                        else:
>>> +                            # Add the successor to the stack and break the next
>>> +                            # iteration will work on this successors and the
>>> +                            # algorithm will eventually process the current
>>> +                            # node again.
>>> +                            toproceed.append(suc)
>>> +                            stackedset.add(suc)
>>> +                            break
>> 
>> Here's about as far as I got. Most of these comments could be significantly condensed or eliminated. Perhaps it would be better to describe the general approach in a comment at the beginning of the outer loop.
> 
> Those comment comes from review and explanation of this function to various
> colleague. The function is not trivial and I'm -1 for removing those comment
> here.

I wasn't suggesting removing all the comments, but when you have an average of 3 comments for every branch or loop, the actual control flow gets hard to follow. Better variable names and more up-front explanation would help that.

pacem in terris / мир / शान्ति / ‎‫سَلاَم‬ / 平和
Kevin R. Bullock