[PATCH 1 of 5] graphmod: add a function for topological iteration

Martin von Zweigbergk martinvonz at google.com
Wed Nov 19 11:35:05 CST 2014


On Tue Nov 18 2014 at 3:57:01 PM Pierre-Yves David <
pierre-yves.david at ens-lyon.org> wrote:

> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david at fb.com>
> # Date 1409847572 -7200
> #      Thu Sep 04 18:19:32 2014 +0200
> # Node ID 06767ea27aca31cd1e65ccd32c22c45c1ebab83d
> # Parent  e63941631a3f61b3323dbcc2545689b1eb34e308
> graphmod: add a function for topological iteration
>
> This changesets introduce a function to perform topological (one branch
> after
> the other) iteration over a set of changeset. This first version have a
> lot of
> limitation but the approach should be flexible enough to allow all kind of
> improvement in the future. This changeset aims at setting the first stone
> more
> than providing a final solution.
>
> The algorithm works does not needs to know the whole set of nodes involved
> before emitting revision. So make it a good candidate for usage in place
> like
> `hg log` or graphical tools that need a fast first result time.
>
> diff --git a/mercurial/graphmod.py b/mercurial/graphmod.py
> --- a/mercurial/graphmod.py
> +++ b/mercurial/graphmod.py
> @@ -20,10 +20,160 @@ Data depends on type.
>  from mercurial.node import nullrev
>  import util
>
>  CHANGESET = 'C'
>
> +def topoiter(revs, parentsfunc):
> +    """topologically iter over a set of revision, one branch at a time.
> +
> +    Currently does not handle non contigious <revs> input.
> +
> +    Currently consider every changeset under a merge to be on the same
> branch
> +    using revision number to sort them.
>

Any plans for how to extend the algorithm for that case? Not asking you to
code it up, just checking that it's not too hard to extend for that case.


> +
> +    Could be easily extend to give priority to an initial branch.
> +
> +    The revision are emitted heads to roots."""
> +    ### Quick summary of the algorithm
> +    #
> +    # This function is based around a "retention" principle. We keep
> revision
> +    # in memory until we are ready to emit a whole branch that immediately
> +    # "merge" into an existing one. This reduce the number of branch
> "ongoing"
> +    # at the same time.
> +    #
> +    # During iteration revs are split into two groups:
> +    # A) revision already emitted
> +    # B) revision in "retention". They are stored as different subgroup.
> +    #
> +    # for each REV, we do the follow logic:
> +    #
> +    #   if REV is a parent of (A), we will emit it. But before emitting
> it,
> +    #   we'll "free" all the revs from subgroup in (B) that were waiting
> for
> +    #   REV to be available.
> +    #
> +    #   else, we'll search for a subgroup if (B) awaiting for this
> revision to
> +    #   be available, if such group exist, we add REV to it and the
> subgroup is
> +    #   now awaiting for REV.parents() to be available.
> +    #
> +    #   finally if no such group existed in (B), we create a new subgroup.
> +    #
> +    #
> +    # To bootstrap the algorithm, we display the first revision we saw.
>

What does this line refer to? There is no explicit bootstrapping here. It
seems like it flows from the "if no unblocked revisions exist, pick
arbitrary group", but I'm not sure.


> +
> +    revs.sort(reverse=True)
>

For someone not familiar with the revset (?) implementation, this seems to
go against the goal of incrementally producing the revisions. Is
"sort(reverse=True)" simply not doing anything to a lazy revset that is not
already explicitly sorted in some other order?


> +
> +    # set of parents of revision that have been yield. They can be
> considered
> +    # unblocked as the graph generator is already aware of them so there
> is no
> +    # need to delay the one that reference them.
> +    unblocked = set()
> +
> +    # list of group waiting to be displayed, each group is defined by:
> +    #
> +    #   (revs:    lists of revs waiting to be displayed,
> +    #    blocked: set of that cannot be displayed before those in the
> sets)
> +    #
> +    # The second value  correspond to parents of any revision in the group
> +    # that is not itself contained in the group. The main idea of this
> +    # algorithm is to delay as much as possible the emission of any
> revision.
> +    # This means waiting for the moment we are about to display theses
> +    # parents to display the revs in a group.
> +    #
> +    # This first implementation is smart until it meet a merge: it will
> +    # emit revs as soon as any parents is about to be emitted and can
> +    # grow an arbitrary number of revs in `blocked`. In practice this mean
> +    # we properly retains new branches but does not any special ordering
> +    # for ancestors of merges. The implementation can be improved to
> handle
> +    # this better.
> +    #
> +    # the first group is a special group. It correspond to all the
> revision
> +    # that were already emitted. the <revs> lists is expected to be empty
> +    # and the <blocked> set contains the parents revisions of already
> emitted
> +    # revision.
> +    #
> +    # You could pre-seed the <parents> set of groups[0] to a specific
> +    # changesets to select what the first emitted branch should be.
> +    #
> +    # We do not support revisions will hole yet, but adding such support
> +    # would be easy. The iteration will have to be done using both input
> +    # revision and parents (see cl.ancestors function + a few tweaks) but
> +    # only revisions parts of the initial set should be emitted.
> +    groups = [([], unblocked)]
> +    for current in revs:
> +        # Look for a group awaiting on the current revision.
> +        matching = [i for i, g in enumerate(groups) if current in g[1]]
> +
> +        if matching:
> +            # The main idea is to gather together all set that await on
> the
> +            # same revision.
> +            #
> +            # this merging is done at the time we are about to add this
> common
> +            # awaited to the group for simplicity purpose. Such merge
> could
> +            # happen sooner when we update the "blocked" set of revision.
> +            #
> +            # We also always keep the oldest group first. We can
> +            # probably improve the behavior by having the longuest set
> +            # first. That way, graph algorythms could minimise the
> +            # length of parallele lines their draw. This is currently
> +            # not done.
> +            targetidx = matching.pop(0)
> +            trevs, tparents = groups[targetidx]
> +            for i in matching:
> +                gr = groups[i]
> +                trevs.extend(gr[0])
> +                tparents |= gr[1]
> +            # delete all merged groups (but the one we keep)
> +            # (starting from the last group for performance and sanity
> reason)
> +            for i in reversed(matching):
> +                del groups[i]
> +        else:
> +            # his is a new head we create a new group for it.
> +            targetidx = len(groups)
> +            groups.append(([], set([current])))
> +
> +        gr = groups[targetidx]
> +
> +        # We now adds the current nodes to this groups. This is done after
> +        # the group merging because all elements from a group that relied
> on
> +        # this rev must preceed it.
> +        #
> +        # we also update the <parents> set to includes the parents on the
> +        # new nodes.
> +        gr[0].append(current)
> +        gr[1].remove(current)
> +        gr[1].update([p for p in parentsfunc(current) if p > nullrev])
> +
> +        # Look for a group to display
> +        #
> +        # When unblocked is empty (if clause), We are not waiting over any
> +        # revision during the first iteration (if no priority was given)
> or if
> +        # we outputed a whole disconnected sets of the graph (reached a
> root).
> +        # In that case we arbitrarily takes the oldest known group. The
> +        # heuristique could probably be better.
> +        #
> +        # Otherwise (elif clause) this mean we have some emitted revision.
> +        # if the group awaits on the same revision that the outputed ones,
> +        # we can safely output it.
> +        if not unblocked:
> +            if len(groups) > 1:  # display other subset
> +                targetidx = 1
> +                gr = groups[1]
> +        elif not gr[1] & unblocked:
> +            gr = None
> +
> +        if gr is not None:
> +            # update the set of awaited revisions with the one from the
> group
> +            unblocked |= gr[1]
> +            # output all revisions in the group
> +            for r in gr[0]:
> +                yield r
> +            # deleted the group that you just outputed.
> +            # unless it is group[0] in which case you just empty it.
> +            if targetidx:
> +                del groups[targetidx]
> +            else:
> +                gr[0][:] = []
> +
>  def dagwalker(repo, revs):
>      """cset DAG generator yielding (id, CHANGESET, ctx, [parentids])
> tuples
>
>      This generator function walks through revisions (which should be
> ordered
>      from bigger to lower). It returns a tuple for each node. The node and
> parent
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20141119/7918f629/attachment.html>


More information about the Mercurial-devel mailing list