[PATCH 1 of 5 V2] graphmod: add a function for topological iteration

Tue Dec 2 17:29:22 UTC 2014

# HG changeset patch
# User Pierre-Yves David <pierre-yves.david at fb.com>
# Date 1409847572 -7200
#      Thu Sep 04 18:19:32 2014 +0200
# Node ID d99276796fabf9c8a8cf762d7b0d0927c4560dd4
# Parent  2d86f4e38c089c541bd57bebb7ef47adc0658585
graphmod: add a function for topological iteration

This changesets introduce a function to perform topological (one branch after
the other) iteration over a set of changeset. This first version have a lot of
limitation but the approach should be flexible enough to allow all kind of
improvement in the future. This changeset aims at setting the first stone more
than providing a final solution.

The algorithm works does not needs to know the whole set of nodes involved
before emitting revision. So make it a good candidate for usage in place like
`hg log` or graphical tools that need a fast first result time.

diff --git a/mercurial/graphmod.py b/mercurial/graphmod.py
--- a/mercurial/graphmod.py
+++ b/mercurial/graphmod.py
@@ -20,10 +20,172 @@ Data depends on type.
 from mercurial.node import nullrev
 import util
 
 CHANGESET = 'C'
 
+def groupbranchiter(revs, parentsfunc):
+    """yield revision from heads to roots one (topo) branch after the other.
+
+    This function aims at being used by graph generator that wish to minimise
+    the amount of parallel branches and their interleaving.
+
+    Example iteration order:
+
+      o  4
+      |
+      o  1
+      |
+      | o  3
+      | |
+      | o  2
+      |/
+      o  0
+
+    Currently does not handle non contigious <revs> input.
+
+    Currently consider every changeset under a merge to be on the same branch
+    using revision number to sort them.
+
+    Could be easily extend to give priority to an initial branch."""
+    ### Quick summary of the algorithm
+    #
+    # This function is based around a "retention" principle. We keep revisions
+    # in memory until we are ready to emit a whole branch that immediately
+    # "merge" into an existing one. This reduce the number of branch "ongoing"
+    # at the same time.
+    #
+    # During iteration revs are split into two groups:
+    # A) revision already emitted
+    # B) revision in "retention". They are stored as different subgroups.
+    #
+    # for each REV, we do the follow logic:
+    #
+    #   a) if REV is a parent of (A), we will emit it. But before emitting it,
+    #   we'll "free" all the revs from subgroup in (B) that were waiting for
+    #   REV to be available. So we emit all revision of such subgroup before
+    #   emitting REV
+    #
+    #   b) else, we'll search for a subgroup in (B) awaiting for REV to be
+    #   available, if such subgroup exist, we add REV to it and the subgroup is
+    #   now awaiting for REV.parents() to be available.
+    #
+    #   c) finally if no such group existed in (B), we create a new subgroup.
+    #
+    #
+    # To bootstrap the algorithm, we emit the tipmost revision.
+
+    revs.sort(reverse=True)
+
+    # Set of parents of revision that have been yield. They can be considered
+    # unblocked as the graph generator is already aware of them so there is no
+    # need to delay the one that reference them.
+    unblocked = set()
+
+    # list of group waiting to be displayed, each group is defined by:
+    #
+    #   (revs:    lists of revs waiting to be displayed,
+    #    blocked: set of that cannot be displayed before those in 'revs')
+    #
+    # The second value ('blocked') correspond to parents of any revision in the
+    # group ('revs') that is not itself contained in the group. The main idea
+    # of this algorithm is to delay as much as possible the emission of any
+    # revision.  This means waiting for the moment we are about to display
+    # theses parents to display the revs in a group.
+    #
+    # This first implementation is smart until it meet a merge: it will emit
+    # revs as soon as any parents is about to be emitted and can grow an
+    # arbitrary number of revs in 'blocked'. In practice this mean we properly
+    # retains new branches but give up on any special ordering for ancestors of
+    # merges. The implementation can be improved to handle this better.
+    #
+    # The first subgroup is special. It correspond to all the revision that
+    # were already emitted. The 'revs' lists is expected to be empty and the
+    # 'blocked' set contains the parents revisions of already emitted revision.
+    #
+    # You could pre-seed the <parents> set of groups[0] to a specific
+    # changesets to select what the first emitted branch should be.
+    #
+    # We do not support revisions will hole yet, but adding such support would
+    # be easy. The iteration will have to be done using both input revision and
+    # parents (see cl.ancestors function + a few tweaks) but only revisions
+    # parts of the initial set should be emitted.
+    groups = [([], unblocked)]
+    for current in revs:
+        # Look for a subgroup blocked, waiting for the current revision.
+        matching = [i for i, g in enumerate(groups) if current in g[1]]
+
+        if matching:
+            # The main idea is to gather together all sets that await on the
+            # same revision.
+            #
+            # This merging is done at the time we are about to add this common
+            # awaited to the subgroup for simplicity purpose. Such merge could
+            # happen sooner when we update the "blocked" set of revision.
+            #
+            # We also always keep the oldest subgroup first. We can probably
+            # improve the behavior by having the longuest set first. That way,
+            # graph algorythms could minimise the length of parallele lines
+            # their draw. This is currently not done.
+            targetidx = matching.pop(0)
+            trevs, tparents = groups[targetidx]
+            for i in matching:
+                gr = groups[i]
+                trevs.extend(gr[0])
+                tparents |= gr[1]
+            # delete all merged subgroups (but the one we keep)
+            # (starting from the last subgroup for performance and sanity reason)
+            for i in reversed(matching):
+                del groups[i]
+        else:
+            # This is a new head. We create a new subgroup for it.
+            targetidx = len(groups)
+            groups.append(([], set([current])))
+
+        gr = groups[targetidx]
+
+        # We now adds the current nodes to this subgroups. This is done after
+        # the subgroup merging because all elements from a subgroup that relied
+        # on this rev must preceed it.
+        #
+        # we also update the <parents> set to includes the parents on the
+        # new nodes.
+        gr[0].append(current)
+        gr[1].remove(current)
+        gr[1].update([p for p in parentsfunc(current) if p > nullrev])
+
+        # Look for a subgroup to display
+        #
+        # When unblocked is empty (if clause), We are not waiting over any
+        # revision during the first iteration (if no priority was given) or if
+        # we outputed a whole disconnected sets of the graph (reached a root).
+        # In that case we arbitrarily takes the oldest known subgroup. The
+        # heuristique could probably be better.
+        #
+        # Otherwise (elif clause) this mean we have some emitted revision.  if
+        # the subgroup awaits on the same revision that the outputed ones, we
+        # can safely output it.
+        if not unblocked:
+            if len(groups) > 1:  # display other subset
+                targetidx = 1
+                gr = groups[1]
+        elif not gr[1] & unblocked:
+            gr = None
+
+        if gr is not None:
+            # update the set of awaited revisions with the one from the
+            # subgroup
+            unblocked |= gr[1]
+            # output all revisions in the subgroup
+            for r in gr[0]:
+                yield r
+            # deleted the subgroup that you just outputed.
+            # unless it is groups[0] in which case you just empty it.
+            if targetidx:
+                del groups[targetidx]
+            else:
+                gr[0][:] = []
+
 def dagwalker(repo, revs):
     """cset DAG generator yielding (id, CHANGESET, ctx, [parentids]) tuples
 
     This generator function walks through revisions (which should be ordered
     from bigger to lower). It returns a tuple for each node. The node and parent