[PATCH 1 of 5] graphmod: add a function for topological iteration
Augie Fackler
raf at durin42.com
Mon Dec 1 21:00:34 UTC 2014
On Tue, Nov 18, 2014 at 11:56:07PM +0000, Pierre-Yves David wrote:
> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david at fb.com>
> # Date 1409847572 -7200
> # Thu Sep 04 18:19:32 2014 +0200
> # Node ID 06767ea27aca31cd1e65ccd32c22c45c1ebab83d
> # Parent e63941631a3f61b3323dbcc2545689b1eb34e308
> graphmod: add a function for topological iteration
Pierre-Yves and I went over this verbally using a voice chat, and he's
going to send a revised version that renames the function.
In the name of reviewer and author sanity, he's going to include a
couple of follow-up patches that address some comment cleanup and
makes groups[0] not special in the algorithm.
>
> This changesets introduce a function to perform topological (one branch after
> the other) iteration over a set of changeset. This first version have a lot of
> limitation but the approach should be flexible enough to allow all kind of
> improvement in the future. This changeset aims at setting the first stone more
> than providing a final solution.
>
> The algorithm works does not needs to know the whole set of nodes involved
> before emitting revision. So make it a good candidate for usage in place like
> `hg log` or graphical tools that need a fast first result time.
>
> diff --git a/mercurial/graphmod.py b/mercurial/graphmod.py
> --- a/mercurial/graphmod.py
> +++ b/mercurial/graphmod.py
> @@ -20,10 +20,160 @@ Data depends on type.
> from mercurial.node import nullrev
> import util
>
> CHANGESET = 'C'
>
> +def topoiter(revs, parentsfunc):
> + """topologically iter over a set of revision, one branch at a time.
> +
> + Currently does not handle non contigious <revs> input.
> +
> + Currently consider every changeset under a merge to be on the same branch
> + using revision number to sort them.
> +
> + Could be easily extend to give priority to an initial branch.
> +
> + The revision are emitted heads to roots."""
> + ### Quick summary of the algorithm
> + #
> + # This function is based around a "retention" principle. We keep revision
> + # in memory until we are ready to emit a whole branch that immediately
> + # "merge" into an existing one. This reduce the number of branch "ongoing"
> + # at the same time.
> + #
> + # During iteration revs are split into two groups:
> + # A) revision already emitted
> + # B) revision in "retention". They are stored as different subgroup.
> + #
> + # for each REV, we do the follow logic:
> + #
> + # if REV is a parent of (A), we will emit it. But before emitting it,
> + # we'll "free" all the revs from subgroup in (B) that were waiting for
> + # REV to be available.
> + #
> + # else, we'll search for a subgroup if (B) awaiting for this revision to
> + # be available, if such group exist, we add REV to it and the subgroup is
> + # now awaiting for REV.parents() to be available.
> + #
> + # finally if no such group existed in (B), we create a new subgroup.
> + #
> + #
> + # To bootstrap the algorithm, we display the first revision we saw.
> +
> + revs.sort(reverse=True)
> +
> + # set of parents of revision that have been yield. They can be considered
> + # unblocked as the graph generator is already aware of them so there is no
> + # need to delay the one that reference them.
> + unblocked = set()
> +
> + # list of group waiting to be displayed, each group is defined by:
> + #
> + # (revs: lists of revs waiting to be displayed,
> + # blocked: set of that cannot be displayed before those in the sets)
> + #
> + # The second value correspond to parents of any revision in the group
> + # that is not itself contained in the group. The main idea of this
> + # algorithm is to delay as much as possible the emission of any revision.
> + # This means waiting for the moment we are about to display theses
> + # parents to display the revs in a group.
> + #
> + # This first implementation is smart until it meet a merge: it will
> + # emit revs as soon as any parents is about to be emitted and can
> + # grow an arbitrary number of revs in `blocked`. In practice this mean
> + # we properly retains new branches but does not any special ordering
> + # for ancestors of merges. The implementation can be improved to handle
> + # this better.
> + #
> + # the first group is a special group. It correspond to all the revision
> + # that were already emitted. the <revs> lists is expected to be empty
> + # and the <blocked> set contains the parents revisions of already emitted
> + # revision.
> + #
> + # You could pre-seed the <parents> set of groups[0] to a specific
> + # changesets to select what the first emitted branch should be.
> + #
> + # We do not support revisions will hole yet, but adding such support
> + # would be easy. The iteration will have to be done using both input
> + # revision and parents (see cl.ancestors function + a few tweaks) but
> + # only revisions parts of the initial set should be emitted.
> + groups = [([], unblocked)]
> + for current in revs:
> + # Look for a group awaiting on the current revision.
> + matching = [i for i, g in enumerate(groups) if current in g[1]]
> +
> + if matching:
> + # The main idea is to gather together all set that await on the
> + # same revision.
> + #
> + # this merging is done at the time we are about to add this common
> + # awaited to the group for simplicity purpose. Such merge could
> + # happen sooner when we update the "blocked" set of revision.
> + #
> + # We also always keep the oldest group first. We can
> + # probably improve the behavior by having the longuest set
> + # first. That way, graph algorythms could minimise the
> + # length of parallele lines their draw. This is currently
> + # not done.
> + targetidx = matching.pop(0)
> + trevs, tparents = groups[targetidx]
> + for i in matching:
> + gr = groups[i]
> + trevs.extend(gr[0])
> + tparents |= gr[1]
> + # delete all merged groups (but the one we keep)
> + # (starting from the last group for performance and sanity reason)
> + for i in reversed(matching):
> + del groups[i]
> + else:
> + # his is a new head we create a new group for it.
> + targetidx = len(groups)
> + groups.append(([], set([current])))
> +
> + gr = groups[targetidx]
> +
> + # We now adds the current nodes to this groups. This is done after
> + # the group merging because all elements from a group that relied on
> + # this rev must preceed it.
> + #
> + # we also update the <parents> set to includes the parents on the
> + # new nodes.
> + gr[0].append(current)
> + gr[1].remove(current)
> + gr[1].update([p for p in parentsfunc(current) if p > nullrev])
> +
> + # Look for a group to display
> + #
> + # When unblocked is empty (if clause), We are not waiting over any
> + # revision during the first iteration (if no priority was given) or if
> + # we outputed a whole disconnected sets of the graph (reached a root).
> + # In that case we arbitrarily takes the oldest known group. The
> + # heuristique could probably be better.
> + #
> + # Otherwise (elif clause) this mean we have some emitted revision.
> + # if the group awaits on the same revision that the outputed ones,
> + # we can safely output it.
> + if not unblocked:
> + if len(groups) > 1: # display other subset
> + targetidx = 1
> + gr = groups[1]
> + elif not gr[1] & unblocked:
> + gr = None
> +
> + if gr is not None:
> + # update the set of awaited revisions with the one from the group
> + unblocked |= gr[1]
> + # output all revisions in the group
> + for r in gr[0]:
> + yield r
> + # deleted the group that you just outputed.
> + # unless it is group[0] in which case you just empty it.
> + if targetidx:
> + del groups[targetidx]
> + else:
> + gr[0][:] = []
> +
> def dagwalker(repo, revs):
> """cset DAG generator yielding (id, CHANGESET, ctx, [parentids]) tuples
>
> This generator function walks through revisions (which should be ordered
> from bigger to lower). It returns a tuple for each node. The node and parent
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
More information about the Mercurial-devel
mailing list