[PATCH 1 of 5] graphmod: add a function for topological iteration

Augie Fackler raf at durin42.com
Tue Nov 25 15:54:20 UTC 2014


On Wed, Nov 19, 2014 at 05:35:05PM +0000, Martin von Zweigbergk wrote:
> On Tue Nov 18 2014 at 3:57:01 PM Pierre-Yves David <
> pierre-yves.david at ens-lyon.org> wrote:
>
> > # HG changeset patch
> > # User Pierre-Yves David <pierre-yves.david at fb.com>
> > # Date 1409847572 -7200
> > #      Thu Sep 04 18:19:32 2014 +0200
> > # Node ID 06767ea27aca31cd1e65ccd32c22c45c1ebab83d
> > # Parent  e63941631a3f61b3323dbcc2545689b1eb34e308
> > graphmod: add a function for topological iteration
> >
> > This changesets introduce a function to perform topological (one branch
> > after
> > the other) iteration over a set of changeset. This first version have a
> > lot of
> > limitation but the approach should be flexible enough to allow all kind of
> > improvement in the future. This changeset aims at setting the first stone
> > more
> > than providing a final solution.
> >
> > The algorithm works does not needs to know the whole set of nodes involved
> > before emitting revision. So make it a good candidate for usage in place
> > like
> > `hg log` or graphical tools that need a fast first result time.
> >
> > diff --git a/mercurial/graphmod.py b/mercurial/graphmod.py
> > --- a/mercurial/graphmod.py
> > +++ b/mercurial/graphmod.py
> > @@ -20,10 +20,160 @@ Data depends on type.
> >  from mercurial.node import nullrev
> >  import util
> >
> >  CHANGESET = 'C'
> >
> > +def topoiter(revs, parentsfunc):
> > +    """topologically iter over a set of revision, one branch at a time.
> > +
> > +    Currently does not handle non contigious <revs> input.
> > +
> > +    Currently consider every changeset under a merge to be on the same
> > branch
> > +    using revision number to sort them.
> >
>
> Any plans for how to extend the algorithm for that case? Not asking you to
> code it up, just checking that it's not too hard to extend for that case.
>
>
> > +
> > +    Could be easily extend to give priority to an initial branch.
> > +
> > +    The revision are emitted heads to roots."""
> > +    ### Quick summary of the algorithm
> > +    #
> > +    # This function is based around a "retention" principle. We keep
> > revision
> > +    # in memory until we are ready to emit a whole branch that immediately
> > +    # "merge" into an existing one. This reduce the number of branch
> > "ongoing"
> > +    # at the same time.
> > +    #
> > +    # During iteration revs are split into two groups:
> > +    # A) revision already emitted
> > +    # B) revision in "retention". They are stored as different subgroup.
> > +    #
> > +    # for each REV, we do the follow logic:
> > +    #
> > +    #   if REV is a parent of (A), we will emit it. But before emitting
> > it,
> > +    #   we'll "free" all the revs from subgroup in (B) that were waiting
> > for
> > +    #   REV to be available.
> > +    #
> > +    #   else, we'll search for a subgroup if (B) awaiting for this
> > revision to
> > +    #   be available, if such group exist, we add REV to it and the
> > subgroup is
> > +    #   now awaiting for REV.parents() to be available.
> > +    #
> > +    #   finally if no such group existed in (B), we create a new subgroup.
> > +    #
> > +    #
> > +    # To bootstrap the algorithm, we display the first revision we saw.
> >
>
> What does this line refer to? There is no explicit bootstrapping here. It
> seems like it flows from the "if no unblocked revisions exist, pick
> arbitrary group", but I'm not sure.
>
>
> > +
> > +    revs.sort(reverse=True)
> >
>
> For someone not familiar with the revset (?) implementation, this seems to
> go against the goal of incrementally producing the revisions. Is
> "sort(reverse=True)" simply not doing anything to a lazy revset that is not
> already explicitly sorted in some other order?

I'd also like to see the answer to this question, FWIW.

(I'm deferring any further review of this until I can chat with
marmoute in IRC, since this patch is enormous and it may be better for
us to review it using voice chat.)

>
>
> > +
> > +    # set of parents of revision that have been yield. They can be
> > considered
> > +    # unblocked as the graph generator is already aware of them so there
> > is no
> > +    # need to delay the one that reference them.
> > +    unblocked = set()
> > +
> > +    # list of group waiting to be displayed, each group is defined by:
> > +    #
> > +    #   (revs:    lists of revs waiting to be displayed,
> > +    #    blocked: set of that cannot be displayed before those in the
> > sets)
> > +    #
> > +    # The second value  correspond to parents of any revision in the group
> > +    # that is not itself contained in the group. The main idea of this
> > +    # algorithm is to delay as much as possible the emission of any
> > revision.
> > +    # This means waiting for the moment we are about to display theses
> > +    # parents to display the revs in a group.
> > +    #
> > +    # This first implementation is smart until it meet a merge: it will
> > +    # emit revs as soon as any parents is about to be emitted and can
> > +    # grow an arbitrary number of revs in `blocked`. In practice this mean
> > +    # we properly retains new branches but does not any special ordering
> > +    # for ancestors of merges. The implementation can be improved to
> > handle
> > +    # this better.
> > +    #
> > +    # the first group is a special group. It correspond to all the
> > revision
> > +    # that were already emitted. the <revs> lists is expected to be empty
> > +    # and the <blocked> set contains the parents revisions of already
> > emitted
> > +    # revision.
> > +    #
> > +    # You could pre-seed the <parents> set of groups[0] to a specific
> > +    # changesets to select what the first emitted branch should be.
> > +    #
> > +    # We do not support revisions will hole yet, but adding such support
> > +    # would be easy. The iteration will have to be done using both input
> > +    # revision and parents (see cl.ancestors function + a few tweaks) but
> > +    # only revisions parts of the initial set should be emitted.
> > +    groups = [([], unblocked)]
> > +    for current in revs:
> > +        # Look for a group awaiting on the current revision.
> > +        matching = [i for i, g in enumerate(groups) if current in g[1]]
> > +
> > +        if matching:
> > +            # The main idea is to gather together all set that await on
> > the
> > +            # same revision.
> > +            #
> > +            # this merging is done at the time we are about to add this
> > common
> > +            # awaited to the group for simplicity purpose. Such merge
> > could
> > +            # happen sooner when we update the "blocked" set of revision.
> > +            #
> > +            # We also always keep the oldest group first. We can
> > +            # probably improve the behavior by having the longuest set
> > +            # first. That way, graph algorythms could minimise the
> > +            # length of parallele lines their draw. This is currently
> > +            # not done.
> > +            targetidx = matching.pop(0)
> > +            trevs, tparents = groups[targetidx]
> > +            for i in matching:
> > +                gr = groups[i]
> > +                trevs.extend(gr[0])
> > +                tparents |= gr[1]
> > +            # delete all merged groups (but the one we keep)
> > +            # (starting from the last group for performance and sanity
> > reason)
> > +            for i in reversed(matching):
> > +                del groups[i]
> > +        else:
> > +            # his is a new head we create a new group for it.
> > +            targetidx = len(groups)
> > +            groups.append(([], set([current])))
> > +
> > +        gr = groups[targetidx]
> > +
> > +        # We now adds the current nodes to this groups. This is done after
> > +        # the group merging because all elements from a group that relied
> > on
> > +        # this rev must preceed it.
> > +        #
> > +        # we also update the <parents> set to includes the parents on the
> > +        # new nodes.
> > +        gr[0].append(current)
> > +        gr[1].remove(current)
> > +        gr[1].update([p for p in parentsfunc(current) if p > nullrev])
> > +
> > +        # Look for a group to display
> > +        #
> > +        # When unblocked is empty (if clause), We are not waiting over any
> > +        # revision during the first iteration (if no priority was given)
> > or if
> > +        # we outputed a whole disconnected sets of the graph (reached a
> > root).
> > +        # In that case we arbitrarily takes the oldest known group. The
> > +        # heuristique could probably be better.
> > +        #
> > +        # Otherwise (elif clause) this mean we have some emitted revision.
> > +        # if the group awaits on the same revision that the outputed ones,
> > +        # we can safely output it.
> > +        if not unblocked:
> > +            if len(groups) > 1:  # display other subset
> > +                targetidx = 1
> > +                gr = groups[1]
> > +        elif not gr[1] & unblocked:
> > +            gr = None
> > +
> > +        if gr is not None:
> > +            # update the set of awaited revisions with the one from the
> > group
> > +            unblocked |= gr[1]
> > +            # output all revisions in the group
> > +            for r in gr[0]:
> > +                yield r
> > +            # deleted the group that you just outputed.
> > +            # unless it is group[0] in which case you just empty it.
> > +            if targetidx:
> > +                del groups[targetidx]
> > +            else:
> > +                gr[0][:] = []
> > +
> >  def dagwalker(repo, revs):
> >      """cset DAG generator yielding (id, CHANGESET, ctx, [parentids])
> > tuples
> >
> >      This generator function walks through revisions (which should be
> > ordered
> >      from bigger to lower). It returns a tuple for each node. The node and
> > parent
> > _______________________________________________
> > Mercurial-devel mailing list
> > Mercurial-devel at selenic.com
> > http://selenic.com/mailman/listinfo/mercurial-devel
> >

> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel


More information about the Mercurial-devel mailing list