[PATCH 1 of 2] ancestor: a new algorithm for ancestors of a not in b that is faster when a and b are close
Pierre-Yves David
pierre-yves.david at logilab.fr
Wed Nov 21 09:47:45 CST 2012
On Tue, Nov 20, 2012 at 07:15:26PM -0800, Siddharth Agarwal wrote:
> # HG changeset patch
> # User Siddharth Agarwal <sid0 at fb.com>
> # Date 1353467541 28800
> # Node ID 25bdb24973e3b43d84b6d4c314f035f303ada192
> # Parent 536f5df951bece49b760392fd543e8082489b7d0
> ancestor: a new algorithm for ancestors of a not in b that is faster when a and b are close
>
> One of the major reasons rebase is slow in large repositories is the computation
> of the detach set: the set of ancestors of the changesets to rebase not in the
> destination parent. This is currently done via a revset that does two walks all
> the way to the root of the DAG. Instead of doing that, to find ancestors of a
> not in b we walk up the tree in reverse revision number order, maintaining sets
> of nodes visited from a, b or both.
>
> For the common case where the nodes are close both topologically and in
> revision number (relative to repository size), this has been found to speed
> up rebase by around 15-20%. When the nodes are farther apart and the DAG is
> highly branching, it is harder to say which would win.
The function seems ok but what it does sound very similar to what discovery does. But did you had a look to the discovery code ? I'm not sure we can mutualise anything since discovery is optimized toward roundtrip. But still…
>
> Here's how long computing the detach set takes in a linear repository with over
> 400000 changesets, rebasing near tip:
>
> Rebasing across 4 changesets
> Revset method: 2.2s
> New algorithm: 0.00015s
>
> Rebasing across 250 changesets
> Revset method: 2.2s
> New algorithm: 0.00069s
>
> Rebasing across 10000 changesets
> Revset method: 2.4s
> New algorithm: 0.019s
>
> diff -r 536f5df951be -r 25bdb24973e3 mercurial/ancestor.py
> --- a/mercurial/ancestor.py Tue Nov 13 19:24:45 2012 -0800
> +++ b/mercurial/ancestor.py Tue Nov 20 19:12:21 2012 -0800
> @@ -130,6 +130,122 @@
> return gca
> return deepest(gca)
>
> +def ancestorsofanotb(a, b, pfunc):
name is terrible. Something like "missing" of "extra" would be better.
you should improve argument names, that will help to clarify the function
purpose without a too complicated name.
> + """
> + Return all the ancestors of a that are not ancestors of b, possibly
> + including a itself. Equivalent to the revset (::a - ::b).
I usually start docstring with a short help:
"""short help
long help
on multiple
lines
"""
> + pfunc must return a list of parent vertices for a given vertex.
I love the way this pfunc stuff allows you to easily doctest the function.
> +
> + The graph is a dict of child->parent adjacency lists
> + >>> graph = {0: [-1], 1: [0], 2: [1], 3: [1], 4: [2], 5: [4], 6: [4],
> + ... 7: [4], 8: [-1], 9: [6, 7], 10: [5], 11: [3, 7], 12: [9],
> + ... 13: [8]}
You should draw a visual representation of this graph. So people can understand
the doctests below.
> + >>> pfunc = graph.get
> +
> + Trivial case: a == b
> + >>> ancestorsofanotb(0, 0, pfunc)
> + []
> + >>> ancestorsofanotb(4, 4, pfunc)
> + []
> +
> + With nullrev
> + >>> ancestorsofanotb(-1, 12, pfunc)
> + []
> + >>> ancestorsofanotb(12, -1, pfunc)
> + [12, 9, 7, 6, 4, 2, 1, 0]
> +
> + 9 is a parent of 12
> + >>> ancestorsofanotb(12, 9, pfunc)
> + [12]
> + >>> ancestorsofanotb(9, 12, pfunc)
> + []
> +
> + 7 is an ancestor of 12. 6 is an ancestor of 12 but not of 7
> + >>> ancestorsofanotb(12, 7, pfunc)
> + [12, 9, 6]
> + >>> ancestorsofanotb(7, 12, pfunc)
> + []
> +
> + More complex cases
> + >>> ancestorsofanotb(10, 11, pfunc)
> + [10, 5]
> + >>> ancestorsofanotb(10, 12, pfunc)
> + [10, 5]
> + >>> ancestorsofanotb(10, 13, pfunc)
> + [10, 5, 4, 2, 1, 0]
> + >>> ancestorsofanotb(11, 10, pfunc)
> + [11, 7, 3]
> + >>> ancestorsofanotb(11, 12, pfunc)
> + [11, 3]
> + >>> ancestorsofanotb(11, 13, pfunc)
> + [11, 7, 4, 3, 2, 1, 0]
> + >>> ancestorsofanotb(12, 10, pfunc)
> + [12, 9, 7, 6]
> + >>> ancestorsofanotb(12, 11, pfunc)
> + [12, 9, 6]
> + >>> ancestorsofanotb(12, 13, pfunc)
> + [12, 9, 7, 6, 4, 2, 1, 0]
> + >>> ancestorsofanotb(13, 10, pfunc)
> + [13, 8]
> + >>> ancestorsofanotb(13, 11, pfunc)
> + [13, 8]
> + >>> ancestorsofanotb(13, 12, pfunc)
> + [13, 8]
> + """
> + if a == b:
> + return []
> +
> + avisit = set([a])
> + bvisit = set([b])
> + both = set()
> + next = max(a, b)
> + anotb = []
> + while next > nullrev and (avisit or bvisit):
> + curr = next
> + next -= 1
1) seems like a for loop with an extra break clause.
2) I think you could stop with avisit is empty.
Once it's empty you won't find additional ancestors for a.
> +
> + if curr in both:
> + both.remove(curr)
> + # curr's parents might have made it into avisit or bvisit through
> + # another path
> + for p in pfunc(curr):
> + avisit.discard(p)
> + bvisit.discard(p)
> + both.add(p)
> + continue
> +
> + # curr will never be in both avisit and bvisit, since if it were it'd
> + # have been pushed to both
> + if curr in avisit:
> + anotb.append(curr)
Does this mean that "a" can't be and ancestors of "b"?
you should HIGHLIGHT that in the docstring!
> + thisvisit = avisit
> + othervisit = bvisit
> + elif curr in bvisit:
> + thisvisit = bvisit
> + othervisit = avisit
> + else:
> + # not an ancestor of a or b: ignore
> + continue
> +
> + thisvisit.remove(curr)
> + for p in pfunc(curr):
> + if p == nullrev:
> + continue
> + if p in othervisit or p in both:
> + # p is implicitly in thisvisit. This means p is or should be
> + # in both
> + avisit.discard(p)
> + bvisit.discard(p)
> + both.add(p)
> + continue
> +
> + # visit later
> + thisvisit.add(p)
Why not just use
for p in pfunc(curr):
if p == nullrev:
...
elif p in othervisit or p in both:
...
else:
...
instead of using "continue" at the end of all your if clause ?
The algorithm as a whole looks sensible.
--
Pierre-Yves David
http://www.logilab.fr/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: Digital signature
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20121121/fd1ca650/attachment.pgp>
More information about the Mercurial-devel
mailing list