[PATCH 2 of 4 V3 part 2] ancestor: add lazy membership testing to lazyancestors
Kevin Bullock
kbullock+mercurial at ringworld.org
Tue Dec 18 10:43:56 CST 2012
On Dec 17, 2012, at 11:30 PM, Siddharth Agarwal wrote:
> # HG changeset patch
> # User Siddharth Agarwal <sid0 at fb.com>
> # Date 1355808393 28800
> # Node ID 379ba541bc8c1c1d0b014b073da1767a6f2fa17e
> # Parent 492535c0a3b45125888c7d20d402857cecaeae00
> ancestor: add lazy membership testing to lazyancestors
>
> In a linear repository with over 400,000 commits, without this patch, hg
> perfancestorset takes 0.80 seconds no matter how far behind we're looking.
> With this patch, hg perfancestorset -- X takes:
>
> Rev X Time
> -1 0.00s
> -4000 0.01s
> -20000 0.04s
> -80000 0.17s
> -200000 0.43s
> -300000 0.69s
> 0 0.88s
>
> Thus, for revisions close to tip, we're up to several orders of magnitude
> faster. At 0 we're around 10% slower.
Looks good, but some comments below. I don't see any obvious code paths that would suffer from the latter case, do you have a sense of this?
> diff -r 492535c0a3b4 -r 379ba541bc8c contrib/perf.py
> --- a/contrib/perf.py Mon Dec 17 20:42:24 2012 -0800
> +++ b/contrib/perf.py Mon Dec 17 21:26:33 2012 -0800
> @@ -82,7 +82,7 @@
> revs = repo.revs(revset)
> heads = repo.changelog.headrevs()
> def d():
> - s = set(repo.changelog.ancestors(heads))
> + s = repo.changelog.ancestors(heads)
I'm uneasy about this change being included in this patch. It should at least be referenced in the commit message, if not moved to a subsequent patch.
> for rev in revs:
> rev in s
> timer(d)
> diff -r 492535c0a3b4 -r 379ba541bc8c mercurial/ancestor.py
> --- a/mercurial/ancestor.py Mon Dec 17 20:42:24 2012 -0800
> +++ b/mercurial/ancestor.py Mon Dec 17 21:26:33 2012 -0800
> @@ -322,8 +322,8 @@
> """Create a new object generating ancestors for the given revs. Does
> not generate revs lower than stoprev.
>
> - This is computed lazily starting from revs. The object only supports
> - iteration.
> + This is computed lazily starting from revs. The object supports
> + iteration and membership.
>
> cl should be a changelog and revs should be an iterable. inclusive is
> a boolean that indicates whether revs should be included. Revs lower
> @@ -335,6 +335,19 @@
> self._stoprev = stoprev
> self._inclusive = inclusive
>
> + # Initialize data structures for __contains__.
> + # For __contains__, we use a heap rather than a deque because
> + # (a) it minimizes the number of parentrevs calls made
> + # (b) it makes the loop termination condition obvious
> + # Python's heap is a min-heap. Multiply all values by -1 to convert it
> + # into a max-heap.
> + self._containsvisit = [-rev for rev in revs]
> + heapq.heapify(self._containsvisit)
> + if inclusive:
> + self._containsseen = set(revs)
> + else:
> + self._containsseen = set()
I'd prefer if these were simply named self._seen and self._visit, with a comment that they aren't to be used for iteration.
> diff -r 492535c0a3b4 -r 379ba541bc8c tests/test-ancestor.py
> --- a/tests/test-ancestor.py Mon Dec 17 20:42:24 2012 -0800
> +++ b/tests/test-ancestor.py Mon Dec 17 21:26:33 2012 -0800
> @@ -34,6 +34,9 @@
> 13: [8]}
> pfunc = graph.get
>
> +class mockchangelog(object):
> + parentrevs = graph.get
> +
> def runmissingancestors(revs, bases):
> print "%% ancestors of %s and not of %s" % (revs, bases)
> print ancestor.missingancestors(revs, bases, pfunc)
> @@ -70,5 +73,33 @@
> runmissingancestors([10, 11, 12], [13])
> runmissingancestors([13], [10, 11, 12])
>
> +def genlazyancestors(revs, stoprev=0, inclusive=False):
> + print "%% lazy ancestor set for %s, inclusive = %s" % (revs, inclusive)
> + return ancestor.lazyancestors(mockchangelog, revs, stoprev=stoprev,
> + inclusive=inclusive)
> +
> +def isinlazyset(s, l):
> + print [n for n in l if n in s]
The name of this function is entirely opaque to me. Something more like 'printlazyancestors' would be an improvement.
pacem in terris / мир / शान्ति / سَلاَم / 平和
Kevin R. Bullock
More information about the Mercurial-devel
mailing list