[PATCH 1 of 7] revset: introduce a _revinterval type and revsubset function
Matt Mackall
mpm at selenic.com
Mon Jun 18 17:34:10 CDT 2012
On Mon, 2012-06-04 at 15:25 -0700, Bryan O'Sullivan wrote:
> # HG changeset patch
> # User Bryan O'Sullivan <bryano at fb.com>
> # Date 1338848634 25200
> # Node ID 4f118bc8e20a571cf283695269a625d013f19eb4
> # Parent 3b266a181a5826e3c4b19bab142d1b208bafa7cc
> revset: introduce a _revinterval type and revsubset function
>
> Very often, the set against which an intermediate revset is filtered
> is a contiguous sequence of revision numbers. A _revinterval is
> far cheaper for this purpose than a set or list, as the amount of
> data to allocate is small and constant.
>
> If a _revinterval is not appropriate, revsubset falls back to a
> set. It also avoids duplication of an existing set if possible.
>
> diff --git a/mercurial/revset.py b/mercurial/revset.py
> --- a/mercurial/revset.py
> +++ b/mercurial/revset.py
> @@ -13,6 +13,77 @@ import match as matchmod
> from i18n import _
> import encoding
>
> +class _revinterval(object):
> + """Half-closed revision interval: [start, stop)"""
> + # can't use xrange instead, as it doesn't support slicing or cheap
> + # __contains__ when reversed
> +
> + def __init__(self, what=None, start=None, stop=None, reverse=False):
> + if what is None:
> + self._start, self._stop, self._reversed = start, stop, reverse
> + return
> + try:
> + if isinstance(what, xrange):
> + a, b = what[0], what[-1]
> + self._start, self._stop = sorted((a, b))
> + self._reversed = a > b
> + else:
> + # avoid localrepository.__getitem__
> + self._start = iter(what).next()
> + self._stop = self._start + len(what)
> + self._reversed = False
> + except (IndexError, StopIteration):
> + self._start, self._stop, self._reversed = 0, 0, False
> +
> + def __contains__(self, rev):
> + return rev >= self._start and rev < self._stop
> +
> + def __len__(self):
> + return self._stop - self._start
> +
> + def reverse(self):
> + self._reversed = not self._reversed
> +
> + def _iter(self, reverse):
> + if reverse:
> + return iter(xrange(self._stop - 1, self._start - 1, -1))
> + return iter(xrange(self._start, self._stop))
> +
> + def __iter__(self):
> + return self._iter(self._reversed)
> +
> + def __reversed__(self):
> + return self._iter(not self._reversed)
> +
> + def __getitem__(self, i):
> + l = len(self)
> + if isinstance(i, slice):
This method looks pretty heavy for a function we'll be calling in an
inner loop. Do we actually use slices?
If we take something like:
r = [x for x in a if x in b]
..I'm rather skeptical that this sort of implementation can beat the
native set implementation unless a is much smaller than b. That might
happen a lot, but it's worrisome.
Short of making a rangelike extension object, not sure what to do about
it. Sadly "<bignum> in xrange(<biggernum>)" is O(N).
> +def revsubset(subset):
> + """Convert to a type that supports fast membership checking"""
> + if isinstance(subset, (set, _revinterval)):
> + return subset
> + if isinstance(subset, list):
> + # can't be assumed to be contiguous
> + return set(subset)
> + return _revinterval(subset)
Hiding in here is the idea of a single hybrid list/set object. We're
often converting lists into sets, I suspect it might be better to cache
the set objects somehow. For instance:
class _revlist(list):
...
def set(self):
if not self._set:
self._set = set(self)
return self._set
> def _revancestors(repo, revs, followfirst):
> """Like revlog.ancestors(), but supports followfirst."""
> cut = followfirst and 1 or None
> diff --git a/mercurial/scmutil.py b/mercurial/scmutil.py
> --- a/mercurial/scmutil.py
> +++ b/mercurial/scmutil.py
> @@ -586,7 +586,7 @@ def revrange(repo, revs):
>
> # fall through to new-style queries if old-style fails
> m = revset.match(repo.ui, spec)
> - for r in m(repo, range(len(repo))):
> + for r in m(repo, revset.revsubset(repo)):
> if r not in seen:
> l.append(r)
> seen.update(l)
This appears to belong in patch 2.
--
Mathematics is the supreme nostalgia of our time.
More information about the Mercurial-devel
mailing list