[PATCH 1 of 7] revset: introduce a _revinterval type and revsubset function

Matt Mackall mpm at selenic.com
Mon Jun 18 17:34:10 CDT 2012


On Mon, 2012-06-04 at 15:25 -0700, Bryan O'Sullivan wrote:
> # HG changeset patch
> # User Bryan O'Sullivan <bryano at fb.com>
> # Date 1338848634 25200
> # Node ID 4f118bc8e20a571cf283695269a625d013f19eb4
> # Parent  3b266a181a5826e3c4b19bab142d1b208bafa7cc
> revset: introduce a _revinterval type and revsubset function
> 
> Very often, the set against which an intermediate revset is filtered
> is a contiguous sequence of revision numbers.  A _revinterval is
> far cheaper for this purpose than a set or list, as the amount of
> data to allocate is small and constant.
> 
> If a _revinterval is not appropriate, revsubset falls back to a
> set.  It also avoids duplication of an existing set if possible.
> 
> diff --git a/mercurial/revset.py b/mercurial/revset.py
> --- a/mercurial/revset.py
> +++ b/mercurial/revset.py
> @@ -13,6 +13,77 @@ import match as matchmod
>  from i18n import _
>  import encoding
>  
> +class _revinterval(object):
> +    """Half-closed revision interval: [start, stop)"""
> +    # can't use xrange instead, as it doesn't support slicing or cheap
> +    # __contains__ when reversed
> +
> +    def __init__(self, what=None, start=None, stop=None, reverse=False):
> +        if what is None:
> +            self._start, self._stop, self._reversed = start, stop, reverse
> +            return
> +        try:
> +            if isinstance(what, xrange):
> +                a, b = what[0], what[-1]
> +                self._start, self._stop = sorted((a, b))
> +                self._reversed = a > b
> +            else:
> +                # avoid localrepository.__getitem__
> +                self._start = iter(what).next()
> +                self._stop = self._start + len(what)
> +                self._reversed = False
> +        except (IndexError, StopIteration):
> +            self._start, self._stop, self._reversed = 0, 0, False
> +
> +    def __contains__(self, rev):
> +        return rev >= self._start and rev < self._stop
> +
> +    def __len__(self):
> +        return self._stop - self._start
> +
> +    def reverse(self):
> +        self._reversed = not self._reversed
> +
> +    def _iter(self, reverse):
> +        if reverse:
> +            return iter(xrange(self._stop - 1, self._start - 1, -1))
> +        return iter(xrange(self._start, self._stop))
> +
> +    def __iter__(self):
> +        return self._iter(self._reversed)
> +
> +    def __reversed__(self):
> +        return self._iter(not self._reversed)
> +
> +    def __getitem__(self, i):
> +        l = len(self)
> +        if isinstance(i, slice):

This method looks pretty heavy for a function we'll be calling in an
inner loop. Do we actually use slices?

If we take something like:

 r = [x for x in a if x in b]

..I'm rather skeptical that this sort of implementation can beat the
native set implementation unless a is much smaller than b. That might
happen a lot, but it's worrisome.

Short of making a rangelike extension object, not sure what to do about
it. Sadly "<bignum> in xrange(<biggernum>)" is O(N).

> +def revsubset(subset):
> +    """Convert to a type that supports fast membership checking"""
> +    if isinstance(subset, (set, _revinterval)):
> +        return subset
> +    if isinstance(subset, list):
> +        # can't be assumed to be contiguous
> +        return set(subset)
> +    return _revinterval(subset)

Hiding in here is the idea of a single hybrid list/set object. We're
often converting lists into sets, I suspect it might be better to cache
the set objects somehow. For instance:

class _revlist(list):
   ...
   def set(self):
     if not self._set:
       self._set = set(self)
     return self._set

>  def _revancestors(repo, revs, followfirst):
>      """Like revlog.ancestors(), but supports followfirst."""
>      cut = followfirst and 1 or None


> diff --git a/mercurial/scmutil.py b/mercurial/scmutil.py
> --- a/mercurial/scmutil.py
> +++ b/mercurial/scmutil.py
> @@ -586,7 +586,7 @@ def revrange(repo, revs):
>  
>          # fall through to new-style queries if old-style fails
>          m = revset.match(repo.ui, spec)
> -        for r in m(repo, range(len(repo))):
> +        for r in m(repo, revset.revsubset(repo)):
>              if r not in seen:
>                  l.append(r)
>          seen.update(l)

This appears to belong in patch 2.

-- 
Mathematics is the supreme nostalgia of our time.




More information about the Mercurial-devel mailing list