[PATCH 1 of 7] revset: introduce a _revinterval type and revsubset function

Bryan O'Sullivan bos at serpentine.com
Fri Jun 22 16:04:02 CDT 2012


On Mon, Jun 18, 2012 at 6:34 PM, Matt Mackall <mpm at selenic.com> wrote:

> > +    def __getitem__(self, i):
> > +        l = len(self)
> > +        if isinstance(i, slice):
>
> This method looks pretty heavy for a function we'll be calling in an
> inner loop. Do we actually use slices?
>

I would have sworn the answer was yes, as I have a strong memory of adding
the slice check due to some obscure test failing. And yet when I try it now
with that block disabled, nothing is failing. Go figure.


> If we take something like:
>
>  r = [x for x in a if x in b]
>
> ..I'm rather skeptical that this sort of implementation can beat the
> native set implementation unless a is much smaller than b. That might
> happen a lot, but it's worrisome.
>

It is very common that b is set(range(len(repo))), in which case merely
constructing the set is nastily expensive in time and memory on even a
moderately large repo.

> +def revsubset(subset):
>
> Hiding in here is the idea of a single hybrid list/set object. We're
> often converting lists into sets, I suspect it might be better to cache
> the set objects somehow.


Maybe?

Honestly, this does all really really want to be a custom extension type
written in C, using a tree of contiguous ranges. This would preserve order,
maintain good cache performance, and keep memory use as low as possible.

Would such a patch be palatable? The existing patchset makes a big
difference to performance, but I'd be fine with spending some time on doing
things the Right Way instead.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20120622/ffe9a996/attachment.html>


More information about the Mercurial-devel mailing list