[PATCH] revset: improve performance of _generatorset.__contains__ (issue 4201)
Gregory Szorc
gregory.szorc at gmail.com
Tue Mar 25 18:12:01 CDT 2014
On 3/25/14, 3:38 PM, Durham Goode wrote:
> On 3/25/14 2:26 PM, "Gregory Szorc" <gregory.szorc at gmail.com> wrote:
>
>> On 3/25/14, 12:31 PM, Augie Fackler wrote:
>>> On Tue, Mar 25, 2014 at 05:42:43PM +0000, Durham Goode wrote:
>>>> On 3/24/14 8:01 PM, "Gregory Szorc" <gregory.szorc at gmail.com> wrote:
>>>>
>>>>> # HG changeset patch
>>>>> # User Gregory Szorc <gregory.szorc at gmail.com>
>>>>> # Date 1395716418 25200
>>>>> # Mon Mar 24 20:00:18 2014 -0700
>>>>> # Node ID 6f64e244c57706cfb123c32c4fadef63690eed40
>>>>> # Parent 3879ac3858ffd9bb46e19fcc3a2b31d7bb2b54c5
>>>>> revset: improve performance of _generatorset.__contains__ (issue 4201)
>>>>
>>>>
>>>> Looks good to me. I tried it in our repo using this revset:
>>>
>>> Agreed. Queued.
>>
>> FWIW, I've been running with this patch all day today and I've noticed
>> performance improvements all across the board. Rebases are obviously
>> faster. Evolve is much faster as well. Mercurial no longer feels
>> sluggish when operating on my 200,000+ changeset repo.
>
> I just found a bug in this patch. In:
>
> def __iter__(self):
> if self._iterated:
> # At least a part of the list should be cached if iteration has
> # started over the generatorset.
> for l in self._genlist:
> yield l
>
> for item in self._consumegen():
> yield item
>
> def _consumegen(self):
> self._iterated = True
>
> for item in self._gen:
> self._cache[item] = True
> self._genlist.append(item)
> yield item
>
>
> If two revsets are running _consumegen on the same revset at the same
> time, they could both be stepping through self._gen inside of their
> respective _consumegen calls. Causing them to see different, mutually
> exclusive outputs of _consumegen.
>
> I noticed it due to changes I'm making to improve root() and
> _descendants(), so I don't have a repro for @ at the moment. I'll try to
> think up a fix.
Oh, hmmm.
I think you meant to say "respective __iter__ calls" right? I don't see
the problem of having multiple consumers of _consumegen, as each
_consumegen frame will be consuming from the same underlying generator
and the results of consuming from that generator should be idempotent,
regardless of which consumer was doing it.
I see a few solutions:
1) Check the expected length of self._genlist after each item from
_consumegen is obtained in __iter__. If we didn't resume where we left
off (someone else consumed via _consumegen), yield the missing elements
first.
2) Implement the iterator protocol manually. See
http://docs.python.org/2/library/stdtypes.html#generator-types.
Essentially, "yield" does magical things. We could implement our own
iterator type and/or perform manual traversal of the underlying generator.
I think #2 - while more complicated - might be more readable. Although
it involves light Python magic.
More information about the Mercurial-devel
mailing list