[PATCH] revset: fix generatorset race condition

Durham Goode durham at fb.com
Tue Mar 25 19:27:44 CDT 2014


On 3/25/14 5:24 PM, "Gregory Szorc" <gregory.szorc at gmail.com> wrote:

>On 3/25/14, 5:13 PM, Durham Goode wrote:
>> # HG changeset patch
>> # User Durham Goode <durham at fb.com>
>> # Date 1395789007 25200
>> #      Tue Mar 25 16:10:07 2014 -0700
>> # Node ID 719d5d962b0c8630c153453a48bfaeccbe9de2f7
>> # Parent  9a09a625bc93ab69248613abcbea041b785f1a0e
>> revset: fix generatorset race condition
>>
>> If two things were iterating over a generatorset at the same time, they
>>could
>> miss out on the things the other was generating, resulting in incomplete
>> results. This fixes it by making it possible for two things to iterate
>>at once,
>> by always checking the _genlist at the beginning of each iteration.
>>
>> I was only able to repro it with pending changes from my other commits,
>>but they
>> aren't ready yet. So I'm unable to add a test for now.
>>
>> diff --git a/mercurial/revset.py b/mercurial/revset.py
>> --- a/mercurial/revset.py
>> +++ b/mercurial/revset.py
>> @@ -2624,10 +2624,8 @@
>>           gen: a generator producing the values for the generatorset.
>>           """
>>           self._gen = gen
>> -        self._iter = iter(gen)
>>           self._cache = {}
>>           self._genlist = baseset([])
>> -        self._iterated = False
>>           self._finished = False
>>
>>       def __contains__(self, x):
>> @@ -2639,28 +2637,29 @@
>>               if l == x:
>>                   return True
>>
>> -        self._finished = True
>>           self._cache[x] = False
>>           return False
>>
>>       def __iter__(self):
>> -        if self._iterated:
>> -            # At least a part of the list should be cached if
>>iteration has
>> -            # started over the generatorset.
>> -            for l in self._genlist:
>> -                yield l
>> -
>> -        for item in self._consumegen():
>> -            yield item
>> +        if self._finished:
>> +            for x in self._genlist:
>> +                yield x
>
>Shouldn't we return here? Otherwise, don't we iterate over self._genlist
>twice if self._finished == True?
>
>Why do we even need this block? Won't the while loop below handle it?

You are correct, we need a return.  I had this block because doing all
that extra stuff below might have real cost when run across hundreds of
thousands of revs.

>
>> +
>> +        i = 0
>> +        genlist = self._genlist
>> +        consume = self._consumegen()
>> +        while True:
>> +            if i < len(genlist):
>> +                yield genlist[i]
>> +            else:
>> +                yield consume.next()
>> +            i += 1
>>
>>       def _consumegen(self):
>> -        self._iterated = True
>> -
>>           for item in self._gen:
>>               self._cache[item] = True
>>               self._genlist.append(item)
>>               yield item
>> -
>>           self._finished = True
>>
>>       def set(self):
>>
>
>




More information about the Mercurial-devel mailing list