[PATCH 3 of 6] revset: do not try to sort revisions unnecessarily by addset._iterator()

Pierre-Yves David pierre-yves.david at ens-lyon.org
Fri May 15 03:32:24 CDT 2015



On 05/14/2015 07:27 AM, Yuya Nishihara wrote:
> On Wed, 13 May 2015 22:38:32 -0700, Pierre-Yves David wrote:
>> On 05/13/2015 07:39 AM, Yuya Nishihara wrote:
>>> On Tue, 12 May 2015 17:52:30 -0700, Pierre-Yves David wrote:
>>>> On 05/12/2015 03:59 PM, Yuya Nishihara wrote:
>>>>> # HG changeset patch
>>>>> # User Yuya Nishihara <yuya at tcha.org>
>>>>> # Date 1427716479 -32400
>>>>> #      Mon Mar 30 20:54:39 2015 +0900
>>>>> # Node ID cc7253d43150ef937aca205f2cb66a569378aeea
>>>>> # Parent  e487cb80cc333a4d92376daa29c08fa703c5b249
>>>>> revset: do not try to sort revisions unnecessarily by addset._iterator()
>>>>>
>>>>> This fixes addset to remove duplicated revisions from the right-hand-side
>>>>> set even if one of the subsets does not support fastasc/fastdesc.
>>>>>
>>>>> addset._iterator() did not do the right thing if _ascending is set, because
>>>>> addset._iterordered() expects that the given sets are sorted. This patch
>>>>> removes the broken implementation. There's no need for addset._iterator()
>>>>> to sort revisions because the result will be sorted by _trysetasclist()
>>>>> as necessary.
>>>>
>>>> I agree the current behavior is buggy, but I'm dubious about the way
>>>> you solve it.
>>>>
>>>> The special casing of 'self._ascending' serves both a correctness and
>>>> an optimisation purpose.
>>>>
>>>> I see two potential issues here:
>>>>
>>>> 1) we use the iterators from self._r1 and self._r2 without ensuring they
>>>> are sorted in the order we need them in (or without using the appropriate
>>>> directed iterator).
>>>>
>>>> 2) _iterordered is apparently not doing deduplication.
>>>>
>>>> Fixing these two should solve the bug without any performance or
>>>> correctness impact, shouldn't it?
>>>>
>>>> As far as I understand your patch, sorting the addset would result in a
>>>> sorted output in most cases.
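
(For reference, a "directed iterator" in the sense of point (1) above could
look roughly like this; the helper name is made up and this is not the actual
revset code, just a sketch of the idea:)

    def directed_iter(subset, ascending):
        # illustrative helper: prefer the set's native directed iterator
        # when it advertises one (fastasc/fastdesc in smartset terms),
        # otherwise materialize and sort in the requested direction
        fast = getattr(subset, 'fastasc' if ascending else 'fastdesc', None)
        if fast is not None:
            return fast()
        return iter(sorted(subset, reverse=not ascending))
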
>>>
>>> _iterordered() can remove duplicates if both r1 and r2 are sorted. So we have
>>> to fix (1).
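
(For illustration, the kind of ordered merge with deduplication described
here could look roughly like the following; simplified names, not the actual
_iterordered() code:)

    def merge_dedup(iter1, iter2, ascending=True):
        # illustrative sketch: both inputs must already be sorted in the
        # requested direction -- exactly the precondition noted above.
        # None is used as an end-of-iteration sentinel for simplicity.
        lt = (lambda a, b: a < b) if ascending else (lambda a, b: a > b)
        val1, val2 = next(iter1, None), next(iter2, None)
        while val1 is not None and val2 is not None:
            if val1 == val2:
                yield val1
                val1, val2 = next(iter1, None), next(iter2, None)
            elif lt(val1, val2):
                yield val1
                val1 = next(iter1, None)
            else:
                yield val2
                val2 = next(iter2, None)
        # drain whichever side still has revisions left
        if val1 is not None:
            yield val1
            for val1 in iter1:
                yield val1
        elif val2 is not None:
            yield val2
            for val2 in iter2:
                yield val2

    # e.g. list(merge_dedup(iter([1, 3, 5]), iter([3, 4, 5]))) == [1, 3, 4, 5]
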
>>
>> This small patch seems to fix the duplication issue and solve the problem.
>>
>> diff --git a/mercurial/revset.py b/mercurial/revset.py
>> --- a/mercurial/revset.py
>> +++ b/mercurial/revset.py
>> @@ -3044,4 +3044,6 @@ class addset(abstractsmartset):
>>                gen = gen()
>>            else:
>> +            self._r1.sort(not self._ascending)
>> +            self._r2.sort(not self._ascending)
>>                iter1 = iter(self._r1)
>>                iter2 = iter(self._r2)
>
> Yes. I was afraid of mutating r1 and r2 here because input sets might be
> shared.

I cannot remember whether mutating your sub-smartsets is a sin or not. I've 
patchbombed a version that stays quite conservative on this point.
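
For illustration only -- not necessarily what that patchbombed version does --
a conservative variant could sort plain-list copies rather than the shared
subsets themselves, along these lines (hypothetical helper name):

    def sorted_iters(r1, r2, ascending):
        # sketch: build directed iterators over copies so the possibly
        # shared input smartsets are never mutated
        list1 = sorted(r1, reverse=not ascending)
        list2 = sorted(r2, reverse=not ascending)
        return iter(list1), iter(list2)

The obvious cost is that both sides get materialized eagerly, so some
laziness is traded for not touching shared state.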

>>> But, when _iterator() is called while _ascending is not None (*1), we know
>>> at least one of r1 or r2 does not support fastasc/fastdesc. So we'll have
>>> to consume r1 or r2 anyway to get it sorted. I think the use of
>>> _iterordered() had little benefit in this case.
>>>
>>>    (*1) except when called from __len__(), which can be assumed to be slow
>>
>> If _ascending is not None, the iteration is requested to be done in
>> sorted order. I would assume that using _iterator here could give us
>> data in the wrong order, but your test seems to disagree. I'll take a
>> deeper look tomorrow at why the code still behaves correctly.
>
> That is because the result of _iterator() is sorted by _trysetasclist().
>
>    __iter__()
>      _list  # consume the gen and try again
>        _genlist = _iterator()
>      __iter__()  # reenter
>        _trysetasclist()
>          _asclist = sorted(_genlist)
>        fastasc() or fastdesc()
>          return _asclist.__iter__ or _asclist.__reversed__
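
A stripped-down model of that flow (simplified names, illustration only, not
the actual smartset code) could be:

    class cachedset(object):
        # illustrative model: consume the generator once, cache the list,
        # and only sort it when an ordered iteration is actually requested
        def __init__(self, gen):
            self._gen = gen        # plays the role of _iterator()
            self._genlist = None   # unordered cache (like _genlist)
            self._asclist = None   # ascending cache (like _asclist)

        def _list(self):
            if self._genlist is None:
                self._genlist = list(self._gen)
            return self._genlist

        def __iter__(self):
            # like _trysetasclist(): sort the materialized list on demand
            if self._asclist is None:
                self._asclist = sorted(self._list())
            return iter(self._asclist)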

This _iterator() thing is actually a bit silly. I think we could be 
lazier (cf. the patch I mentioned above).

-- 
Pierre-Yves David

