[PATCH 3 of 6] revset: do not try to sort revisions unnecessarily by addset._iterator()

Wed May 13 09:39:01 CDT 2015

On Tue, 12 May 2015 17:52:30 -0700, Pierre-Yves David wrote:
> On 05/12/2015 03:59 PM, Yuya Nishihara wrote:
> > # HG changeset patch
> > # User Yuya Nishihara <yuya at tcha.org>
> > # Date 1427716479 -32400
> > #      Mon Mar 30 20:54:39 2015 +0900
> > # Node ID cc7253d43150ef937aca205f2cb66a569378aeea
> > # Parent  e487cb80cc333a4d92376daa29c08fa703c5b249
> > revset: do not try to sort revisions unnecessarily by addset._iterator()
> >
> > This fixes addset to remove duplicated revisions from right-hand-side set
> > even if one of the subset does not support fastasc/fastdesc.
> >
> > addset._iterator() did not do the right thing if _ascending is set, because
> > addset._iterordered() expects that the given sets are sorted. This patch
> > removes the broken implementation. There's no need for addset._iterator()
> > to sort revisions because the result will be sorted by _trysetasclist()
> > as necessary.
> 
> I agree the current behavior is buggy. But I'm dubious about your the 
> way to solve it.
> 
> The special casing of 'self._ascending' is a good correctness and 
> optimisation purpose.
> 
> I see two potential issues here:
> 
> 1) we use iterator from self._r1 and self._r2 without ensuring they are 
> sorted in the order we need them to be sorted (or using appropriate 
> directed iterator).
> 
> 2) _iterordered is apparently not doing deduplication.
> 
> Fixing these two should solve the bug without performance and 
> correctness impact?
> 
> As far as I understand your patch,  sorting the addset would result into 
> a sorted output most case.

_iterordered() can remove duplicates if both r1 and r2 are sorted. So we have
to fix (1).

But, when _iterator() is called with _ascending is not None (*1), we know at
least one of r1 or r2 does not support fastasc/fastdesc. So we'll have to
consume r1 or r2 anyway to get it sorted. I think the use of _iterordered() had
little benefit in this case.

 (*1) except when called from __len__(), which can be assumed to be slow

> I also would like you to check performance impact when touching revset 
> code. This means looking at the output from 
> 'contrib/revsetbenchmarks.py' with at least content of 
> 'contrib/revsetbenchmarks.t' (if the help of the script is too bad, 
> point it out and we'll improve that).

Hmm, I got notable difference by "roots((0:tip)::)". This seems to be caused
by the additional for-loop introduced by "revset: make addset class accepts
more than two subsets". Perhaps, I'll have to optimize addset.__contains__ and
_iterordered() for two subsets.

0) Revision: commit: no longer allow empty commit with the 'force' argument (API)
1) Revision: revset: do not nest addset by "addset + other" (issue4565)

revset #25: roots((0:tip)::)
0) wall 0.124226 comb 0.120000 user 0.120000 sys 0.000000 (best of 78)
1) wall 0.192682 comb 0.190000 user 0.190000 sys 0.000000 (best of 50)

Regards,