Revset-based implementation of log command

Patrick Mézard patrick at mezard.eu
Tue Apr 17 11:36:47 CDT 2012


Le 16/04/12 20:06, Matt Mackall a écrit :
> On Fri, 2012-04-13 at 22:15 +0200, Patrick Mézard wrote:
>> Le 12/04/12 22:49, Matt Mackall a écrit :
>>> On Thu, 2012-04-12 at 21:52 +0200, Patrick Mézard wrote:
>>>> [3] could be fixed by turning revset.match() into a generator. It is
>>>> not clear which revset functions can operate on a "subset" generator
>>>> rather than a list but the simple ones like user() definitely can.
>>>> Also, I have no idea how much overhead this will add compared to the
>>>> list version.
>>>
>>> Any predicate can choose to turn its args from generators into lists, if
>>> need be, so it shouldn't be all that hard architecturally to get some of
>>> this working.
>>
>> To make this valuable for log command we need to turn the revsets used
>> by graphlog._makelogrevset() into pure "filters", that is generator
>> loops over the "subset" variable. It looks difficult for two of them:
>> notset() and orset().
>>
>> notset() is used for --no-merge and --prune. One solution is to
>> compute the negated expression set not over "subset" but over all
>> revisions. This would not affect --prune much. And if we really want
>> speed for --no-merge, we can write a dedicated revset implementing it
>> directly instead of negating a revision set.
> 
> I agree that notset is tricky to turn into an iterator. The primary
> difficulty is that given an iterator i for a predicate x, returning
> things not in i without consuming all of i is hard unless x has helpful
> properties like being sorted. So if we have a revset:

[...]

> Which all suggests that we could teach the optimizer to look at the
> parse tree of not's subexpression and decide that a) it's going to be
> expensive to evaluate and b) it's independent, so we use
> independentnotset on it.

I agree with all this. At this time, I was more concerned about the two I was using to minimize the work to refactor log command. All these problems are pretty similar to what you get when dealing with an SQL optimizer.
 
> Which brings us back to not(merge()). Evaluating this in the current
> revset code means visiting the entire index and checking for second
> parents, but that's quite cheap, and it seems worth doing all the work
> up front in any case. But for not(user(x)), we want to go the
> independent route because the subexpression is expensive.
> 
>> orset() cannot be turned into a filter in the general case. Here, we
>> often have an additional property: input revisions are sorted by
>> value. In this case, we can write an orset() version computing both
>> "or" branches on "subset" (instead of subset and (subset - "first
>> branch output")) and use the revision order to merge back the
>> iterable, then preserving the order again. Again, this would be a
>> dedicated revset.
> 
> I don't see why this can't be an iterable. If we do 'x or y', we simply
> need to yield all the elements of x before we start yielding element in
> 'y - x'. And that just means we need to remember the elements of we've
> seen of x as we go. Even if either set is a non-independent revset.

Right, and in fact I did this.

But for log command I need another property I stupidly removed from my previous post: I need orset() to preserve input order. If not, stuff like "--user foo --user bar" returns two concatenated stripes of history which is a regression, unless we sort at the end and lose the iterable property again. That is what my more constrained version deals with.

--
Patrick Mézard


More information about the Mercurial-devel mailing list