[PATCH 1 of 2] templater: do not use index.partialmatch() directly to calculate shortest()

Tue Oct 25 01:00:22 EDT 2016

On Mon, Oct 24, 2016 at 5:24 AM, Yuya Nishihara <yuya at tcha.org> wrote:
> On Sun, 23 Oct 2016 12:52:34 -0700, Martin von Zweigbergk wrote:
>> On Sat, Oct 22, 2016 at 11:35 PM, Yuya Nishihara <yuya at tcha.org> wrote:
>> > # HG changeset patch
>> > # User Yuya Nishihara <yuya at tcha.org>
>> > # Date 1477199123 -32400
>> > #      Sun Oct 23 14:05:23 2016 +0900
>> > # Branch stable
>> > # Node ID f180a39d749aeacb72936e629a372623b1f88b8c
>> > # Parent  76c57e1fe79b0980b377b4f305635dea393d6315
>> > templater: do not use index.partialmatch() directly to calculate shortest()
>> >
>> > cl.index.partialmatch() isn't a drop-in replacement for cl._partialmatch().
>> > It has no knowledge about hidden revisions, and it can't accept a node shorter
>> > than the length 4. Instead, let cl._partialmatch() use index.partialmatch()
>> > if available.
>> >
>> > The test result is sampled with --pure.
>> >
>> > diff --git a/mercurial/templater.py b/mercurial/templater.py
>> > --- a/mercurial/templater.py
>> > +++ b/mercurial/templater.py
>> > @@ -840,13 +840,8 @@ def shortest(context, mapping, args):
>> >      cl = mapping['ctx']._repo.changelog
>> >      def isvalid(test):
>> >          try:
>> > -            try:
>> > -                cl.index.partialmatch(test)
>>
>> Are there other callers of this method (index.partialmatch) or can we
>> remove it now?
>
> It's the low-level function behind cl._partialmatch(), so can't be removed.
>
>> > -            except AttributeError:
>> > -                # Pure mercurial doesn't support partialmatch on the index.
>> > -                # Fallback to the slow way.
>> > -                if cl._partialmatch(test) is None:
>> > -                    return False
>> > +            if cl._partialmatch(test) is None:
>> > +                return False
>>
>> How much slower is it? Correctness is almost always more important,
>> but it would be good to know how much slower we're making it.
>
> Here's some numbers. The performance characteristics is identical if no hidden
> revision is involved. If there are hidden revisions, _partialmatch() can fall
> back to O(len(repo)) path, which means shortest() becomes way slower if
> len(shortest(node)) > minlength.
>
> (with no hidden revision)
> % hg log -R mozilla-central -r0:20000 -T '{node|shortest}\n' --time > /dev/null
> (before) time: real 1.310 secs (user 1.290+0.000 sys 0.010+0.000)
> (after)  time: real 1.540 secs (user 1.520+0.000 sys 0.020+0.000)
>
> (with hidden revisions)
> % hg log -R hg-committed -r0:20000 -T '{node|shortest}\n' --time > /dev/null
> (before) time: real 1.530 secs (user 1.480+0.000 sys 0.040+0.000)
> (after)  time: real 43.080 secs (user 43.060+0.000 sys 0.030+0.000)
>
> If we want to choose the speed over the correctness, we can switch to use the
> unfiltered changelog, which will give the same results on both pure and CPy
> environments.

That slowdown is pretty considerable, yes, it's probably not worth it.
I don't really care if the displayed hash is the shortest among the
visible; it's more important that whatever we display can be copied
into "hg co $hash". So if both shortest() and lookup is done on the
unfiltered index, that sounds fine to me.

Btw, thanks for fixing this!