[PATCH 1 of 2] patchcopies: give up any optimization based on `introrev`

Sun Oct 13 17:02:25 EDT 2019

On 10/13/19 12:08 PM, Yuya Nishihara wrote:
> On Fri, 11 Oct 2019 20:00:58 +0200, Pierre-Yves David wrote:
>> On 10/11/19 2:02 PM, Yuya Nishihara wrote:
>>> On Fri, 11 Oct 2019 01:04:12 +0200, Pierre-Yves David wrote:
>>>> # HG changeset patch
>>>> # User Pierre-Yves David <pierre-yves.david at octobus.net>
>>>> # Date 1570672173 -7200
>>>> #      Thu Oct 10 03:49:33 2019 +0200
>>>> # Node ID 2477ba483c04067900d1e9f6523b03df68a4d545
>>>> # Parent  8ff1ecfadcd110849c47c77e31c92809eea466ab
>>>> # EXP-Topic patchcopies-regression
>>>> # Available At https://bitbucket.org/octobus/mercurial-devel/
>>>> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 2477ba483c04
>>>> patchcopies: give up any optimization based on `introrev`
>>>>
>>>> Between 8a0136f69027 and d98fb3f42f33, we sped up the search for the
>>>> introduction revision during path copies. However, further checking show that
>>>> finding the introduction revision is still expensive and that we are better off
>>>> without it. So we simply drop it and only rely on the linkrev optimisation.
>>>>
>>>> I ran `perfpathcopies` on 6989 pair of revision in the pypy
>>>> repository (`hg perfhelper-pathcopies`. The result is massively in favor of
>>>> dropping this condition. The result of the copy tracing are unchanged.
>>>>
>>>> Attempt to use a smaller changes preserving linkrev usage were unsuccessful, it
>>>> can return wrong result. The following changesets broke test-mv-cp-st-diff.t
>>>>
>>>>       -        if not f.isintroducedafter(limit):
>>>>       +        if limit >= 0 and f.linkrev() < limit:
>>>>                    return None
>>>
>>> Sure. I meant comparing linkrevs, not linkrev vs (changelog) rev. If both
>>> arms were linkrevs, and if the limit pointed to the revision where the
>>> linkrevs of its files are guaranteed to be the lowest, the comparison
>>> (e.g. f.linkrev() < repo[limit][path].linkrev()) would work.
>>>
>>> My question was whether it would make things fast or not.
>>
>> Ha, I understand your proposal better now.
>>
>> My testing of the version in this patch versus the "wrong" version using
>> linkrev too agressively did not show a significant difference (-1%).
> 
> Do you mean only 1% speedup or measurable speedup on 1% cases? I don't think
> the former is significant, but the latter could be depending on use cases.

only ±1% speedup for all (many) cases I looked at.

>> So
>> I don't it to raise anything better. Can we move forward with this
>> series and maybe try a different approach later ?
> 
> Better to leave the _findlimit() function? Or won't it be any useful in
> your future work?

That seems easy enough t reintroduce later if needed. My current 
optimization work focus on the changeset based copies tracing algorithm. 
I don't think I'll spend much time on the filelog based one.

-- 
Pierre-Yves David