[PATCH 5 of 6 frozen-repos] localrepo: evaluate revsets against frozen repos

Mon Nov 23 01:07:38 CST 2015

On Sun, Nov 22, 2015 at 10:44 PM, Pierre-Yves David <
pierre-yves.david at ens-lyon.org> wrote:

>
>
> On 11/22/2015 10:04 PM, Gregory Szorc wrote:
>
>> On Sun, Nov 22, 2015 at 9:25 PM, Pierre-Yves David
>> <pierre-yves.david at ens-lyon.org> wrote:
>>
>>
>>
>>     On 11/21/2015 05:14 PM, Gregory Szorc wrote:
>>
>>         # HG changeset patch
>>         # User Gregory Szorc <gregory.szorc at gmail.com>
>>         # Date 1448146482 28800
>>         #      Sat Nov 21 14:54:42 2015 -0800
>>         # Node ID ad891b362564b30ec69bc844a3fd98e73b6e032e
>>         # Parent  888c2171adffa8340406b50aae02375f7bef50f4
>>         localrepo: evaluate revsets against frozen repos
>>
>>         Previously, revsets were evaluated against a repository/changelog
>>         that could change. This felt wrong. And, changectx lookups during
>>         revset evaluation would result in repoview constantly performing
>>         changelog correctness checks, adding overhead.
>>
>>         This patch results in some significant performance wins, especially
>>         when changectx are involved. There are some minor regressions, but
>>         the absolute time increase is so small that they can arguably be
>>         ignored. A detailed analysis follows.
>>
>>         Running the revset benchmarks in default mode of returning integers,
>>         we see some interesting changes:
>>
>>         revset #1: draft()
>>              plain
>>         0) 0.000040
>>         1) 0.000053 132%
>>
>>              plain
>>         0) 0.000233
>>         1) 0.000236
>>
>>         revset #7: author(lmoscovicz)
>>              plain
>>         0) 0.994968
>>         1) 0.702156  70%
>>
>>         revset #8: author(mpm)
>>              plain
>>         0) 0.982039
>>         1) 0.696124  70%
>>
>>         revset #9: author(lmoscovicz) or author(mpm)
>>              plain
>>         0) 1.944505
>>         1) 1.372315  70%
>>
>>         revset #10: author(mpm) or author(lmoscovicz)
>>              plain
>>         0) 1.970464
>>         1) 1.393157  70%
>>
>>         revset #13: roots((tip~100::) - (tip~100::tip))
>>              plain
>>         0) 0.000636
>>         1) 0.000603  94%
>>
>>         revset #15: 42:68 and roots(42:tip)
>>              plain
>>         0) 0.000226
>>         1) 0.000178  78%
>>
>>         revset #19: draft()
>>              plain
>>         0) 0.000040
>>         1) 0.000056 140%
>>
>>         revset #22: (not public() - obsolete())
>>              plain
>>         0) 0.000088
>>         1) 0.000111 126%
>>
>>         revset #23: (_intlist('20000\x0020001')) and merge()
>>              plain
>>         0) 0.000066
>>         1) 0.000086 130%
>>
>>         First, the improvements. revsets with author() improved
>>         significantly. The reason is that unlike most revset functions,
>>         author() needs to obtain a changectx to inspect the author field.
>>         And since repo.changelog lookups are faster, that revset function
>>         became faster.
>>
>>
>>     Using changectx in revsets is too slow anyway. We should use
>>     lower-level access there (as we just changed for _matchfiles).
>>
>>     I'll look at the rest of this later.
>>
>>
>> Keep in mind that changectx instances don't read from the revlog until
>> there is a data access that requires a read. revsets that access things
>> like the DAG "shape" are powered strictly from the index, which is
>> insanely fast. The only time changectx object overhead comes into play
>> is when accessing the commit message, date, author, changed files list,
>> extras, etc.
>>
>
> I know. I think we should stop using changectx for accessing these too.
>

That's fine if you want to do that. But I don't think it will matter for
perf that much unless you rewrite changelog.read() in C and/or do things
like lazily decoding unicode values and extras. What's nice about changectx
is that it already has cached properties on it. So in theory it would be
pretty easy to make decoding lazy. I started this work in London and think
I had some minor improvements to show from it. I should look at those
patches again since `hg log` is a bit faster now.
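
To make the lazy-decoding idea concrete, here is a minimal sketch. It is not
Mercurial code: the `LazyEntry` class, its `decodes` counter, and the
simplified entry layout (manifest, user, date, files, blank line,
description) are assumptions made for illustration, and
`functools.cached_property` stands in for whatever caching helper changectx
actually uses.

```python
from functools import cached_property


class LazyEntry:
    """Hypothetical, simplified changelog entry that decodes fields on demand."""

    def __init__(self, raw):
        self._raw = raw      # undecoded bytes, as read from storage
        self.decodes = 0     # counts decode calls, for demonstration only

    @cached_property
    def _parts(self):
        # Split the header lines from the description once; no decoding yet.
        head, _, desc = self._raw.partition(b"\n\n")
        return head.split(b"\n"), desc

    @cached_property
    def user(self):
        # Decoded on first access only; later accesses hit the cache.
        self.decodes += 1
        return self._parts[0][1].decode("utf-8")

    @cached_property
    def description(self):
        self.decodes += 1
        return self._parts[1].decode("utf-8")


entry = LazyEntry(b"abc123\nmpm\n0 0\nfile.txt\n\nfix bug")
first = entry.user   # triggers exactly one decode
again = entry.user   # served from the cache
```

A caller that only needs the author never pays for decoding the description
or any other field.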

I spent a fair amount of time trying to optimize vanilla `hg log` earlier
today. What really surprised me was that util.datestr() is a hot spot. 10%
or something like that. Specifically the lines where we .replace('%1') and
the call to strftime(). I half thought about trying to implement the
formatting of the default date string in C.
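
As a rough way to reason about that hot spot, the sketch below compares a
strftime-based formatter with a hand-rolled one for the default date layout.
Both functions are simplified stand-ins written for this example, not
util.datestr() itself: the names `datestr_strftime` and `datestr_manual`,
the format string, and the hard-coded English day and month names are
assumptions. The sample timestamp and offset are the ones from the patch
header quoted above.

```python
import time
import timeit

DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def _tzparts(tz):
    # tz is the offset in seconds west of UTC, so positive means "-HHMM".
    sign = "-" if tz > 0 else "+"
    hours, minutes = divmod(abs(tz) // 60, 60)
    return sign, hours, minutes


def datestr_strftime(t, tz, fmt="%a %b %d %H:%M:%S %Y %1%2"):
    # Substitute the custom %1/%2 timezone placeholders, then call strftime.
    sign, hours, minutes = _tzparts(tz)
    fmt = fmt.replace("%1", "%s%02d" % (sign, hours))
    fmt = fmt.replace("%2", "%02d" % minutes)
    return time.strftime(fmt, time.gmtime(t - tz))


def datestr_manual(t, tz):
    # Build the same string directly from the time tuple, skipping strftime.
    sign, hours, minutes = _tzparts(tz)
    tm = time.gmtime(t - tz)
    return "%s %s %02d %02d:%02d:%02d %d %s%02d%02d" % (
        DAYS[tm.tm_wday], MONTHS[tm.tm_mon - 1], tm.tm_mday,
        tm.tm_hour, tm.tm_min, tm.tm_sec, tm.tm_year,
        sign, hours, minutes)


if __name__ == "__main__":
    for fn in (datestr_strftime, datestr_manual):
        elapsed = timeit.timeit(lambda: fn(1448146482, 28800), number=100000)
        print(fn.__name__, elapsed)
```

The two functions agree only while strftime runs in the default C locale.
Whether the manual version is actually faster depends on the interpreter and
platform, so the timings need to be measured rather than assumed.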