[PATCH 2 of 5] adjustlinkrev: use C implementation to test ancestor if possible

Sun Nov 6 19:08:18 EST 2016

On 11/02/2016 03:05 AM, Jun Wu wrote:
> Excerpts from Pierre-Yves David's message of 2016-11-02 00:29:45 +0100:
>>
>> On 11/01/2016 09:51 AM, Jun Wu wrote:
>>> # HG changeset patch
>>> # User Jun Wu <quark at fb.com>
>>> # Date 1477989396 0
>>> #      Tue Nov 01 08:36:36 2016 +0000
>>> # Node ID 93e073edc7f673d76d1113f5830ec46210707c25
>>> # Parent  ac063914b3a3c01f1d7ed253c73113903fccb7a9
>>> # Available At https://bitbucket.org/quark-zju/hg-draft
>>> #              hg pull https://bitbucket.org/quark-zju/hg-draft  -r 93e073edc7f6
>>> adjustlinkrev: use C implementation to test ancestor if possible
>>>
>>> The C ancestors implementation is about 10-100x faster than the Python
>>> lazyancestors implementation (and it has room to improve: 1. we only need
>>> isancestor, not common ancestor; 2. we can shrink revs[p:q] to a single
>>> ranged revision if all(revs[i].parentrevs() == [i-1] for i in range(p,q)),
>>> the state of known ranged revisions can be stored in the C index object).
>>>
>>> In the real world, the following command:
>>>
>>>   HGRCPATH=/dev/null hg log -f mercurial/c*.py -T '{rev}\n' > /dev/null
>>>
>>> Takes 2.3 to 2.5 seconds user-space CPU time before the patch,
>>>   and 1.7 to 1.9 after.
>
> I'd prefer commenting on details directly. Things like "this is a complex
> issue, you should try to do more homework before trying to solve it" are
> useful to new people, while not very helpful to me, who had some experience
> here. I think you can save your time by simplifying most part of this email
> to:
>
>   - Could you include runtime data for the Mozilla graft case?
>   - Could you include runtime data for pure and pypy?
>   - How do you think about a C implementation for lazyancestors?

So, this not just an usual "This is non trivial - please include more 
details" feedback. From experience, I know this piece of code is 
especially full of monsters and dragons. When this code happened we had 
to multiple severe performance regressions reported during the freeze 
and after it, and fixing one of them usually made another one appears in 
its wake. This was bad for the project because it impacted user and had 
to issue bugfix releases and this was bad for the contributors because I 
remember having to context switch at 1 am on a release night attending 
FOSDEM conference to fix yet another regression at was introduced by the 
previous set batch of fixes, I wasn't happy. I want the project to avoid 
this again and I don't wish you to have to battle with unexpected 
consequence.

But thanks for expressing this general feeling anyway, I'll try to adapt 
and adjust feedback in a bit more direct way for other series.

>> There is an history of introducing large performance regression while
>> touching that code. That explain a bit its complexity, for example there
>> is case were we carry the direct parent in _descendantrev and adjust
>> linkrev as we go while there is some case where we just set the top most
>> revision in _descendantrev and adjust link on demand much later. These
>> serve different purposes for different top level use case.
>
> I think I fully understand what the current _adjustlinkrev code is doing and
> why it is the form today.

I don't doubt you have are getting a good understanding of the current 
code, but message like this one 
https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-November/089815.html 
show that you do not have a full grasp on the best yet.

> Instead of running tests blindly, I'd like to measure things more
> scientifically, like analyzing the code flow, the time complexity etc.
> For this patch, if C code works, it should be faster in all cases. Reasons:
>
>   - If _ancestrycontext exists, the code path is the same for both the new
>     and the old code, so there should be no difference. This covers cases
>     where copies.py sets _ancestrycontext explicitly.
>   - If _ancestrycontext does not exist, "memberanc" will be created on the
>     fly by "_adjustlinkrev" and dropped when returning. Therefore no state
>     about which ancestor revisions are already visited is stored. It's
>     walking from _descendantrev or self.rev each time. The C code is doing a
>     same stateless thing but is 10-100x faster.
>
> I can update the commit message to include the above reasons.

Yes please at least update the commit message next time. This was 
helpful to get a better grasp on what your changeset is doing.

> As I'm lacking of time, probably like everyone else here, I'd defend my time
> to be used on something more meaningful (than unnecessary tests here). So I
> can end up completing more things here, which is a good thing.

As pointed earlier in this reply, last time we installed this code, this 
ended up much more impactful and timewasting than anticipated. I'm 
trying to get everybody to eventually spend less time on this here. This 
means moving carefully and having a lot of initial check so that we 
don't have to come back on this.

> That said, pure / pypy is a valid concern. The problem is, is
> "ancestor.commonancestorsheads" slower than "ancestor.lazyancestors"?
> I will answer this question later.
>
>> The multiple cases we found should be available on the tracked and the
>> changesets touching this should be quite well documented with for
>> example details on the operation and repository used for timing.
>> Unfortunatly as we do not have a formal benchmark suite, they have not
>> been recorded more formally. I greatly recommend your dig all that out
>> before proceeding (As you've already started doing so in your reply).
>> For example, the changeset you point out, 9372180b8c9b, is fixing an
>> issue that was not visible on the Mercurial repository but was affecting
>> the Mozilla repository dramatically.
>
> I was aware of issue4537 and issue4514 before writing the series. I did
> run the commands commented by Mathias on issue4514.

Please make a run through all histories touching this area between 2014 
January 1st and mid march searching for update to the code and related 
corner case. I've vague memory of at least 3 major updates related to 
performances.

> > It probably make sense to build a proper list of all problematic case,
>> link them to benchmark and record timing for these operation on all
>> milestone (pre-adjust, with-current-adjust, with-your-change). Having
>> such list would help use being confident in the direction we will move in.
>>
>> All this performance related work need also to be measure in term of
>> impact of user of the pure code, including with pypy.
>
> This is a valid concern.
>
>> That said, Have you consider just writing a C version of the
>> lazy-ancestors code? It would probably provide similar speed up while
>> proving useful for many other part of the code.
>
> I had considered. It's not very compelling now because it is complex
> (re-inventing heapq and memory management can be painful), requires extra
> time to complete, while the current C code seems to be good enough even for
> internal big repos.

What about an intermediate version that use C code to fill a Python heap 
object? A good way to get a good grasp on the boost I think having a 
small Cython proof of concept would help.

> I'd note that I think the C code may have room to improve. And if we are
> going to add some "state" to speed up testing ancestors, I think it should
> be done globally, i.e. on the C index object. So the state is private to the
> C code and the Python code just calls index.isancestor, without knowing
> about the details. As the changelog is shared globally and it may be a waste
> of memory to have separated states for each fctx.

I'm a bit afraid of the space complexity of such cache. However, one of 
you colleague at the sprint some idea around this.

>>> diff --git a/mercurial/context.py b/mercurial/context.py
>>> --- a/mercurial/context.py
>>> +++ b/mercurial/context.py
>>> @@ -838,9 +838,14 @@ class basefilectx(object):
>>>          else:
>>>              revs = [srcrev]
>>> +        # check if this linkrev is an ancestor of srcrev
>>>          if memberanc is None:
>>> -            memberanc = iteranc = cl.ancestors(revs, lkr,
>>> -                                               inclusive=inclusive)
>>> -        # check if this linkrev is an ancestor of srcrev
>>> -        if lkr not in memberanc:
>>> +            # cl.isancestor is backed by C code, faster than cl.ancestors
>>> +            if any(cl.isancestor(lkr, r) for r in revs):
>>> +                return lkr
>>> +        else:
>>> +            if lkr in memberanc:
>>> +                return lkr
>>> +        # fallback to walk through the changelog
>>> +        if True:
>>>              if iteranc is None:
>>>                  iteranc = cl.ancestors(revs, lkr, inclusive=inclusive)

I think the next step here should be to list all problematic case we 
initially encountered and make sure we have a good test-case for them.

-- 
Pierre-Yves David