[PATCH 2 of 5] adjustlinkrev: use C implementation to test ancestor if possible

Pierre-Yves David pierre-yves.david at ens-lyon.org
Thu Nov 10 10:02:26 EST 2016



On 11/08/2016 05:15 PM, Jun Wu wrote:
> Excerpts from Pierre-Yves David's message of 2016-11-08 17:28:21 +0100:
>>
>> On 11/07/2016 02:01 AM, Jun Wu wrote:
>>> Excerpts from Pierre-Yves David's message of 2016-11-07 01:08:18 +0100:
>>> […]
>>>> What about an intermediate version that uses C code to fill a Python
>>>> heap object? To get a good grasp on the boost, I think having a small
>>>> Cython proof of concept would help.
>>>
>>> That might be helpful. However, I don't have immediate plans for it,
>>> because we will use C code to do fast "isancestor" tests for the
>>> "changelog.i" walk, and if we pre-build the linkrev database, there is
>>> no need to do the "changelog.d" walk at all. So "lazyancestors" won't
>>> actually be used in "adjustlinkrev", which is my main focus right now.
>>
>> Can you set up a plan page for this linkrev database project?
>
> This is pretty simple and straightforward, and probably does not need a
> plan page. The database maintains a map from (path, fnode) to [linkrev]
> (a list of possible linkrevs). The first version is being reviewed
> internally; a preview is at https://bpaste.net/show/8634205271dc . I'll
> send it upstream as Augie suggested once it's more polished, and will
> try to make it work for the non-filelog case that Martin mentioned on
> IRC.

A plan page does not need to be very complex.

So your current plan is to have this "(path, fnode) → [linkrev]" mapping
stored in an SQLite database and updated on the fly when needed. What was
your rationale for picking this over a more complete cache of
(changerev, path, filenode) exchanged over the wire?
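
To make sure I understand the shape of it, here is a minimal sketch of
what I imagine the store looks like (the cache location, schema, and
function name below are my guesses, not taken from your paste):

import sqlite3

# Hypothetical layout: one row per (path, fnode, linkrev) triple, so a
# (path, fnode) lookup returns the list of possible linkrevs. The file
# location and schema are guesses, not from the actual patch.
con = sqlite3.connect('.hg/cache/linkrevdb.sqlite')
con.execute('CREATE TABLE IF NOT EXISTS linkrevs ('
            ' path BLOB NOT NULL,'
            ' fnode BLOB NOT NULL,'
            ' linkrev INTEGER NOT NULL)')
con.execute('CREATE INDEX IF NOT EXISTS linkrevs_key'
            ' ON linkrevs (path, fnode)')

def candidatelinkrevs(path, fnode):
    """Return the list of possible linkrevs for (path, fnode)."""
    rows = con.execute('SELECT linkrev FROM linkrevs'
                       ' WHERE path = ? AND fnode = ?', (path, fnode))
    return [r[0] for r in rows]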

Out of curiosity, do you have speedup numbers for your current experiment?

>>>>> I'd note that I think the C code may have room for improvement. And
>>>>> if we are going to add some "state" to speed up ancestor testing, I
>>>>> think it should be done globally, i.e. on the C index object, so the
>>>>> state is private to the C code and the Python code just calls
>>>>> index.isancestor without knowing about the details. Since the
>>>>> changelog is shared globally, it would be a waste of memory to have
>>>>> separate states for each fctx.
>>>>
>>>> I'm a bit afraid of the space complexity of such a cache. However,
>>>> one of your colleagues at the sprint had some ideas around this.
>>>
>>> The memory usage of the global cache would be something like O(log(N) * N)
>>> or O(N), depending on how useful we want it to be.
>>
>> I'm not sure how you foresee this cache; can you elaborate? A naive
>> persistent cache approach would take O(N²) space.
>
> If I were going to write the O(N^2) version, you could safely assume
> that I couldn't have written fastannotate :)

;-) (that's why I asked)
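
Also, to make sure we mean the same thing about the global state, here is
a rough pure-Python model of the interface I understand you are
proposing. The isancestor name is from your mail; the naive per-revision
ancestor sets are only for illustration, since the compact O(N) or
O(N*log(N)) encoding would presumably be exactly the part living in C:

class ancestrycache(object):
    """Rough model of the proposed private state on the C index.

    isancestor is the interface from the proposal; the per-revision
    ancestor sets below are a naive illustration, not the compact
    encoding that the real C code would use.
    """

    def __init__(self, cl):
        self._cl = cl          # the changelog (revlog) to query
        self._ancestors = {}   # rev -> frozenset of its ancestors

    def isancestor(self, a, b):
        """Return True if rev a is an ancestor of rev b (inclusive)."""
        ancs = self._ancestors.get(b)
        if ancs is None:
            # simple DFS over parent revisions, memoized per queried rev
            seen = set()
            stack = [b]
            while stack:
                r = stack.pop()
                if r != -1 and r not in seen:
                    seen.add(r)
                    stack.extend(self._cl.parentrevs(r))
            ancs = self._ancestors[b] = frozenset(seen)
        return a in ancs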

>>> This is better than the "_ancestrycontext" approach: if every ctx has
>>> its own "_ancestrycontext", that is O(N^2) memory usage, where N is
>>> len(repo).
>>
>> Except we don't do that right now. In particular, when we use many
>> changectx with an _ancestrycontext, it is likely to be the same object
>> shared by all of them, if I remember correctly.
>
> But that does not speed up the changelog.d walk. And I suspect that
> sharing the same _ancestrycontext object across the chain could lead to
> incorrect results in some cases.

The changelog.d walk is about looking for modified files, right?
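
If so, the loop I have in mind is something like this (simplified from
memory, so names may not match the current mercurial/context.py exactly):

def _adjustlinkrev(repo, path, filelog, fnode, srcrev):
    """Find the ancestor of srcrev that introduced (path, fnode).

    Simplified from memory; the real code lives in mercurial/context.py.
    """
    cl = repo.unfiltered().changelog
    lkr = filelog.linkrev(filelog.rev(fnode))  # the recorded linkrev
    # walk the ancestors of srcrev down to the recorded linkrev
    for a in cl.ancestors([srcrev], lkr, inclusive=True):
        ac = cl.read(a)    # reads changelog.d: the full changeset text
        if path in ac[3]:  # ac[3] is the changeset's list of touched files
            # the file was touched here; check it is the same filenode
            if repo[a].manifest().get(path) == fnode:
                return a
    return lkr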

-- 
Pierre-Yves David

