[PATCH 1 of 6 V2] obsstore: add a 'cachekey' method

Sat Jun 3 06:12:55 EDT 2017

On 06/02/2017 03:10 PM, Yuya Nishihara wrote:
> On Thu, 1 Jun 2017 15:28:19 -0700, Jun Wu wrote:
>> Excerpts from Sean Farley's message of 2017-06-01 13:27:44 -0700:
>>>> Both series are complementary and useful. Obscache is very efficient for
>>>> its purpose and the other one improve other aspect of evolution related
>>>> computation. We already have many rev indexed caches so nothing really
>>>> new here.
>>>
>>> This was my thinking as well. Though, I'm not trying to muddy the waters
>>> here.
>>
>> There are 4 revsets: obsolete, unstable, bumped, and divergent. My patch
>> speeds up all of them significantly while obscache only speeds up the first
>> one.
>>
>> I admit that obscache is slightly faster than indexing on the "obsolete"
>> revset. But that perf difference would be like just 20ms for hg-committed. I
>> think that 20ms does not justify the complexity of a complete
>> (huge-repo-friendly) implementation obscache and the coupling with revision
>> number in design. And I do think eventually we want the related revset to be
>> lazy so there won't be noticeable perf difference.
>
> I haven't read these patches carefully, but I like Jun's radixlink idea which
> seemed clever. If the perf win is just a few tens of milliseconds, I prefer not
> having two separate caches layers that could complicate things in future.

Indexes are definitely a good thing and we will want some at some point.

However, the radix tree series on the list is an early RFC[1], while this
obscache series is a port of code from the evolve extension that has been
running in many places for over a month.
There are various things to clarify and adjust in that RFC series (e.g.
collaboration with the transaction) before we can consider taking it.

So to me the way forward here is to get the existing working solution in
now and incorporate the other one later. As a bonus, they will have
fairly similar logic for cache validation and incremental update, so
work on obscache will benefit the work on the indexes. If the obscache
happens to be "obsoleted" by something new and better (e.g. radix
tree?), we can drop the obscache at that point. This just happened to
the "hiddencache", which was recently dropped since hidden computation
got much faster.
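
To make "cache validation and incremental update" concrete, here is a
minimal toy sketch in Python. It is not the code from this series: the
key layout, the `ObsCache` class and the in-memory lists standing in for
the changelog and the obsstore are all assumptions made for
illustration. The idea is that the cache remembers a key describing the
state it was built from; if that state is still a prefix of the current
one, only the new revisions and new markers are processed, otherwise the
cache is rebuilt.

```python
import hashlib


def cachekey(nodes, markers):
    """Summarize the state a cache was built from (hypothetical layout)."""
    tiprev = len(nodes) - 1
    tipnode = nodes[-1] if nodes else None
    digest = hashlib.sha1(repr(markers).encode()).hexdigest()
    return (tiprev, tipnode, len(markers), digest)


class ObsCache:
    """Toy rev-indexed cache: one byte per revision, 1 means obsolete."""

    def __init__(self):
        self.key = cachekey([], [])
        self.flags = bytearray()
        self.rebuilds = 0

    def update(self, nodes, markers):
        # nodes: list of node ids, index == revision number
        # markers: list of (precursor, successors) tuples, append-only
        tiprev, _tipnode, nmarkers, _digest = self.key
        # Validation: the state we were built from must be a prefix of
        # the current state (same tip at the same rev, same old markers).
        valid = (
            tiprev < len(nodes)
            and nmarkers <= len(markers)
            and cachekey(nodes[:tiprev + 1], markers[:nmarkers]) == self.key
        )
        if not valid:
            # e.g. after a strip: forget everything and start over
            self.flags = bytearray()
            tiprev, nmarkers = -1, 0
            self.rebuilds += 1
        # New revisions must be checked against every marker.
        allprecs = {m[0] for m in markers}
        for rev in range(tiprev + 1, len(nodes)):
            self.flags.append(1 if nodes[rev] in allprecs else 0)
        # New markers may obsolete revisions that were already cached.
        revof = {node: rev for rev, node in enumerate(nodes)}
        for prec, _succs in markers[nmarkers:]:
            rev = revof.get(prec)
            if rev is not None:
                self.flags[rev] = 1
        self.key = cachekey(nodes, markers)
```

An index keyed by node would need the same two steps (check that the
stored key still matches, then process only what was added), which is
why the validation logic can be shared.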

In addition, it is unclear that we'll want to drop the obscache once the
indexes land. Timing-wise, computing the obsolete set takes 45.8ms with
the RFC version (using some C) vs 1.2ms with the obscache (no extra
C)[2]. This is a big saving on startup time. In the same repository (no
extension but evolve-obscache), hg version takes 80ms. The roughly 44ms
difference is over 50% of that. A more practical command, `hg diff -r .^`,
takes 170ms, so tens of milliseconds mean a slowdown of tens of percent
for fast commands (and we have a small number of obsolete revisions:
about 6K).
The obscache uses revision indexing, which means no rev-to-node
conversion and more direct data access. For computing the set of
obsolete changesets (part of startup), this means it will remain
significantly faster than an index-based approach. We independently want
the index-based approach to solve other issues.
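
The difference in access pattern can be sketched with a toy Python
example. The function names and data layouts below are hypothetical and
do not reproduce the timings above; they only show why a rev-indexed
cache avoids the rev-to-node step that a node-keyed index needs.

```python
def obsolete_from_revcache(flags):
    """Rev-indexed cache: one byte per revision, a plain linear scan."""
    return {rev for rev, flag in enumerate(flags) if flag}


def obsolete_from_nodeindex(nodes, index):
    """Node-keyed index: each revision is first converted to its node,
    then looked up in the index (here a dict keyed by precursor node)."""
    return {rev for rev, node in enumerate(nodes) if node in index}
```

Both functions return the same set of revision numbers for consistent
inputs; the second one simply does more work per revision.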

Finally, there are other evolve-related areas where we will want caches,
and this series installs the necessary infrastructure to easily add such
caches. For example, obsolescence-markers-discovery is an area where we
know we'll need caching that uses this kind of infrastructure to detect
and perform updates. In addition, various troubles[3] are a bit complex
to compute, so I expect us to want to keep them in a cache too. Some of
that logic can even be reused for the radix tree implementation.

Cheers,

-- 
Pierre-Yves David

[1] After discussing with Jun on IRC, he seems to be poking at a second
iteration.
[2] 
https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-May/098348.html
[3] (instability in the new vocabulary)