[PATCH 1 of 6 V2] obsstore: add a 'cachekey' method

Pierre-Yves David pierre-yves.david at ens-lyon.org
Sat Jun 3 22:50:22 EDT 2017



On 06/04/2017 01:24 AM, Pierre-Yves David wrote:
>
>
> On 06/03/2017 11:23 PM, Jun Wu wrote:
>> Excerpts from Pierre-Yves David's message of 2017-06-03 12:12:55 +0200:
>>> Indexes are definitely a good thing and we will want some at some point.
>>>
>>> However, the radix tree series on the list an early RFC[1] while this
>>> obscache series is a port from the evolve extension of code in the
>>> running in many places for over a month.
>>
>> I think I'm nearly completing it. As a side effect I did many small
>> cleanups
>> in obsolete.py
>>
>>> There are various things to clarify and adjust from that RFC series (eg:
>>> collaboration with the transaction) before we can consider taking it.
>>>
>>> So to me the way forward here is to get the existing working solution in
>>> now and incorporates the other one later. As a bonus point, they will
>>> have fairly similar logic for cache validation and incremental upgrade.
>>> So work from obscache will benefit the work on the indexes. If the
>>> obscache happens to be "obsoleted" by something new and better (eg:
>>> radix tree?), we can drop the obscache at that point. This just happens
>>> to the "hiddencache" which got recently dropped since hidden computation
>>> got much faster.
>>>
>>> In addition, it is unclear, we'll want to drop the obscache once the the
>>> indexes land. Timing wise, the RFC version (using some C), compute the
>>> obsolete set is 45.8ms vs 1.2ms (no extra C) for the obscaches[2]. This
>>
>> That 44ms data is outdated. My current version is even 10ms faster than
>> obscache for "hg id". I didn't expect it to be, though.
>
> There is likely something wrong with your timing is you get a 10ms speed
> up… since the obscache computation is about 1ms. How are you running
> your testing? Do you have your code available somewhere?

(note: with very basic change to _computeobsolete it looks like the 
overhead of from using nodes (and nodes lookup) is about 6x compared to 
rev indexing).

> (note: you probably have an outdated version on obscache too, I shaved
> it a bit in the past week (1.58ms → 0.95ms))
>
>> If the perf difference is within +/-20ms, I think it's fair to say this
>> 500-ish lines obscache implementation seem over complicated.
>
> The obscache logic itself is < 100 lines the rest are cache key +
> incremental upgrade logic common to other things. The changelog/revision
> part is pretty standard and we could factor it out with other similar
> user (but is small enough than the value is small). The obsstore related
> cache key and upgrade is doing something similar to what the indexes
> approach needs so there should be no significant difference in size and
> complexity.


That get me to think a bit more about that part, trying to pin point the 
apparent misunderstanding about the discussion around this series. And I 
think I got it :-)

Overall this series adds:

1) a generic cache key for the obsstore, to be used by any caches

2) an abstract class using the above key to build any kind of caches 
that just need to implement a "update yourself with these new markers" + 
storage. (That class is currently combined with a changelog related key, 
see below)

3) a concrete cache using the above(bytearray to cache a small property)

4) all the logic around transaction to have that cache updated and 
flushed properly.


In the above list, points (1), (2) and (3) are -also- needed by the 
indexes cache and would help its implementation. Dropping the obscache 
in favor of the indexes means dropping a small part of the above series 
only, the rest can be kept around (or updated in place to access a 
different cache with the same API). This highlight that the two series 
are not competing, they are complementary.

The incremental update and race condition prevention logic is quite 
important and the version in this series has been battle tested for a month.

In addition, the version of the abstract class in this series is kept 
simple for clarity purposes. However there are many improvement to be 
made to it. Some are already implemented in the evolve extension, some 
needs works into core (I've seen some of the needed bits done by Jun in 
his RFC). Such improvement will benefit all obsstore related cache 
(including the one in the evolve extension). I've been looking forward 
to implement such improvement for a couple of weeks now :-/

---

About the changelog bit in "dualsourcecache". The abstract class use a 
key checking both the changelog (like branchmap, branchrev, tagnode, 
etc) and the obsstore. It does so because my current two concrete cache 
(obscache and obshashrange-cache) need use these two items. We could 
easily split the obsstore part out for cache that rely on the obsstore 
content only (eg: the jun radix tree). Having that logic right and 
easily reusable it a critical point to make progesss on evolution. If 
that help I can follow up making that logic usable outside of 
dualsourcecache. That would let me work on improving that abstract base 
class and allow indexes to reuse it, making that work easier.

Cheers,

>[…]

-- 
Pierre-Yves David


More information about the Mercurial-devel mailing list