[PATCH 1 of 6 V2] obsstore: add a 'cachekey' method

Sat Jun 3 17:23:09 EDT 2017

Excerpts from Pierre-Yves David's message of 2017-06-03 12:12:55 +0200:
> Indexes are definitely a good thing and we will want some at some point.
> 
> However, the radix tree series on the list is an early RFC[1], while this
> obscache series is a port from the evolve extension of code that has been
> running in many places for over a month.

I think I'm nearly done with it. As a side effect, I did many small cleanups
in obsolete.py.

> There are various things to clarify and adjust in that RFC series (e.g.
> collaboration with the transaction) before we can consider taking it.
> 
> So to me the way forward here is to get the existing working solution in
> now and incorporate the other one later. As a bonus, they will have
> fairly similar logic for cache validation and incremental upgrade, so
> work on the obscache will benefit the work on the indexes. If the
> obscache happens to be "obsoleted" by something new and better (e.g.
> radix tree?), we can drop the obscache at that point. This just happened
> to the "hiddencache", which was recently dropped since hidden computation
> got much faster.
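
A minimal sketch of what "cache validation and incremental upgrade" can look like, assuming an append-only list of markers stored as bytes such as b"prec:succ" and a cache key of (marker count, SHA-1 of the last marker). `markerskey` and `MarkerCache` are hypothetical illustrations, not the signature of the method this patch adds nor the obscache's real format:

```python
import hashlib


def markerskey(markers):
    """Hypothetical cache key: (number of markers, sha1 of the last marker)."""
    if not markers:
        return (0, b"")
    return (len(markers), hashlib.sha1(markers[-1]).digest())


class MarkerCache:
    """Toy cache counting markers per predecessor, updated incrementally.

    Assumes the marker store is append-only unless it is rewritten
    (e.g. by a strip), in which case the cache is rebuilt from scratch.
    """

    def __init__(self):
        self.key = (0, b"")
        self.counts = {}
        self.lastread = 0  # markers processed by the last update()

    def update(self, markers):
        count, lasthash = self.key
        if count > len(markers):
            valid = False  # store shrank: cached data refers to lost markers
        elif count == 0:
            valid = True  # empty cache: every marker is new
        else:
            # Still valid if the last marker we saw is unchanged at its index.
            valid = hashlib.sha1(markers[count - 1]).digest() == lasthash
        if not valid:
            self.counts = {}
            count = 0
        # Only markers past the cached prefix are read.
        for marker in markers[count:]:
            prec = marker.split(b":", 1)[0]
            self.counts[prec] = self.counts.get(prec, 0) + 1
        self.lastread = len(markers) - count
        self.key = markerskey(markers)
```

Appending one marker and calling update() again reads only that marker; rewriting the store is detected through the key and triggers a full rebuild.
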
> 
> In addition, it is unclear that we'll want to drop the obscache once the
> indexes land. Timing-wise, the RFC version (using some C) computes the
> obsolete set in 45.8ms vs 1.2ms (no extra C) for the obscache[2]. This

That 44ms figure is outdated. My current version is even 10ms faster than
obscache for "hg id". I didn't expect it to be, though.

If the perf difference is within +/-20ms, I think it's fair to say this
500-ish-line obscache implementation seems overcomplicated.

> is a big saving on startup time. In the same repository (no extension
> but evolve-obscache), `hg version` takes 80ms, so 44ms is over 50% of
> that. For a more practical command, `hg diff -r .^` takes 170ms, so tens
> of milliseconds mean a slowdown of tens of percent for fast commands
> (and we have a small number of obsolete revisions: about 6K).
> The obscache uses revision indexing, which means no rev-to-node
> conversion and more direct data access. For computing the set of obsolete
> changesets (part of startup), this means it will remain significantly
> faster than an index-based approach. We independently want the index
> approach to solve other issues.
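
The revision-indexing argument can be illustrated with a minimal sketch, assuming a changelog modeled as a list where nodes[rev] is the node id of revision rev. The helpers below are hypothetical and do not reflect the obscache's actual on-disk layout; they only contrast a per-query rev-to-node lookup with a flag array indexed directly by revision:

```python
def buildobsflags(nodes, obsoletenodes):
    """Return a bytearray where flags[rev] is 1 if revision rev is obsolete.

    nodes[rev] is assumed to be the node id of revision rev; obsoletenodes is
    the set of node ids that have successor markers. The rev -> node lookup
    happens once here, when the cache is (re)built.
    """
    flags = bytearray(len(nodes))
    for rev, node in enumerate(nodes):
        if node in obsoletenodes:
            flags[rev] = 1
    return flags


def isobsolete_bynode(nodes, obsoletenodes, rev):
    # Node-keyed lookup: every query converts rev -> node first.
    return nodes[rev] in obsoletenodes


def isobsolete_byrev(flags, rev):
    # Rev-indexed lookup: a single array access, no conversion.
    return flags[rev] == 1


def obsoleterevs(flags):
    """Compute the set of obsolete revisions directly from the flags."""
    return {rev for rev, flag in enumerate(flags) if flag}
```

Because the flags are plain bytes, such a cache could be written out as `bytes(flags)` and reloaded with `bytearray(data)` at startup without touching node ids at all.
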
> 
> Finally, there are other evolve-related areas where we will want caches;
> this series installs the necessary infrastructure to easily add such
> caches. For example, obsolescence-markers-discovery is an area where we
> know we'll need caching with this kind of infrastructure to detect and
> perform updates. In addition, various troubles[3] are a bit complex to
> compute, and I expect us to want to keep them in a cache too. Some of that
> logic can even be reused for the radix tree implementation.

As I said in another thread, I think a better long-term direction is to make
the related revsets lazy. People usually run `hg log -r "small-set"`, and they
don't care about revisions that are not output. Today, when I run
"hg log -r .", hg checks 5k+ commits for the obsolete revset. I don't think
we want to keep that behavior forever.
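
The lazy direction can be sketched as a generator filter, assuming a per-revision predicate instead of a precomputed obsolete set. `lazyfilter` and `CountingCheck` are hypothetical names; `CountingCheck` only exists to count how often the predicate runs. Only revisions the query actually consumes get checked:

```python
import itertools


def lazyfilter(revs, isobsolete):
    """Yield non-obsolete revisions, checking each one only when requested."""
    for rev in revs:
        if not isobsolete(rev):
            yield rev


class CountingCheck:
    """Hypothetical per-revision obsolescence check that counts its calls."""

    def __init__(self, obsolete):
        self.obsolete = obsolete
        self.calls = 0

    def __call__(self, rev):
        self.calls += 1
        return rev in self.obsolete


if __name__ == "__main__":
    check = CountingCheck(set(range(0, 5000, 2)))  # even revs are obsolete
    first = list(itertools.islice(lazyfilter(range(5000), check), 2))
    print(first, check.calls)
```

Taking the first two results this way calls the check only four times, whereas an eager set comprehension over the same 5000 revisions would call it 5000 times.
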

If we go in that "lazy" direction, the cache approach seems unnecessary in
general.

I'm not sure about obsolescence-markers-discovery, but I do have some ideas
in that area too. I guess we'll have to see the actual logic in that area to
judge whether dualsourcecache is worthwhile or not.

> 
> Cheers,
>