[PATCH 1 of 6 V2] obsstore: add a 'cachekey' method

Wed Jun 7 13:04:22 EDT 2017

On 06/07/2017 03:57 PM, Jun Wu wrote:
> Excerpts from Pierre-Yves David's message of 2017-06-07 15:27:41 +0100:
>> I'm not advocating for introducing dead code. The obsstore
>> indexes(radix-link) are a cache layer to the obsstore. As a cache, the
>> indexes need a cache-key and associated logic for incremental update.
>
> The problem I see for your cache-key approach is it has to have length
> information to be able to do incremental update. So it's not a typical cache
> key.

I'm not sure what you mean here. The cache key used to validate the 
cache and detect changes is only:

   <size-of-obsstore, hash-from-content>

This is based on the same things as the cache key you uses for indexes 
(size of obsstore); with an additional content checks (hash) in my proposal.

As of this series was written, directly obtaining markers from an (old, 
new) offset of the obsstore is not possible. The use of "number of 
obsmarkers" by the abstract cache is here to work around this. One of 
the goal of having the code in core was to follow up with a rework of 
the obsmarkers reading API to allow such partial read and drop the 
"number of markers" hack. There a comment point this out in the code I sent.

I see that you have done in your indexes series all the refactoring that 
I needed (Mostly patches 5,6 and 12). Thanks you for that.

(Note: evolve grown a hacky version of this some weeks back and is not 
longer using the "number markers" in practice:

https://www.mercurial-scm.org/repo/evolve/file/85cabd631a1b/hgext3rd/evolve/obscache.py#l130
)

In addition, I'm not sure what you mean by "not typical". Most cache in 
Mercurial have "tiprev" as part of the cachedkey and uses to compute 
their incremental update.

> I'd prefer the UUID + length approach. We have "length" implemented today.

The UUID approach requires breaking append-onlyness or using multiple 
file. It is less friendly with to (including wide transaction rollback) 
and races are gets annoying to avoid.

The append-only based approach do not suffer from these drawback making 
it more appealing.

>> What I need here is to have this logic "standard" and reusable by other
>> caches. I can split the obsstore specific part from the changelog part
>> and make it available, either used by the obscache or by the indexes
>>
>> note: current indexes cache key is fragile (file length check only) and
>> would benefit from the stronger cachekey+logic) available in this series.
>
> I'm not sure checking 200 bytes (in your current approach) is not fragile.

It is less fragile. Obsmarkers created in the same repo has a high odd 
to have the same size, the bytes hashing is catching content change.
We can increase the number of bytes we read a bit to reinforce it if 
this turn out to be too low.

See my other email for idea on how to make it more complete:

https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-June/099317.html

Cheers,

-- 
Pierre-Yves David