[PATCH] localrepo: introduce persistent caching of revset revision's branch names

Wed Oct 15 07:36:11 CDT 2014

On 10/15/2014 01:47 PM, Pierre-Yves David wrote:
>
>
> On 10/15/2014 04:40 AM, Mads Kiilerich wrote:
>> On 10/15/2014 02:45 AM, Pierre-Yves David wrote:
>>>
>>>
>>> On 10/14/2014 05:33 PM, Mads Kiilerich wrote:
>>>> # HG changeset patch
>>>> # User Mads Kiilerich <madski at unity3d.com>
>>>> # Date 1413333190 -7200
>>>> #      Wed Oct 15 02:33:10 2014 +0200
>>>> # Node ID 85189122b49b31dddbcddb7c0925afa019dc4403
>>>> # Parent  48c0b101a9de1fdbd638daa858da845cd05a6be7
>>>> localrepo: introduce persistent caching of revset revision's branch
>>>> names
>>>>
>>>> It is expensive to create a changectx and extract the branch name.
>>>> That shows
>>>> up when filtering on branches in revsets.
>>>>
>>>> To speed things up, cache the results on disk. To avoid using too
>>>> much space,
>>>> all branch names are only stored once and each revision references
>>>> one of these
>>>> names. To verify that the cache is valid, we also store the tip hash
>>>> in the
>>>> cache file.
>>>
>>> I cannot see what is your invalidation strategy here.
>>
>> ??? It is right there. Obviously, if the cache not is valid it is
>> invalid. But I must assume you are looking for something else. What?
>>
>> Or to rephrase it: When writing the cache file, it writes the tip
>> revision number (+1) and the hash of tip. When reading the file, if that
>> revision still has the same hash (even if it no longer is tip), the
>> cache is considered valid as far as it goes.
>
> Hum, standard blindness here, my bad… This cache strategy is 
> vulnerable to operation (or chain of operation) that change repository 
> content without altering the hash number of commit or its hash. We 
> should probably use something stronger.

Yes.

Something like a linearly chained hash of the hashes comes to mind. It 
would however be too slow to validate the whole chain when accessing the 
cache, so it would be vulnerable to the same problems unless it has good 
low level support for invalidation and recomputation. And then, why not 
just invalidate/truncate caches at that point.

Something vector clock or bloom filter could perhaps also be used. A 
simpler solution would probably be an ever-increasing counter that are 
incremented on commits/changegroup additions _and_ on strip (and amend).

> I know that this is used for branchmap as well. So yes branchmap is 
> vulnerable too. But branchmap is less vulnerable because it does not 
> depends of the order of the revision in the repo (revnum) while you do 
> here. We should build something stronger here.

The current branchmap validation scheme can just as easily be tricked if 
the two tipmost revs are unrelated, both are stripped, a new evil commit 
is made on some branch, and then the old tip is pulled back in.

A stronger solution would be nice. When we get it we can apply the same 
solution everywhere. Until we get the "cannot break" solution, I think 
it is ok to extend the use of the existing imperfect but "doesn't break" 
solution.

/Mads