[PATCH 4 of 7] manifest: use specific opener to avoid file stat ambiguity around truncation

Pierre-Yves David pierre-yves.david at ens-lyon.org
Wed Sep 21 07:43:31 EDT 2016



On 09/21/2016 08:57 AM, FUJIWARA Katsunori wrote:
> At Tue, 20 Sep 2016 18:14:32 +0200,
> Pierre-Yves David wrote:
>>
>> On 09/16/2016 10:51 PM, FUJIWARA Katsunori wrote:
>>> # HG changeset patch
>>> # User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
>>> # Date 1474057496 -32400
>>> #      Sat Sep 17 05:24:56 2016 +0900
>>> # Node ID 55bd90d63ad3a76fd4d27d5eabb3fe9a7fa0642b
>>> # Parent  71b6b49f8a7ab6c894028b9153290f4bbf0f54f6
>>> manifest: use specific opener to avoid file stat ambiguity around truncation
>>>
>>> If steps below occurs at "the same time in sec", all of mtime, ctime
>>> and size are same between (1) and (3).
>>>
>>>   1. append data to 00manifest.i (and close transaction)
>>>   2. discard appended data by truncation (strip or rollback)
>>>   3. append same size but different data to 00manifest.i again
>>>
>>> Therefore, cache validation doesn't work after (3) as expected.
>>>
>>> To avoid such file stat ambiguity around truncation, this patch uses
>>> specific opener, which forces checkambig=True at writing 00manifest.i
>>> changes out.
>>>
>>> Hooking vfs.__call__() is the only way to centralize changes into
>>> manifest.py, because all logic to actually write in-memory changes out
>>> is implemented in revlog layer.
>>>
>>> In fact, reusing already wrapped self.opener for dirlogcache entries
>>> like as below is a little redundant, because storecache-ed properties
>>> of localrepository doesn't refer file stat of non-root 00manifest.i
>>> files (= ambiguity of such files doesn't cause any issue).
>>>
>>>         if dir not in self._dirlogcache:
>>>             self._dirlogcache[dir] = manifestrevlog(self.opener, dir,
>>>                                                     self._dirlogcache)
>>>
>>> But wrapped opener never checks ambiguity of file stat for non-root
>>> 00manifest.i files, because "path == '00manifest.i'" condition in
>>> wrapper.__call__() is always False for such files. Therefore, this
>>> patch simply reuses wrapped self.opener for non-root 00manifest.i
>>> files.
>>>
>>> Even after this patch, avoiding file stat ambiguity of 00manifest.i
>>> around truncation isn't yet completed, because truncation side isn't
>>> aware of this issue.
>>>
>>> This is a part of ExactCacheValidationPlan.
>>>
>>>     https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
>>>
>>> diff --git a/mercurial/manifest.py b/mercurial/manifest.py
>>> --- a/mercurial/manifest.py
>>> +++ b/mercurial/manifest.py
>>> @@ -892,6 +892,28 @@ class treemanifest(object):
>>>                  subp1, subp2 = subp2, subp1
>>>              writesubtree(subm, subp1, subp2)
>>>
>>> +def _checkambigopener(opener):
>>> +    """build an opener to add checkambig=True at changing 00manifest.i
>>> +    """
>>> +    class wrapper(opener.__class__):
>>> +        def __call__(self, path, mode="r", text=False, atomictemp=False,
>>> +                     notindexed=False, backgroundclose=False, checkambig=False,
>>> +                      **kwargs):
>>> +            if path == '00manifest.i' and mode not in ('r', 'rb'):
>>> +                # check file stat ambiguity at closing forcibly
>>> +                checkambig = True
>>> +            supercall = super(wrapper, self).__call__
>>> +            return supercall(path, mode,
>>> +                             text=text,
>>> +                             atomictemp=atomictemp,
>>> +                             notindexed=notindexed,
>>> +                             backgroundclose=backgroundclose,
>>> +                             checkambig=checkambig,
>>> +                             **kwargs)
>>> +    opener.__class__ = wrapper
>>
>> This wrapper logic is a bit scary here. Could we  had the necessary
>> minimal hooks in revlog to allow manifest and changelog to hooks in ?
>> That would seems more robust/clean than wrapping core from core.
>
> How about adding optional argument "checkambig" to revlog.__init__(),
> of which default value is False ?\

Given that multiple subclass uses it it seems to make sense to go this 
way. It seems simple enough. The revlog class has other small 'extension 
point' for subclass so there is precedence.

Cheers,

-- 
Pierre-Yves David


More information about the Mercurial-devel mailing list