[PATCH 02 of 11] scmutil: add filecache, a smart property-like decorator that compares stat info

Tue Jul 19 06:53:14 CDT 2011

On 2011-07-19 12:23, Idan Kamara wrote:
> On Tue, Jul 19, 2011 at 1:26 AM, Adrian Buehlmann <adrian at cadifra.com> wrote:
>>
>> On 2011-07-18 23:29, Matt Mackall wrote:
>> > On Mon, 2011-07-18 at 22:32 +0200, Adrian Buehlmann wrote:
>> >> On 2011-07-18 22:12, Matt Mackall wrote:
>> >>> On Sat, 2011-07-16 at 18:03 +0200, Adrian Buehlmann wrote:
>> >>>> On 2011-07-16 16:34, Idan Kamara wrote:
>> >>>>> # HG changeset patch
>> >>>>> # User Idan Kamara <idankk86 at gmail.com>
>> >>>>> # Date 1310227619 -10800
>> >>>>> # Node ID b99305dd59279aec962e23da2a362e0d8b785965
>> >>>>> # Parent  d36f5aec2f9e4214fafe048bccd0bb47ac5f9c16
>> >>>>> scmutil: add filecache, a smart property-like decorator that compares stat info
>> >>>>>
>> >>>>> The idea is being able to associate a file with a property, and watch
>> >>>>> that file stat info for modifications when we decide it's important for it to
>> >>>>> be up-to-date. Once it changes, we recreate the object.
>> >>>>>
>> >>>>> As a consequence, localrepo.invalidate() will become much less expensive in the
>> >>>>> case where nothing changed on-disk.
>> >>>>>
>> >>>>> diff -r d36f5aec2f9e -r b99305dd5927 mercurial/scmutil.py
>> >>>>> --- a/mercurial/scmutil.py        Sat Jul 16 15:30:43 2011 +0300
>> >>>>> +++ b/mercurial/scmutil.py        Sat Jul 09 19:06:59 2011 +0300
>> >>>>> @@ -709,3 +709,41 @@
>> >>>>>          raise error.RequirementError(_("unknown repository format: "
>> >>>>>              "requires features '%s' (upgrade Mercurial)") % "', '".join(missings))
>> >>>>>      return requirements
>> >>>>> +
>> >>>>> +class filecache(object):
>> >>>>> +    '''A property like decorator that tracks a file under .hg/ for updates.
>> >>>>> +
>> >>>>> +    Records stat info when called in _invalidatecache.
>> >>>>> +
>> >>>>> +    On subsequent calls, compares old stat info with new info, and recreates
>> >>>>> +    the object when needed, updating the new stat info in _invalidatecache.'''
>> >>>>> +    def __init__(self, path, instore=False):
>> >>>>> +        self.path = path
>> >>>>> +        self.instore = instore
>> >>>>> +
>> >>>>> +    def __call__(self, func):
>> >>>>> +        self.func = func
>> >>>>> +        self.name = func.__name__
>> >>>>> +        return self
>> >>>>> +
>> >>>>> +    def __get__(self, obj, type=None):
>> >>>>> +        path = self.instore and obj.sjoin(self.path) or obj.join(self.path)
>> >>>>> +
>> >>>>> +        if self.name in obj._invalidatecache:
>> >>>>> +            cacheentry = obj._invalidatecache[self.name]
>> >>>>> +            stat = util.stat(path)
>> >>>>> +
>> >>>>> +            if stat != cacheentry[1]:
>> >>>>> +                cacheentry[1] = stat
>> >>>>> +                result = cacheentry[0] = self.func(obj)
>> >>>>> +            else:
>> >>>>> +                result = cacheentry[0]
>> >>>>> +        else:
>> >>>>> +            # stat -before- reading so our cache doesn't lie if someone
>> >>>>> +            # modifies between the time we read+stat it
>> >>>>> +            stat = util.stat(path)
>> >>>>> +            result = self.func(obj)
>> >>>>> +            obj._invalidatecache[self.name] = [result, stat, path]
>> >>>>> +
>> >>>>> +        setattr(obj, self.name, result)
>> >>>>> +        return result
>> >>>>
>> >>>> What happens if the file changed its contents without changing
>> >>>> mtime or size?
>> >>>
>> >>> Excellent question. Answer: we lose.
>> >>
>> >> ..
>> >>
>> >>> We need to cache and compare -the whole stat result-. There's
>> >>> absolutely no reason not to here.
>> >>
>> >> How does that solve the problem of missing a file change that changes
>> >> file contents without changing size nor mtime? (and thus failing to
>> >> call func again)
>> >
>> > We've got three buckets we can dump filesystems into:
>> >
>> > have subsecond timestamps (eg NTFS, Btrfs, ext4..):
>> >   changes are detected by comparing timestamps
>> > have inodes (ext3, HFS+):
>> >   changes made by non-append operations are made via atomic rename
>> >   and result in timestamp changes
>> > neither (eg VFAT):
>> >   similar issues (and solutions) to dirstate apply
>>
>> And good luck with Windows shares.
>>
>> Anyway, I don't think this will work. This code is trying to be too clever.
>>
>> And discovering all the cases where it fails will be very hard.
> 
> The current plan is to trust the cache when we have inode info or
> subsecond precision.
> 
> If we have neither, we will (for now) always reread the file. So filesystems
> that don't have that information won't gain anything for now.

Ok. That sounds better.
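
To make sure I read the plan right, here is a rough standalone sketch in
Python 3 (not the patch itself). The names `_filecache` and `fakerepo` are
made up for illustration, and the "is this stat trustworthy" test is only a
heuristic I am assuming: a non-zero inode-like id, or an mtime with a
non-zero sub-second part. Unlike the patch it does not setattr() the result,
so it stats on every access.

```python
import os


class filecache(object):
    """Property-like decorator that rereads when the stat info changed.

    Sketch only: stats on every attribute access and falls back to
    always rereading when the stat info cannot be trusted.
    """

    def __init__(self, path):
        self.path = path

    def __call__(self, func):
        self.func = func
        self.name = func.__name__
        return self

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        # stat -before- reading, as in the patch
        st = os.stat(obj.join(self.path))
        key = (st.st_ino, st.st_size, st.st_mtime_ns)
        # assumed heuristic: inode info or sub-second precision present
        trusted = st.st_ino != 0 or st.st_mtime_ns % 10**9 != 0
        entry = obj._filecache.get(self.name)
        if entry is None or not trusted or entry[1] != key:
            entry = obj._filecache[self.name] = (self.func(obj), key)
        return entry[0]


class fakerepo(object):
    """Hypothetical stand-in for localrepo, just enough to run the sketch."""

    def __init__(self, root):
        self.root = root
        self._filecache = {}
        self.reads = 0  # counts how often the file was really read

    def join(self, name):
        return os.path.join(self.root, name)

    @filecache("bookmarks")
    def bookmarks(self):
        self.reads += 1
        with open(self.join("bookmarks")) as f:
            return f.read()
```

With inode info available, repeated `repo.bookmarks` accesses should hit the
cached object until the file's stat info changes; without it, every access
rereads, which matches "won't gain anything for now".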

On Windows, we might also use FileIndex
http://msdn.microsoft.com/en-us/library/aa363788(v=vs.85).aspx
http://hg.intevation.org/mercurial/crew/file/647071c6dfcf/mercurial/win32.py#l44

which resembles an inode number. If the FileIndex has changed, we can infer
that the file has changed.
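
A small sketch of that inference in Python 3, under some assumptions: the
helper names `fileident` and `atomicwrite` are hypothetical, and I am relying
on `os.stat()` reporting an inode-like id in `st_ino` (the inode number on
Unix; the Python docs describe it as the file index on Windows). It only
shows that a write done via atomic rename yields a new id even when the size
stays the same.

```python
import os
import tempfile


def fileident(path):
    """Return an inode-like identity for path (hypothetical helper)."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def atomicwrite(path, data):
    """Write data to a temp file in the same directory, then rename it
    over path, so readers never see a half-written file."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp, path)


def replaced(old, new):
    """True if the identity changed, i.e. the file was swapped out."""
    return old != new
```

Usage would be along the lines of recording `fileident(path)` next to the
cached object and treating `replaced(old, fileident(path))` as "reread".
This says nothing about in-place modifications, which keep the same id.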

Would be nice if we could eventually use subsecond timestamps inside
.hg/dirstate as well (or in whatever file that higher-resolution info would
be saved).

> Later on we can optimize it like Matt explained yesterday on IRC, if we
> also add the time the file was read to the equation.

Uh. More complicated tricks...