[PATCH 02 of 11] scmutil: add filecache, a smart property-like decorator that compares stat info

Tue Jul 19 05:23:06 CDT 2011

On Tue, Jul 19, 2011 at 1:26 AM, Adrian Buehlmann <adrian at cadifra.com> wrote:
>
> On 2011-07-18 23:29, Matt Mackall wrote:
> > On Mon, 2011-07-18 at 22:32 +0200, Adrian Buehlmann wrote:
> >> On 2011-07-18 22:12, Matt Mackall wrote:
> >>> On Sat, 2011-07-16 at 18:03 +0200, Adrian Buehlmann wrote:
> >>>> On 2011-07-16 16:34, Idan Kamara wrote:
> >>>>> # HG changeset patch
> >>>>> # User Idan Kamara <idankk86 at gmail.com>
> >>>>> # Date 1310227619 -10800
> >>>>> # Node ID b99305dd59279aec962e23da2a362e0d8b785965
> >>>>> # Parent  d36f5aec2f9e4214fafe048bccd0bb47ac5f9c16
> >>>>> scmutil: add filecache, a smart property-like decorator that compares stat info
> >>>>>
> >>>>> The idea is to be able to associate a file with a property, and watch
> >>>>> that file's stat info for modifications when we decide it's important for
> >>>>> it to be up-to-date. Once it changes, we recreate the object.
> >>>>>
> >>>>> As a consequence, localrepo.invalidate() will become much less expensive
> >>>>> in the case where nothing changed on-disk.
> >>>>>
> >>>>> diff -r d36f5aec2f9e -r b99305dd5927 mercurial/scmutil.py
> >>>>> --- a/mercurial/scmutil.py        Sat Jul 16 15:30:43 2011 +0300
> >>>>> +++ b/mercurial/scmutil.py        Sat Jul 09 19:06:59 2011 +0300
> >>>>> @@ -709,3 +709,41 @@
> >>>>>          raise error.RequirementError(_("unknown repository format: "
> >>>>>              "requires features '%s' (upgrade Mercurial)") % "', '".join(missings))
> >>>>>      return requirements
> >>>>> +
> >>>>> +class filecache(object):
> >>>>> +    '''A property like decorator that tracks a file under .hg/ for updates.
> >>>>> +
> >>>>> +    Records stat info when called in _invalidatecache.
> >>>>> +
> >>>>> +    On subsequent calls, compares old stat info with new info, and recreates
> >>>>> +    the object when needed, updating the new stat info in _invalidatecache.'''
> >>>>> +    def __init__(self, path, instore=False):
> >>>>> +        self.path = path
> >>>>> +        self.instore = instore
> >>>>> +
> >>>>> +    def __call__(self, func):
> >>>>> +        self.func = func
> >>>>> +        self.name = func.__name__
> >>>>> +        return self
> >>>>> +
> >>>>> +    def __get__(self, obj, type=None):
> >>>>> +        path = self.instore and obj.sjoin(self.path) or obj.join(self.path)
> >>>>> +
> >>>>> +        if self.name in obj._invalidatecache:
> >>>>> +            cacheentry = obj._invalidatecache[self.name]
> >>>>> +            stat = util.stat(path)
> >>>>> +
> >>>>> +            if stat != cacheentry[1]:
> >>>>> +                cacheentry[1] = stat
> >>>>> +                result = cacheentry[0] = self.func(obj)
> >>>>> +            else:
> >>>>> +                result = cacheentry[0]
> >>>>> +        else:
> >>>>> +            # stat -before- reading so our cache doesn't lie if someone
> >>>>> +            # modifies between the time we read+stat it
> >>>>> +            stat = util.stat(path)
> >>>>> +            result = self.func(obj)
> >>>>> +            obj._invalidatecache[self.name] = [result, stat, path]
> >>>>> +
> >>>>> +        setattr(obj, self.name, result)
> >>>>> +        return result
> >>>>
> >>>> What happens if the file changed its contents without changing its mtime
> >>>> or size?
> >>>
> >>> Excellent question. Answer: we lose.
> >>
> >> ..
> >>
> >>> We need to cache and compare -the whole stat result-. There's absolutely
> >>> no reason not to here.
> >>
> >> How does that solve the problem of missing a file change that changes
> >> file contents without changing size or mtime (and thus failing to call
> >> func again)?
> >
> > We've got three buckets we can dump filesystems into:
> >
> > have subsecond timestamps (eg NTFS, Btrfs, ext4..):
> >   changes are detected by comparing timestamps
> > have inodes (ext3, HFS+):
> >   changes made by non-append operations are made via atomic rename
> >   and result in timestamp changes
> > neither (eg VFAT):
> >   similar issues (and solutions) to dirstate apply
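Here is a small Python 3 sketch that makes the second bucket concrete. It is an illustration only: os.stat stands in for util.stat, and the file name is made up. The sketch rewrites a file by atomic rename with contents of the same size. It then uses os.utime to put the old mtime back, which mimics a rewrite within the same coarse timestamp tick. Comparing only size and mtime misses the change. A key that also includes st_dev and st_ino catches it, because the replacement was created while the original still existed and so received a different inode number.

```python
import os
import tempfile


def narrowkey(st):
    # Compares only size and mtime, the case raised in the review.
    return (st.st_size, st.st_mtime_ns)


def widekey(st):
    # A wider slice of the stat result, including device and inode number.
    return (st.st_dev, st.st_ino, st.st_size, st.st_mtime_ns, st.st_ctime_ns)


def atomicrewrite(path, data):
    # Write a temporary file in the same directory, then rename it over
    # the original, so the path now refers to a different inode.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp, path)


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "somefile")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(b"aaaa")
        before = os.stat(path)

        atomicrewrite(path, b"bbbb")  # same size, different contents
        # Simulate a coarse-timestamp filesystem by restoring the old times.
        os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
        after = os.stat(path)

        return (narrowkey(before) == narrowkey(after),
                widekey(before) == widekey(after))


narrowsame, widesame = demo()
print(narrowsame, widesame)  # True False
```

The narrow key reports "unchanged" even though the contents differ. The wide key notices the new inode. This only holds where the filesystem really has inodes and writers really replace files by rename, which is the precondition of that bucket.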
>
> And good luck with Windows shares.
>
> Anyway, I don't think this will work. This code is trying to be too clever.
>
> And discovering all the cases where it fails will be very hard.

The current plan is to trust the cache when we have inode info or
subsecond precision.

If the filesystem provides neither, we will (for now) always reread the file,
so filesystems that lack that information won't gain anything yet.
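Here is a rough Python 3 sketch of that plan. It is not the code in the patch, and it rests on a few assumptions:

- os.stat stands in for util.stat.
- The repository is a hypothetical fakerepo with root, fscaps and _filecache attributes.
- The fscaps flags are taken as given; how to detect them is left open.

When neither flag is set, the cached value is never trusted and the file is reread on every access.

```python
import os
import tempfile
from collections import namedtuple

# Hypothetical per-filesystem capability flags; detecting them is left open.
fscaps = namedtuple("fscaps", "subsecond inodes")


def statkey(st):
    # A wide slice of the stat result, not just size and mtime.
    return (st.st_dev, st.st_ino, st.st_size, st.st_mtime_ns, st.st_ctime_ns)


class filecache(object):
    def __init__(self, path):
        self.path = path

    def __call__(self, func):
        self.func = func
        self.name = func.__name__
        return self

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        path = os.path.join(obj.root, self.path)
        # Trust the cache only with inode info or subsecond timestamps.
        trusted = obj.fscaps.subsecond or obj.fscaps.inodes
        # stat before reading, as the patch does, so a write racing with
        # the read shows up as a mismatch on the next access
        st = os.stat(path)
        entry = obj._filecache.get(self.name)
        if trusted and entry is not None and entry[1] == statkey(st):
            return entry[0]
        result = self.func(obj)  # otherwise (re)read the file
        obj._filecache[self.name] = (result, statkey(st))
        return result


class fakerepo(object):
    # Hypothetical stand-in for a repository object.
    def __init__(self, root, caps):
        self.root = root
        self.fscaps = caps
        self._filecache = {}
        self.reads = 0

    @filecache("branch")
    def branch(self):
        self.reads += 1
        with open(os.path.join(self.root, "branch")) as f:
            return f.read().strip()


def demo():
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "branch"), "w") as f:
            f.write("default\n")
        trusting = fakerepo(d, fscaps(subsecond=True, inodes=True))
        rereading = fakerepo(d, fscaps(subsecond=False, inodes=False))
        for repo in (trusting, rereading):
            assert repo.branch == "default"
            assert repo.branch == "default"
        return trusting.reads, rereading.reads


print(demo())  # (1, 2)
```

Unlike the patch, this sketch does not setattr() the result onto the instance, so every access costs one stat(). The patch's setattr() shadows the non-data descriptor and skips that stat until the attribute is deleted again.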

Later on, we can optimize this as Matt explained yesterday on IRC, by also
taking into account the time at which the file was read.
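Here is one possible reading of that optimization. It is modeled on what dirstate does with same-second timestamps; that is an assumption, since the IRC discussion is not reproduced here. The idea is to note the time just before stat-ing the file. On a later access, size and mtime are trusted only if the recorded mtime falls in an earlier whole second than that time. Any write the stat missed happened at or after that moment, so on a filesystem with one-second granularity it must produce a different mtime.

The helper names below are hypothetical. Filesystems with coarser timestamps would need a wider window; FAT, for example, stores mtime in two-second steps.

```python
import math
import os
import time


def isambiguous(mtime, readtime):
    """Return True if a later same-size write could leave mtime unchanged.

    Assumes timestamps truncated to whole seconds, and that mtime and
    readtime come from the same clock (not a given on network shares).
    """
    return math.floor(mtime) >= math.floor(readtime)


def record(path):
    # Take the time *before* stat, so any write the stat missed happened
    # at or after readtime. The caller reads the file afterwards.
    readtime = time.time()
    st = os.stat(path)
    return (st.st_size, st.st_mtime, readtime)


def cantrust(entry, st):
    # entry is the (size, mtime, readtime) tuple returned by record()
    size, mtime, readtime = entry
    if isambiguous(mtime, readtime):
        return False  # same second as the read (or later): always reread
    return (size, mtime) == (st.st_size, st.st_mtime)
```

With this check, a file last modified well before it was read is trusted on size and mtime alone. A file touched in the same second as the read is simply reread next time. The cost is an occasional extra read rather than a stale cache.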