[PATCH 02 of 11] scmutil: add filecache, a smart property-like decorator that compares stat info
Adrian Buehlmann
adrian at cadifra.com
Tue Jul 19 06:53:14 CDT 2011
On 2011-07-19 12:23, Idan Kamara wrote:
> On Tue, Jul 19, 2011 at 1:26 AM, Adrian Buehlmann <adrian at cadifra.com> wrote:
>>
>> On 2011-07-18 23:29, Matt Mackall wrote:
>> > On Mon, 2011-07-18 at 22:32 +0200, Adrian Buehlmann wrote:
>> >> On 2011-07-18 22:12, Matt Mackall wrote:
>> >>> On Sat, 2011-07-16 at 18:03 +0200, Adrian Buehlmann wrote:
>> >>>> On 2011-07-16 16:34, Idan Kamara wrote:
>> >>>>> # HG changeset patch
>> >>>>> # User Idan Kamara <idankk86 at gmail.com>
>> >>>>> # Date 1310227619 -10800
>> >>>>> # Node ID b99305dd59279aec962e23da2a362e0d8b785965
>> >>>>> # Parent d36f5aec2f9e4214fafe048bccd0bb47ac5f9c16
>> >>>>> scmutil: add filecache, a smart property-like decorator that compares stat info
>> >>>>>
>> >>>>> The idea is being able to associate a file with a property, and watch
>> >>>>> that file stat info for modifications when we decide it's important for it to
>> >>>>> be up-to-date. Once it changes, we recreate the object.
>> >>>>>
>> >>>>> As a consequence, localrepo.invalidate() will become much less expensive in the
>> >>>>> case where nothing changed on-disk.
>> >>>>>
>> >>>>> diff -r d36f5aec2f9e -r b99305dd5927 mercurial/scmutil.py
>> >>>>> --- a/mercurial/scmutil.py Sat Jul 16 15:30:43 2011 +0300
>> >>>>> +++ b/mercurial/scmutil.py Sat Jul 09 19:06:59 2011 +0300
>> >>>>> @@ -709,3 +709,41 @@
>> >>>>>          raise error.RequirementError(_("unknown repository format: "
>> >>>>>              "requires features '%s' (upgrade Mercurial)") % "', '".join(missings))
>> >>>>>      return requirements
>> >>>>> +
>> >>>>> +class filecache(object):
>> >>>>> +    '''A property like decorator that tracks a file under .hg/ for updates.
>> >>>>> +
>> >>>>> +    Records stat info when called in _invalidatecache.
>> >>>>> +
>> >>>>> +    On subsequent calls, compares old stat info with new info, and recreates
>> >>>>> +    the object when needed, updating the new stat info in _invalidatecache.'''
>> >>>>> +    def __init__(self, path, instore=False):
>> >>>>> +        self.path = path
>> >>>>> +        self.instore = instore
>> >>>>> +
>> >>>>> +    def __call__(self, func):
>> >>>>> +        self.func = func
>> >>>>> +        self.name = func.__name__
>> >>>>> +        return self
>> >>>>> +
>> >>>>> +    def __get__(self, obj, type=None):
>> >>>>> +        path = self.instore and obj.sjoin(self.path) or obj.join(self.path)
>> >>>>> +
>> >>>>> +        if self.name in obj._invalidatecache:
>> >>>>> +            cacheentry = obj._invalidatecache[self.name]
>> >>>>> +            stat = util.stat(path)
>> >>>>> +
>> >>>>> +            if stat != cacheentry[1]:
>> >>>>> +                cacheentry[1] = stat
>> >>>>> +                result = cacheentry[0] = self.func(obj)
>> >>>>> +            else:
>> >>>>> +                result = cacheentry[0]
>> >>>>> +        else:
>> >>>>> +            # stat -before- reading so our cache doesn't lie if someone
>> >>>>> +            # modifies between the time we read+stat it
>> >>>>> +            stat = util.stat(path)
>> >>>>> +            result = self.func(obj)
>> >>>>> +            obj._invalidatecache[self.name] = [result, stat, path]
>> >>>>> +
>> >>>>> +        setattr(obj, self.name, result)
>> >>>>> +        return result
>> >>>>
>> >>>> What happens if the file changed its contents without changing mtime or
>> >>>> size?
>> >>>
>> >>> Excellent question. Answer: we lose.
>> >>
>> >> ..
>> >>
>> >>> We need to cache and compare -the whole stat result-. There's absolutely
>> >>> no reason not to here.
>> >>
>> >> How does that solve the problem of missing a file change that changes
>> >> file contents without changing size or mtime? (and thus failing to call
>> >> func again)
>> >
>> > We've got three buckets we can dump filesystems into:
>> >
>> >  have subsecond timestamps (eg NTFS, Btrfs, ext4..):
>> >    changes are detected by comparing timestamps
>> >  have inodes (ext3, HFS+):
>> >    changes made by non-append operations are made via atomic rename
>> >    and result in timestamp changes
>> >  neither (eg VFAT):
>> >    similar issues (and solutions) to dirstate apply
>>
>> And good luck with Windows shares.
>>
>> Anyway, I don't think this will work. This code is trying to be too clever.
>>
>> And discovering all the cases where it fails will be very hard.
>
> The current plan is to trust the cache when we have inode info or
> subsecond precision.
>
> If we don't, we will (for now) always reread the file. So filesystems that
> don't have that information won't gain anything for now.
Ok. That sounds better.
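
A minimal sketch of that plan, to make the fallback explicit. This is not
code from the patch: cacheable(), statkey() and cachedfile are hypothetical
names, and the two checks in cacheable() are only an assumed heuristic (a
nonzero st_ino taken as "has inode info", a nonzero fractional mtime taken
as "has subsecond precision"). A real implementation would have to probe
the filesystem more carefully.

```python
import os

def cacheable(st):
    """Guess whether this stat result is good enough to trust a cache.

    Assumption (heuristic, not from the patch): nonzero st_ino means we
    have inode info, and a nonzero fractional part of the mtime means the
    filesystem records subsecond timestamps.
    """
    hasinode = st.st_ino != 0
    hassubsecond = st.st_mtime_ns % 1000000000 != 0
    return hasinode or hassubsecond

def statkey(st):
    """The parts of the stat result that are compared between calls."""
    return (st.st_dev, st.st_ino, st.st_size, st.st_mtime_ns)

class cachedfile(object):
    """Hypothetical stand-in for one filecache entry."""
    def __init__(self, path, load):
        self.path = path
        self.load = load      # callable that reads and parses the file
        self.key = None       # stat key recorded at the last read
        self.value = None

    def get(self):
        # stat -before- reading, as in the patch
        st = os.stat(self.path)
        key = statkey(st) if cacheable(st) else None
        # key is None: we can't trust stat info, so always reread
        if key is None or key != self.key:
            self.value = self.load(self.path)
            self.key = key
        return self.value
```

With this shape, a filesystem that gives us neither piece of information
simply never hits the cache, which matches "won't gain anything for now".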
On Windows, we might also use FileIndex
http://msdn.microsoft.com/en-us/library/aa363788(v=vs.85).aspx
http://hg.intevation.org/mercurial/crew/file/647071c6dfcf/mercurial/win32.py#l44
which resembles an inode. If the FileIndex has changed, we can infer that
the file has changed.
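
To illustrate the inference, here is a small sketch (not Mercurial code;
fileid() and atomicwrite() are made-up helper names). It relies on Python 3's
os.stat(), which reports the inode number in st_ino on Unix and the file
index on Windows, and on os.replace() for the rename. A file that is
rewritten via temp file plus rename gets a new identity, even if size and
mtime happen to stay the same.

```python
import os
import tempfile

def fileid(path):
    """Identity of the file currently at path.

    st_ino is the inode number on Unix and the file index on Windows
    (Python 3); it is only meaningful together with st_dev.
    """
    st = os.stat(path)
    return (st.st_dev, st.st_ino)

def atomicwrite(path, data):
    """Write data to a temp file next to path, then rename it over path."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp, path)

def changed(path, oldid):
    """True if the file at path is not the one we recorded earlier."""
    return fileid(path) != oldid
```

Note that this only catches replacement by rename. A writer that modifies
the file in place keeps the same identity, so the timestamps still matter
for that case.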
Would be nice if we could eventually use subseconds inside .hg/dirstate
as well (or in whatever file that higher resolution info would be saved).
> Later on we can optimize it like Matt explained yesterday on IRC, if we
> also add the time the file was read to the equation.
Uh. More complicated tricks...