OS X FSEvents and temporary files

Jason Harris jason.f.harris at gmail.com
Sun Jun 13 18:44:26 CDT 2010


On Jun 14, 2010, at 12:14 AM, Matt Mackall wrote:

> On Sun, 2010-06-13 at 23:38 +0200, Jason Harris wrote:
>> On Jun 13, 2010, at 10:40 PM, Matt Mackall wrote:
>> 
>>> I talked with Martin a bit about the FSEvents problem you're having with
>>> MacHG and took a closer look at what Mercurial's doing and how Apple's
>>> API works.
>> 
>> Thanks!
>> 
>>> In summary, it looks like this:
>>> 
>>> 1. MacHG gets notified of a change in repo/
>>> 2. MacHG asks Mercurial for status info on repo/
>>> 3. Mercurial scans repo/ and encounters a file with the x bit changed
>>> 4. Mercurial attempts to detect whether this is a real change by
>>> detecting whether the filesystem that repo/ is on actually supports the
>>> x bit by creating a temp file and changing its mode
>>> 5. MacHG gets notified of another change in repo/
>>> 6. MacHG calls status
>> 
>> I am not sure about steps 3 and 4 of course since they are internal to Mercurial but the other steps are indeed exactly what happens and we start racing...
>> 
>> 
>>> So there's a couple salient facts here:
>>> 
>>> a) the FSEvents API only reports changes at a directory granularity
>>> b) it also aggregates events in directory trees
>>> c) if directory events are ignored, detection of real changes may be
>>> delayed indefinitely
>> 
>> Directory events are the only kind of events ever reported. So you are
>> saying if we turn off notifications then... well... we won't be
>> notified. But this seems too much like a tautology so you are probably
>> saying something more, but I am not sure what...
> 
> Some of this is for other folks following along. But my point is that we
> can't disregard any events without inspecting them.

Yep.


>>> d) Mercurial needs these tests to deal with non-Unix-like filesystems,
>>> which may be present on Macs
>> 
>> Ahhhh.... which type of systems??
> 
> Anything that's not a local Unix filesystem with normal exec bits and
> symlinks. I forget how/if checkcase gets involved here but it may.

Does anyone know a way to determine this? By googling I got as far as finding the mount point of a given path:

df $path | tail -1 | awk '{ print $6 }'
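
For what it's worth, the same mount point lookup can be done without shelling out. This is only a minimal sketch in Python (the language Mercurial itself is written in); the helper name mount_point is made up for illustration and is not part of Mercurial or MacHg:

```python
import os


def mount_point(path):
    """Return the mount point of the filesystem that contains path."""
    # Resolve symlinks first so we climb the real directory chain.
    path = os.path.realpath(path)
    # Walk upwards until we reach a directory that is a mount point.
    # The loop always terminates because the root directory is one.
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path
```

This only tells you where the volume is mounted, not what the volume can do, so it does not answer the capability question by itself.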

I can get the information about the volume here:

http://developer.apple.com/mac/library/documentation/Carbon/Reference/File_Manager/Reference/reference.html#//apple_ref/c/func/FSGetVolumeInfo

But what exactly am I looking for here? Which of these parameters determine this behavior? Is it the field filesystemID in the FSVolumeInfo?

http://developer.apple.com/mac/library/documentation/Carbon/Reference/File_Manager/Reference/reference.html#//apple_ref/doc/c_ref/FSVolumeInfo
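
An alternative to reading volume metadata is to probe the directory directly, which is the kind of test described in step 4 above (create a temp file and change its mode). The following is a rough sketch under that assumption, not Mercurial's actual checkexec/checklink code; the function names supports_exec_bit and supports_symlinks are hypothetical:

```python
import os
import stat
import tempfile
import uuid

EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def supports_exec_bit(directory):
    """Probe whether a file in directory can have its exec bits toggled."""
    try:
        fd, name = tempfile.mkstemp(prefix="probe-", dir=directory)
    except OSError:
        return False
    try:
        os.close(fd)
        before = os.lstat(name).st_mode & 0o777
        os.chmod(name, before ^ EXEC_BITS)
        after = os.lstat(name).st_mode & 0o777
        # Some filesystems accept chmod but silently keep the old mode.
        return ((before ^ after) & EXEC_BITS) != 0
    except OSError:
        return False
    finally:
        os.unlink(name)


def supports_symlinks(directory):
    """Probe whether a symlink can be created inside directory."""
    # Hypothetical probe name; uuid keeps it from clashing with real files.
    name = os.path.join(directory, "probe-link-" + uuid.uuid4().hex)
    try:
        os.symlink(".", name)
    except (OSError, NotImplementedError, AttributeError):
        return False
    try:
        return os.path.islink(name)
    finally:
        os.unlink(name)
```

Note that these probes create and delete files themselves, so they will generate FSEvents notifications too. That is why it would make sense to run them once when the repository is loaded rather than on every status call.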


>> If I knew this then I could check the entirety of the repository when
>> MacHg loads the repository and then we wouldn't need to do this
>> checking every single time Mercurial is called (Mercurial is called
>> lots and lots of times from MacHg, in fact whenever there are
>> changes ;) ). Thus MacHg (or other clients) could be responsible for
>> doing the checkexec and checklink checks which Mercurial is now doing.
>> In fact it would be nice to know this so that MacHg would be able to
>> report a nice warning / error message to users. (of course this needs
>> to be a switch with default behavior the way things currently work...)
>> 
>> 
>>> e) it appears the temp files only get created/destroyed when there are
>>> files with the exec or symlink bits changed
>> 
>> 
>> 
>>> One proposed fix is to make the test files in .hg/, but that will get us
>>> in trouble if someone decides to use a symlink for .hg (not a terribly
>>> unreasonable thing to do).
>> 
>> Yep. MacHg could also check that .hg is not a symlink and report an error / warning as well...
>> 
>>> I propose to instead fix it by inserting these steps:
>>> 
>>> 1a: MacHG immediately grabs a directory listing of repo/
>>> 4b: If there are pending events on repo/ upon return from calling
>>> Mercurial,
>> 
>> events get delivered asynchronously and sometimes up to 2 or 3 seconds
>> after the changes have taken place.
> 
> Really? That's unfortunate. But I think we can still work around it.
> 
>> (you can set this parameter but normally there is at least some
>> delay...) MacHg gets a lot of its speed by doing things in a threaded
>> asynchronous manner. (You can try flushing the events, but sometimes
>> the flush will be done mid status check, etc.)
>> 
>> Thus sometimes several status requests are done asynchronously and it's hard to know which directory is paired with which result.
>> 
>>> 4c: If it has the kFSEventStreamEventFlagMustScanSubDirs flag set, go to
>>> step 2 - a genuine file change appeared during status
>> 
>> This flag very, very rarely comes up. It happens when, for instance, the processor loads are maxed out and somehow the events are missed. It's a very exceptional case... So maybe you meant something else?
>> 
>> 
>>> 4d: Else, grab a directory listing of repo/ and compare it against the
>>> one from 1a, if there are any changes, go to step 2
>> 
>> By comparing you mean to compare the status of all the top level files
>> and directories. Where the status is their sizes, modification dates,
>> permissions, and any other meta-data associated with the file right?
>> We would do this in order to detect if the file "changed" right?  That
>> is basically you are suggesting to do a top level walk of the
>> directory looking for changes manually right?
>> 
>> Buttt.... say something changes in a low level directory but a status
>> is also done at an upper level in an asynchronous way, then the
>> directories will be coalesced by FSEvents, and then the FSEvents
>> monitor will pass back "something changed" inside the whole tree. And
>> thus, the top level manual walk of the directory wouldn't pick up this
>> change at the lower levels.
>> 
>> If we had to walk all of the files in the whole repository then we are
>> basically doing the whole job of the FSEvents monitor and moreover it
>> wouldn't be at all fast. It would be far too slow and that's why I am
>> using FSEvents monitor in the first place. (of course we would still
>> have FSEvents monitor telling us that something changed in the first
>> place...)
>> 
>> I tried such tricks as you mentioned in steps 4 and in step 1. I tried
>> a large variety of them. They all failed in various aspects. I devoted
>> a significant chunk of code to it. I can point to the details of where
>> this happened in the Cocoa code if you are really interested in the
>> nitty gritty details :)
>> 
>> I have to say the usability of MacHg increased markedly from a user
>> perspective once I found out I could turn off checkexec and checklink.
>> (Well I hadn't released it at that time but I was using it to of
>> course develop itself). The discovery was fantastic. This issue was
>> likely the central problem I have had while developing MacHg. Ie there
>> are other more complicated things but this one troubled me for the
>> longest amount of time and was the most problematic, until finally I
>> found I could shut it off with two simple lines of code. After that
>> change MacHg was much more reliable and various files didn't slip
>> through the net. There are still fringe cases lurking when in
>> asynchronous ways if you madly click and start switching around
>> repositories and doing random things and loading things up I have seen
>> the occasional status glitch, but more or less the transient status
>> problems which were plaguing me, are no longer present at all.
>> Fantastic.
> 
> The basic observation of my approach is that if you have:
> 
> known before state in directory X
> any number of events in directory X that don't touch subdirectories
> known after state in directory X

But you don't know that the subdirectories were not touched, since if they were touched at the same time as the parent directory the events will get coalesced.

> and before=after, you can ignore all the events between A and B because
> you know nothing changed. 
> 
> A 'normal' FSEvents watcher would have to basically look at the contents
> of each directory in the event queue and compare it to its last known or
> startup state like the above to figure out what files have changed.
> MacHG can mostly get away without tracking that stuff itself because it
> can rely on Mercurial's internal dirstate. But if it did a bit of both, it
> could be smarter and faster: for instance, by only calling for status on
> files it knows have changed.

Can't Mercurial track this just as fast as MacHg? I.e., MacHg only knows that something changed in some directory, and it only calls status on that directory. So at that point MacHg and Mercurial have the same amount of information: something changed somewhere inside this directory, and possibly in some of its subdirectories. Isn't it then just as fast for Mercurial to determine what changed as for MacHg, since both will be making system-level calls that walk the tree? I am likely missing something, but if Mercurial can't check just as fast as MacHg then there is room to improve Mercurial :)
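
To make the before/after comparison concrete, here is a minimal sketch of the non-recursive listing check proposed in steps 1a and 4d. It assumes "state" means each entry's mode, size and modification time; the names snapshot and status_with_check, and the run_status callable, are hypothetical and not part of MacHg or Mercurial:

```python
import os


def snapshot(directory):
    """Map each entry name in directory to (mode, size, mtime_ns).

    Only the top level is listed; subdirectories are not walked.
    """
    state = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            st = entry.stat(follow_symlinks=False)
            state[entry.name] = (st.st_mode, st.st_size, st.st_mtime_ns)
    return state


def status_with_check(directory, run_status):
    """Run run_status() and report whether the listing stayed the same.

    Returns (result, unchanged). If unchanged is True, events that
    arrived for this directory during the call were only transient
    (for example a temp file that was created and deleted again).
    """
    before = snapshot(directory)
    result = run_status()
    after = snapshot(directory)
    return result, before == after
```

Because the listing is not recursive, this cannot see a change deeper in the tree that FSEvents has coalesced into an event on the parent directory, which is exactly the objection raised above.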


>>> By confirming that the listings in 1a and 4d match, we can prove that
>>> nothing has slipped by us in the affected directory while Mercurial was
>>> doing its thing. This test should be pretty cheap as all the data will
>>> generally be cache hot, moderately sized, and recursion isn't necessary.
>>> 
>>> Also note that it's possible to add:
>>> 
>>> 1b: If we have a cached copy, compare it. If it's unchanged, we're done
>>> - no need to call hg.
>>> 
>>> ..which might be nice for dealing with temp files created by editors,
>>> compilers, etc.
>> 
>> So, thanks for looking at this problem which is a real sticking point!
>> 
>> However it would be really nice if as a client I knew what I was
>> looking for in this checkexec and checklink calls, and MacHg could
>> really easily scan the repo once for the problematic bits you are
>> looking for in the first place and then somehow set some environment
>> variable in passing through to Mercurial saying that MacHg has handled
>> this problem, and not to create temporary files Eg
>> GUICLIENTDONTCHECKEXECORLINK = 1, but a better name :)
> 
> We could do something like that, but it's probably getting a bit too
> intimate with the internals.
> 
>> I could of course right now just traverse every single directory in
>> the repository (not following symlinks) and make sure checkexec and
>> checklink pass in every single directory of the repository. Thus the
>> repository would be able to work with MacHg, and MacHg would issue
>> some meaningful error message if this wasn't the case like "The
>> repository FooBar located on the file system BugSplatter cannot work
>> with MacHg because BugSplatter is of type Blargh... Please move FooBar
>> to a file system of type MOO..." sort of thing. But likely if I know
>> exactly which conditions we are looking for and why I could fine tune
>> this message a great deal...
> 
> Not allowing people to use repos on flash drives

Well, I just tried MacHg on a flash drive and it seems to be working on a repo there just fine. Admittedly the flash drive has a Mac OS Extended (Journaled) file system on it. Is there anything I should be testing for besides seeing that status is auto-detected when files change (i.e. FSEvents looks like it's working), and that commits, Update -C, diffs, etc. work?

I tried it with an MS-DOS FAT file system, and that wasn't working correctly with the modified Mercurial 1.5.4 that MacHg uses. When I switched MacHg to the stock Mercurial 1.5.4 (which causes MacHg to race under the conditions listed above), it did appear to work... So I guess Mercurial is degrading nicely when it can't do the checkexec and checklink calls? If MacHg knew how to test this, it could pass the information through to Mercurial as well, as something like GUIREPOSITORYRESIDESONFILESYSTEM = <bitfield> describing the capabilities of the file system. In fact it could report more detailed information, as long as I knew what to test for. Mercurial could then check whether this environment variable is set and forgo the checkexec and checklink tests depending on the bitfield flags passed to it...
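
As an illustration of the bitfield idea only: Mercurial does not read any such variable today, and the flag names and values below are invented for the sketch. The variable name is the placeholder from the paragraph above.

```python
import enum

# Placeholder name from the discussion; not a real Mercurial setting.
ENV_NAME = "GUIREPOSITORYRESIDESONFILESYSTEM"


class FsCaps(enum.IntFlag):
    """Hypothetical capability flags a GUI client could report."""
    EXEC_BIT = 1
    SYMLINKS = 2
    CASE_SENSITIVE = 4


def encode_caps(caps):
    """Client side: turn the flags into the environment variable value."""
    return str(int(caps))


def decode_caps(environ):
    """Receiving side: return the flags, or None to fall back to probing."""
    raw = environ.get(ENV_NAME)
    if raw is None:
        return None
    try:
        return FsCaps(int(raw))
    except ValueError:
        return None
```

A client would set the variable in the environment of the hg process it spawns, and the receiving side would keep its current probing behavior whenever decode_caps returns None, so the default stays the way things work now.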

> and NFS doesn't sound like a win (though I'm pretty sure FSEvents doesn't/can't work with NFS
> anyway - you're gonna have to fall back to polling).


In that case I think it's just much, much easier to issue a notice to the user: "Sorry, MacHg doesn't work well with NFS drives. Please copy the repository to your main hard drive, details..." In practice, I doubt this will be much of a hindrance to anyone.

Of course it would be nice if we could easily work around this... but given the time involved in figuring this out, I would rather make e.g. drag and drop patch queues, or hunk-level commit inclusion / exclusion, or annotate on steroids, or any of lots of other extensions which would benefit a much larger segment of the user base. Then maybe look at options to handle this once a lot of the low-hanging fruit is done.

So if someone could tell me exactly what I need to test for, that would be great. I.e., should I just walk the directories seeing if I can run checkexec and checklink, and if this fails in any of the directories just issue the message "Please move the repo Blah to your main hard drive", etc.?
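
The walk itself is simple enough; the open question is what the per-directory test should be. A rough sketch, where probe stands for whatever check turns out to be the right one (it is a hypothetical callable taking a directory path and returning True or False):

```python
import os


def failing_directories(repo_root, probe):
    """Return every directory under repo_root for which probe() fails.

    Symlinked directories are not followed, so a link pointing to
    another volume is not descended into.
    """
    failures = []
    for dirpath, _dirnames, _filenames in os.walk(repo_root, followlinks=False):
        if not probe(dirpath):
            failures.append(dirpath)
    return failures
```

If the returned list is non-empty, the client could show the "please move the repo" message and name the offending directories.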

Thanks,
  Jas

