OS X FSEvents and temporary files

Sun Jun 13 16:38:52 CDT 2010

On Jun 13, 2010, at 10:40 PM, Matt Mackall wrote:

> I talked with Martin a bit about the FSEvents problem you're having with
> MacHG and took a closer look at what Mercurial's doing and how Apple's
> API works.

Thanks!

> In summary, it looks like this:
> 
> 1. MacHG gets notified of a change in repo/
> 2. MacHG asks Mercurial for status info on repo/
> 3. Mercurial scans repo/ and encounters a file with the x bit changed
> 4. Mercurial attempts to detect whether this is a real change by
> detecting whether the filesystem that repo/ is on actually supports the
> x bit by creating a temp file and changing its mode
> 5. MacHG gets notified of another change in repo/
> 6. MacHG calls status

I am not sure about steps 3 and 4, of course, since they are internal to Mercurial, but the other steps are indeed exactly what happens, and we start racing...
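To make steps 3 and 4 concrete, here is a minimal Python sketch of what such an exec-bit probe might look like. It is only an assumption about the general shape of the check, not Mercurial's actual code, and the name `filesystem_supports_exec_bit` is made up for illustration. The point is that the probe creates and deletes a file inside the directory, which is exactly the kind of activity FSEvents reports.

```python
import os
import stat
import tempfile


def filesystem_supports_exec_bit(directory):
    """Return True if flipping the x bit on a file in `directory` sticks.

    Hypothetical helper: a sketch of the kind of probe described in
    steps 3 and 4, not Mercurial's implementation.
    """
    # Creating the temp file is what generates an extra change event.
    fd, path = tempfile.mkstemp(prefix=".probe-exec-", dir=directory)
    os.close(fd)
    try:
        before = stat.S_IMODE(os.stat(path).st_mode)
        os.chmod(path, before ^ stat.S_IXUSR)  # flip the owner's x bit
        after = stat.S_IMODE(os.stat(path).st_mode)
        # The filesystem honours the bit only if it really changed.
        return bool((before ^ after) & stat.S_IXUSR)
    finally:
        # Deleting the temp file generates yet another change event.
        os.unlink(path)
```

Every call touches the directory twice (create, delete), so a client that calls status on each notification keeps re-triggering itself.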

> So there's a couple salient facts here:
> 
> a) the FSEvents API only reports changes at a directory granularity
> b) it also aggregates events in directory trees
> c) if directory events are ignored, detection of real changes may be
> delayed indefinitely

Directory events are the only kind of events ever reported. So you are saying if we turn off notifications then... well... we won't be notified. But this seems too much like a tautology so you are probably saying something more, but I am not sure what...

> d) Mercurial needs these tests to deal with non-Unix-like filesystems,
> which may be present on Macs

Ahhhh.... which types of filesystems?? If I knew this, then I could check the entire repository once when MacHg loads it, and we wouldn't need to do this checking every single time Mercurial is called (Mercurial is called lots and lots of times from MacHg, in fact whenever there are changes ;) ). Thus MacHg (or other clients) could be responsible for doing the checkexec and checklink checks which Mercurial is now doing. In fact it would be nice to know this so that MacHg could report a nice warning / error message to users. (Of course this would need to be a switch, with the default behavior being the way things currently work...)
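As a rough idea of what a client-side check at load time could look like, here is a small Python sketch of a symlink probe. The function name and the probe file naming are hypothetical, and this is only a guess at what a checklink-style test needs to establish, namely that the filesystem can store a symbolic link at all.

```python
import os
import uuid


def filesystem_supports_symlinks(directory):
    """Return True if a symbolic link can be created in `directory`.

    Hypothetical helper, intended to be run once when a repository is
    loaded rather than on every status call.
    """
    link = os.path.join(directory, ".probe-link-" + uuid.uuid4().hex)
    try:
        # The target does not need to exist; a dangling link is enough.
        os.symlink("probe-target", link)
    except (OSError, NotImplementedError):
        return False
    try:
        return os.path.islink(link)
    finally:
        os.unlink(link)
```

A client could run this once per repository and remember the answer, instead of paying for the probe (and its change events) on every call.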

> e) it appears the temp files only get created/destroyed when there are
> files with the exec or symlink bits changed

> One proposed fix is to make the test files in .hg/, but that will get us
> in trouble if someone decides to use a symlink for .hg (not a terribly
> unreasonable thing to do).

Yep. MacHg could also check that .hg is not a symlink and report an error / warning as well...
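The .hg check itself would be tiny. A sketch in Python, with a made-up function name:

```python
import os


def hg_dir_is_symlink(repo_root):
    """Return True if repo_root/.hg is a symbolic link (hypothetical helper)."""
    return os.path.islink(os.path.join(repo_root, ".hg"))
```

A client could call this once at load time and warn the user if it returns True.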

> I propose to instead fix it by inserting these steps:
> 
> 1a: MacHG immediately grabs a directory listing of repo/
> 4b: If there are pending events on repo/ upon return from calling
> Mercurial,

Events get delivered asynchronously, sometimes up to 2 or 3 seconds after the changes have taken place. (You can set this parameter, but normally there is at least some delay...) MacHg gets a lot of its speed by doing things in a threaded, asynchronous manner. (You can try flushing the events, but sometimes the flush will happen mid status check, etc.)

Thus sometimes several status requests are done asynchronously, and it's hard to know which directory is paired with which result.

> 4c: If it has the kFSEventStreamEventFlagMustScanSubDirs flag set, go to
> step 2 - a genuine file change appeared during status

This flag very, very rarely comes up. It happens when, for instance, the processor load is maxed out and events somehow get missed. It's a very exceptional case... So maybe you meant something else?
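For reference, testing for that flag is just a bit-mask check on the flags delivered with each event. A Python sketch, with the constant value copied from Apple's FSEvents.h so that it runs without the framework (the function name is made up):

```python
# Value of kFSEventStreamEventFlagMustScanSubDirs in Apple's FSEvents.h,
# repeated here so the sketch does not need the CoreServices framework.
MUST_SCAN_SUBDIRS = 0x00000001


def must_rescan_subdirs(event_flags):
    """Return True if the event asks the client to rescan the whole subtree."""
    return bool(event_flags & MUST_SCAN_SUBDIRS)
```

Since the flag is only set when events were dropped or coalesced, this test being False says nothing about whether a genuine file change happened during status.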

> 4d: Else, grab a directory listing of repo/ and compare it against the
> one from 1a, if there are any changes, go to step 2

By comparing, you mean comparing the status of all the top-level files and directories, where the status is their sizes, modification dates, permissions, and any other metadata associated with the file, right? We would do this in order to detect whether a file "changed", right? That is, basically you are suggesting a manual top-level walk of the directory looking for changes, right?

Buttt.... say something changes in a low-level directory while a status is also being done at an upper level in an asynchronous way. Then the directories will be coalesced by FSEvents, and the FSEvents monitor will pass back "something changed" for the whole tree. Thus the manual top-level walk of the directory wouldn't pick up this change at the lower levels.
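Here is a small Python sketch of that concern. It assumes the "listing" is a map from each top-level name to its size, modification time, and mode; that is my reading of steps 1a and 4d, and the function names are made up. The demo shows that a new top-level file is noticed, while a content change to a file one level down is not.

```python
import os
import tempfile


def snapshot_top_level(directory):
    """Map each top-level entry to (size, mtime_ns, mode), without recursing."""
    snap = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            st = entry.stat(follow_symlinks=False)
            snap[entry.name] = (st.st_size, st.st_mtime_ns, st.st_mode)
    return snap


def demo():
    """Return (nested_change_missed, top_level_change_seen)."""
    with tempfile.TemporaryDirectory() as repo:
        os.mkdir(os.path.join(repo, "src"))
        deep = os.path.join(repo, "src", "deep.txt")
        with open(deep, "w") as f:
            f.write("v1")
        before = snapshot_top_level(repo)

        # Rewrite a file one level down: src/ itself keeps the same metadata.
        with open(deep, "w") as f:
            f.write("version 2")
        nested_change_missed = snapshot_top_level(repo) == before

        # Create a file directly in repo/, as a temp-file probe would.
        with open(os.path.join(repo, "probe.tmp"), "w"):
            pass
        top_level_change_seen = snapshot_top_level(repo) != before
        return nested_change_missed, top_level_change_seen
```

So the comparison is cheap and catches temp files appearing in the directory that was listed, but once FSEvents has coalesced a deeper change into an event on an ancestor directory, a non-recursive listing of that ancestor has no way to see it.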

If we had to walk all of the files in the whole repository, then we would basically be doing the whole job of the FSEvents monitor, and moreover it wouldn't be at all fast. It would be far too slow, and that's why I am using the FSEvents monitor in the first place. (Of course we would still have the FSEvents monitor telling us that something changed in the first place...)

I tried such tricks as you mentioned in steps 4 and in step 1. I tried a large variety of them. They all failed in various aspects. I devoted a significant chunk of code to it. I can point to the details of where this happened in the Cocoa code if you are really interested in the nitty gritty details :)

I have to say the usability of MacHg increased markedly from a user perspective once I found out I could turn off checkexec and checklink. (Well, I hadn't released it at that time, but I was of course using it to develop itself.) The discovery was fantastic. This issue was likely the central problem I have had while developing MacHg. I.e., there are other, more complicated things, but this one troubled me for the longest amount of time and was the most problematic, until finally I found I could shut it off with two simple lines of code. After that change MacHg was much more reliable, and files no longer slipped through the net. There are still fringe cases lurking: if you madly click around, switching repositories and doing random things while things are loading asynchronously, I have seen the occasional status glitch. But more or less the transient status problems which were plaguing me are no longer present at all. Fantastic.

> By confirming that the listings in 1a and 4d match, we can prove that
> nothing has slipped by us in the affected directory while Mercurial was
> doing its thing. This test should be pretty cheap as all the data will
> generally be cache hot, moderately sized, and recursion isn't necessary.
> 
> Also note that it's possible to add:
> 
> 1b: If we have a cached copy, compare it. If it's unchanged, we're done
> - no need to call hg.
> 
> ..which might be nice for dealing with temp files created by editors,
> compilers, etc.

So, thanks for looking at this problem which is a real sticking point!

However, it would be really nice if, as a client, I knew what I was looking for in these checkexec and checklink calls. MacHg could then really easily scan the repo once for the problematic bits you are looking for in the first place, and then somehow set some environment variable when calling through to Mercurial, saying that MacHg has handled this problem and not to create temporary files, e.g. GUICLIENTDONTCHECKEXECORLINK = 1, but with a better name :)
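Roughly what I have in mind, sketched in Python. The variable name `HG_CLIENT_HANDLES_FS_PROBES` and both functions are purely hypothetical; Mercurial does not read any such variable today, this only illustrates the proposal.

```python
import os

# Hypothetical variable name; nothing in Mercurial reads it today.
SKIP_PROBES_VAR = "HG_CLIENT_HANDLES_FS_PROBES"


def client_environment(base_env=None):
    """Client side: build the environment to pass when invoking hg."""
    env = dict(os.environ if base_env is None else base_env)
    env[SKIP_PROBES_VAR] = "1"
    return env


def should_run_probes(env):
    """Mercurial side: probe unless the client says it already did.

    The default (variable unset) keeps the current behavior.
    """
    return env.get(SKIP_PROBES_VAR) != "1"
```

The client would pass `client_environment()` as the environment of each hg invocation, and anything that doesn't set the variable gets today's behavior unchanged.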

I could of course right now just traverse every single directory in the repository (not following symlinks) and make sure checkexec and checklink pass in every single one. Thus the repository would be known to work with MacHg, and MacHg would issue some meaningful error message if this wasn't the case, like "The repository FooBar located on the file system BugSplatter cannot work with MacHg because BugSplatter is of type Blargh... Please move FooBar to a file system of type MOO..." sort of thing. But likely, if I knew exactly which conditions we are looking for and why, I could fine-tune this message a great deal...
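The traversal itself could be as simple as the following Python sketch. Here `probe` stands for any callable that takes a directory path and returns True when that directory passes (a checkexec-style or checklink-style test, say); the function names and the message wording are made up for illustration.

```python
import os


def directories_failing(root, probe):
    """Return every directory under `root` where `probe` returns False.

    Symlinked directories are not descended into (followlinks=False).
    """
    failing = []
    for dirpath, _dirnames, _filenames in os.walk(root, followlinks=False):
        if not probe(dirpath):
            failing.append(dirpath)
    return failing


def compatibility_message(repo_name, failing):
    """Return a user-facing warning, or None if every directory passed."""
    if not failing:
        return None
    return ("The repository %s cannot work with MacHg: %d of its directories "
            "failed the filesystem check, starting with %s"
            % (repo_name, len(failing), failing[0]))
```

This runs once per repository load, so its cost is paid up front rather than on every status call.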

Cheers & Thanks,
  Jas