[RFC] inotify on MacOS: FSEvents usage

Nicolas Dumazet nicdumz at gmail.com
Thu Jul 16 01:44:31 CDT 2009


Hello!

I am trying to design the low-level part of the MacOS port of inotify
these days.

== Current general structure ==

We only support file system notifications on Linux for now. An inotify
"server" is launched, running as a daemon and listening for file
changes in the repo. hg clients contact the server through
.hg/inotify.sock to request repo.status() data.

The kernel inotify API gives you a file descriptor on which you
register the paths to watch, and the kernel writes events to that
descriptor. This means that the current implementation is a poll() loop:

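import select

poller = select.poll()
poller.register(inotifyfd, select.POLLIN)  # file events from the kernel
poller.register(sock, select.POLLIN)       # requests from hg clients

while True:
    # sleeps until either source is ready, or until timeout (ms) expires
    events = poller.poll(timeout)
    handle(events)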

Polling is quite important because it means that we can efficiently
sleep and get woken up on events from either source: there is no
inefficient "wait on first source, then wait on second source", and the
second source is not ignored if the first source always fires events;
both are handled.

== MacOS differences ==

The Apple APIs do not provide any way to listen on a file descriptor
for file events. It's easy to listen on file descriptors (sockets or
others), but there are no user-available descriptors specific to
FSEvents.

You are asked to register the path you want to watch, and a callback.
You then launch a "RunLoop", which fires your callback on events for
as long as it is running.

=== First try ===

My first idea to deal with this was simple:

My callback could simply be a function writing fsevents data to a
socket. This way, we would have a file descriptor, and the Python code
would simply have to poll() on two socket file descriptors.

This, however, requires the inotify server to have two threads: a
Writer, running the "RunLoop" and writing to the socket, and a Handler,
handling client requests and maintaining an image of the repo.

Having two threads seemed quite prohibitive to me: in particular, what
if the Handler starts handling a status() request from an hg client
while the Writer is still waiting for the scheduler's attention to
deliver its events? The Handler would then answer from a stale image
of the repo. That solution is simple to implement, but it seems that
it would introduce quite a few race conditions.
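
To make the two-thread layout concrete, here is a minimal sketch, not
the actual server code. It assumes a socketpair stands in for the
socket, and a plain list of paths stands in for the events that the
FSEvents callback would deliver; the names writer and handler are
hypothetical.

```python
import select
import socket
import threading

def writer(wsock, paths):
    """Stand-in for the FSEvents callback: forward each changed path."""
    for path in paths:
        wsock.sendall(path.encode() + b"\n")
    wsock.close()

def handler(rsock):
    """Poll the read end, as the server would alongside inotify.sock."""
    poller = select.poll()
    poller.register(rsock, select.POLLIN)
    data = b""
    while True:
        for fd, mask in poller.poll():
            chunk = rsock.recv(4096)
            if not chunk:          # the writer closed its end
                return data.decode().splitlines()
            data += chunk

rsock, wsock = socket.socketpair()
t = threading.Thread(target=writer, args=(wsock, ["a.txt", "dir/b.txt"]))
t.start()
received = handler(rsock)
t.join()
print(received)
```

Nothing in this sketch orders the Writer's delivery relative to a client
request being served by the Handler, which is exactly where the race
described above comes from.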

=== Second attempt ===

I then thought that trying to obtain a file descriptor for FSEvents,
in order to poll() on it, could be the wrong approach.

My next idea was to write a C extension offering a poll()-like
interface that could return both file system events and file
descriptor events.

This led me to write "pyfsevents" (
http://bitbucket.org/nicdumz/fsevents/src/ ), a C extension offering
such functionality.

In short, it has the usual select.poll primitives:
    * register(fd)
    * unregister(fd)
    * poll([timeout])

It also adds two fsevents registration primitives:
    * registerpath(abspath)
    * unregisterpath(abspath)

It looked like the interface I needed to implement inotify support.

But the underlying "RunLoop" structure that I have to use would not
let me do this in the same fashion as poll() does.

For Linux's poll, this series of events:
    1) poll.register() fd1, fd2
    2) events happen on fd1 and fd2
    3) poll.poll()
will mark both file descriptors as changed. As I wrote above, this is
important for inotify's behavior.
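
The Linux behavior can be checked with a short sketch. It assumes two
pipes stand in for the inotify descriptor and the client socket; a
single poll() call then reports both.

```python
import os
import select

# Two pipes stand in for the two event sources.
r1, w1 = os.pipe()
r2, w2 = os.pipe()

# 1) register both descriptors
poller = select.poll()
poller.register(r1, select.POLLIN)
poller.register(r2, select.POLLIN)

# 2) events happen on both sources before anyone polls
os.write(w1, b"x")
os.write(w2, b"y")

# 3) one poll() call (timeout 0: do not wait) reports both descriptors
ready = sorted(fd for fd, mask in poller.poll(0))
print(ready == sorted([r1, r2]))
```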

On MacOS, the sequence of events is:
    1) register file descriptor
    2) register path for fsevents
    3) events happen on both sources
    either:
    4a) RunLoopStart(stopOnFirstEvent=True)
    or
    4b) RunLoopStart(stopOnFirstEvent=False)

4b) reports all the sources, but is "blocking": the thread never
stops listening for events, so the call never returns.

4a) reports only one of the sources. You can repeat the call, but how
many times? You don't know the number of pending events: if you call
too few times, you miss events; if you call one time too many, the
last call will block.
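
The counting problem with 4a) can be modelled in pure Python. This is
only a simulation of the behavior described above, not CoreFoundation
code: a queue stands in for the pending events, run_once is a
hypothetical stand-in for a run loop that stops on the first event, and
a WouldBlock exception marks the spot where the real call would hang.

```python
import queue

class WouldBlock(Exception):
    """Raised where the real run loop would hang instead."""

def make_pending():
    # Two events are already waiting, one per source.
    pending = queue.Queue()
    pending.put("fd readable")
    pending.put("path changed")
    return pending

def run_once(pending):
    """Hypothetical stand-in for 4a): hand back exactly one event."""
    try:
        return pending.get_nowait()
    except queue.Empty:
        raise WouldBlock("no event left; a real run loop would wait here")

def drain(pending, calls):
    return [run_once(pending) for _ in range(calls)]

# Too few calls: one event stays undelivered.
pending = make_pending()
too_few = drain(pending, 1)
missed = pending.qsize()

# One call too many: the third call has nothing to return.
try:
    drain(make_pending(), 3)
    blocked = False
except WouldBlock:
    blocked = True

print(too_few, missed, blocked)
```

The caller has no way to learn the right value for calls in advance,
which is the whole difficulty.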


With this behavior, it seems quite difficult to report consecutive
events in a select.poll-like way, which seems to invalidate my generic
"pyfsevents" approach.

=== What's next ===

I am thinking that I should implement the C callbacks so that they
call a Python callback on events. I was reluctant to do it, but
realized today that I have no valid reason not to seriously consider
this option.

I would define, in Python, the callback interfaces to implement:

class callback_interface(object): pass

class fsevents_callback(callback_interface):
    def callback(self, path, recursive_event):
        raise NotImplementedError

class filedescriptor_callback(callback_interface):
    def callback(self, fd, mask):
        raise NotImplementedError

And then implement in C the primitives:

* registerfd(fd, callback)
* registerpath(path, callback)

* startlistening()

startlistening() would be blocking, and would typically be the last
call of the inotify server, after initialization.
If needed, I could expose a stop() routine, which would have to be
called from inside the callbacks. (I don't think that it's necessary
for our daemon.)
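
To show how the calling code might look, here is a pure-Python sketch
of the file-descriptor half of this plan. It is a stand-in, not the
planned C extension: the Listener and Echo classes are hypothetical,
select.poll replaces the RunLoop, and registerpath() is left out because
it needs FSEvents.

```python
import os
import select

class Listener(object):
    """Pure-Python stand-in for the proposed C primitives (fds only)."""

    def __init__(self):
        self._poller = select.poll()
        self._callbacks = {}
        self._running = False

    def registerfd(self, fd, callback):
        self._callbacks[fd] = callback
        self._poller.register(fd, select.POLLIN)

    def stop(self):
        # Meant to be called from inside a callback.
        self._running = False

    def startlistening(self):
        # Blocks until a callback calls stop().
        self._running = True
        while self._running:
            for fd, mask in self._poller.poll():
                self._callbacks[fd].callback(fd, mask)
                if not self._running:
                    break

class Echo(object):
    """Example filedescriptor_callback: record the data, then stop."""

    def __init__(self, listener):
        self.listener = listener
        self.seen = []

    def callback(self, fd, mask):
        self.seen.append(os.read(fd, 1024))
        self.listener.stop()

r, w = os.pipe()
listener = Listener()
echo = Echo(listener)
listener.registerfd(r, echo)
os.write(w, b"status")
listener.startlistening()
print(echo.seen)
```

A daemon would simply never call stop(), so startlistening() would be
its last call.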



How does this last plan sound? Can you think of anything better? Any
possible problems that I haven't mentioned?

Thanks,
-- 
Nicolas Dumazet — NicDumZ [ nɪk.d̪ymz ]



More information about the Mercurial-devel mailing list