statdaemon extension

Martin Geisler mg at aragost.com
Wed Aug 22 06:32:43 CDT 2012


Nicolas Dumazet <nicdumz at gmail.com> writes:

> 2012/8/22 Martin Geisler <mg at aragost.com>:
>>
>> What is not allowed to happen is that the daemon returns a cache
>> mentioning only
>>
>>   ['bbb', 'xxx']
>>
>> when whatever process is modifying the working copy did
>>
>>   write('aaa')
>>   write('bbb')
>>   write('xxx')
>>   write('yyy')
>>
>> The snapshot with only ['bbb', 'xxx'] is inconsistent, but any snapshot
>> with a "prefix" of ['aaa', 'bbb', 'xxx', 'yyy'] is okay.
>>
>> This has some connection with the idea of "serializability" -- that the
>> final result will look *as if* the overlapping operations were executed
>> serially. Here a cache with ['aaa', 'bbb'] would be okay, since it would
>> be the result you would expect if the serial order had been:
>>
>>   write('aaa')
>>   write('bbb')
>>
>>   hg status
>>
>>   write('xxx')
>>   write('yyy')
>
> Sure, there is some "eventual consistency": if the user calls "hg
> status" N times with N big enough, the Nth call will reflect the exact
> state of the system.

Yes, but it's a bit stronger than that -- the view becomes correct as
soon as the daemon has processed the pending events. Until then, the
view returned by the daemon will lag behind the "real" state of the
working copy.
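The prefix rule discussed above can be sketched as a tiny check (the
helper name is mine, purely for illustration):

```python
def is_consistent_snapshot(snapshot, writes):
    """True if `snapshot` is a prefix of the ordered `writes` --
    the consistency guarantee discussed above."""
    return snapshot == writes[:len(snapshot)]

writes = ['aaa', 'bbb', 'xxx', 'yyy']

# Any prefix is an acceptable snapshot, including the empty one...
assert is_consistent_snapshot([], writes)
assert is_consistent_snapshot(['aaa', 'bbb'], writes)

# ...but a snapshot with a "gap", like ['bbb', 'xxx'], is not.
assert not is_consistent_snapshot(['bbb', 'xxx'], writes)
```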

> This, however, seems broken from a user perspective.
>
> Taking again your example, and adding numbers:
> 1. write('aaa')
> 2. write('bbb')
> 3. write('xxx')
> 4. write('yyy')
>
> If a user makes those 4 changes and issues `hg status` after 4., she
> will expect ['aaa', 'bbb', 'xxx', 'yyy'] as a result. Anything else
> is arguably okay from a system design perspective, but wrong for the
> user.

Yes, I'll agree that this is a reasonable assumption. But it relies on a
clear definition of what "after" means -- there needs to be some causal
connection between the writing of the files and Mercurial.

The daemon takes the view that "after" means "the OS has sent an event
for the write (and we've handled that event)". Normal Mercurial will use
"osutils.listdir sees the change" as the definition of "after".

The two definitions are not identical and I would definitely expect that
osutils.listdir can see a change a bit before the event is sent to the
daemon.

> Is the current implementation guaranteeing this?
>
> As far as I understand it, your daemon can return anything in [ [],
> ['aaa'], ['aaa', 'bbb'], ['aaa', 'bbb', 'xxx'], ['aaa', 'bbb', 'xxx',
> 'yyy'] ]. Which means that in turn, hg status can return any of those
> 5 replies.

Right.

> I don't think that there is a bullet-proof way to always have complete
> answers.
> 1) You need to guarantee that the request handler waits for FS events
> to be propagated & handled by the FS events handler
> 2) You need to sync the two threads

I'm not even sure there is a clear definition of what "complete answers"
means :) If it means that 'hg status' should show all four files after
'sync' is called, then I think the daemon is close to ensuring this:
what you see depends on the speed at which we can process the events.

> And I think that you should not care about the last case:
> 3) When the request handler receives a request at t0, more events
> might be coming (t1) by the time you eventually reply (t2)

Agreed -- by sending back what we've seen at time t0, we make it appear
that 'hg status' executed all the listdir calls extremely fast at t0 and
then sat around and waited doing nothing until t2.

> Maybe better than the current implementation:
>
> safety_margin_msecs = ...
> lock_ = ...
> cache_ = ...
>
> last_poll_ = 0
>
> def fsevents_thread():
>   while True:
>     with lock_:
>       data = poll(timeout=safety_margin_msecs)
>       last_poll_ = now()
>       cache_.update(data)
>
> def handle_request(request):
>   wait(now() - last_poll_ + 2 * safety_margin_msecs)
>   with lock_:
>     cache_.reply(request)
>
> In other words, I suggest adding explicit throttling before answering
> requests, trading speed for correctness.

If I understand this correctly, you're suggesting to wait a bit when a
request comes in, in order to be sure that we've processed the events
that were sent in the safety margin milliseconds before receiving the
request?
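If so, the throttling idea could be fleshed out roughly like this -- a
hedged sketch only, with the class, method names and margin value being
mine rather than anything in the extension:

```python
import threading
import time

SAFETY_MARGIN = 0.05  # seconds; illustrative value

class ThrottledCache:
    """Sketch of the proposed throttling: delay replies until a safety
    margin has elapsed since the last batch of FS events was folded in."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cache = set()
        self._last_poll = 0.0

    def process_events(self, events):
        # Stand-in for the FS-event thread: fold a batch of events into
        # the cache and remember when we last polled.
        with self._lock:
            self._cache.update(events)
            self._last_poll = time.monotonic()

    def handle_request(self):
        # Wait until SAFETY_MARGIN has passed since the last poll, so
        # events sent shortly before the request are likely processed.
        deadline = self._last_poll + SAFETY_MARGIN
        delay = deadline - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        with self._lock:
            return sorted(self._cache)
```

As you say, this trades latency for a better (but still not guaranteed)
chance of completeness.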

> Waiting is _never_ the perfect way to handle synchronization issues,
> but given that we don't have access to OS internals, that's currently
> the best thing I can think of.

I think you're describing a system that says "okay, we saw your
fetchall() request at t0, let's wait a bit to be sure we've processed
all events before t0". I agree completely that "waiting a bit" cannot be
guaranteed to be correct here -- we don't know exactly when the OS will
have sent all the events before t0.

With that description, the current implementation says "okay, we saw
your fetchall() request at t1, here's a consistent snapshot at some t0
before t1 -- please just pretend that you issued fetchall() at t0".

> On Linux you could do something like ioctl(inotify_fd, FIONREAD,
> &bufsize); to tell if there are any FS events available for processing
> before replying to the client request, but that won't work under
> Windows or Mac.

Windows has an async API too, but it works differently from how I
understand inotify -- instead of letting you read the events one by
one, the async API hands back a whole buffer of events at once.
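For what it's worth, the FIONREAD trick can be demonstrated on any
POSIX file descriptor; in this sketch a plain pipe stands in for the
inotify descriptor (the helper name is mine), since on Linux the same
ioctl on an inotify fd reports the bytes of queued events:

```python
import fcntl
import os
import struct
import termios

def pending_bytes(fd):
    """Return the number of unread bytes queued on `fd` via FIONREAD."""
    buf = struct.pack('i', 0)
    res = fcntl.ioctl(fd, termios.FIONREAD, buf)
    return struct.unpack('i', res)[0]

r, w = os.pipe()
assert pending_bytes(r) == 0   # nothing queued yet
os.write(w, b'event')
assert pending_bytes(r) == 5   # five bytes waiting to be read
```

But as you note, this is POSIX-only and doesn't help on Windows.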

-- 
Martin Geisler

aragost Trifork
Commercial Mercurial support
http://aragost.com/mercurial/

