RFC: Command server protocol

Mon Jun 13 10:29:53 CDT 2011

On Mon, Jun 13, 2011 at 10:15 AM, Idan Kamara <idankk86 at gmail.com> wrote:
> On Mon, Jun 13, 2011 at 1:50 AM, Matt Mackall <mpm at selenic.com> wrote:
>> On Sun, 2011-06-12 at 19:15 +0300, Idan Kamara wrote:
>>> Here's an overview of the current protocol used by the command server (also
>>> available here http://mercurial.selenic.com/wiki/CommandServer). Feedback is
>>> appreciated.
>>>
>>> All communication with the server is done on stdin/stdout. The byte order
>>> used by the server is big-endian.
>>
>> When is big-endian used?
>
> In the channel header, and in all other length fields, which are unsigned ints.
>
>>
>>> Data sent from the server is channel based, meaning a (channel [character],
>>> length [unsigned int]) pair is sent before the actual data. For example:
>>>
>>> o
>>> 1234
>>
>> Is this '1234' in text or in binary? If it's binary, how many bytes is
>> it?
>
> It's binary, 4 bytes according to
> http://docs.python.org/library/struct.html#format-characters

Is there any reason not to do something like chunked transfer encoding
so that we don't have to know body lengths up front?
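
For reference, here is a minimal sketch of reading one frame as the header is
described above: one channel character followed by a 4-byte big-endian unsigned
int. The names read_exact and read_frame are made up for illustration, and the
sketch only covers channels where data follows the header (o, e, d). On the
input channels the length is a request size, not a payload size.

```python
import io
import struct

# 1-byte channel character + 4-byte big-endian unsigned int (5 bytes total).
HEADER = struct.Struct(">cI")

def read_exact(stream, n):
    """Read exactly n bytes, looping because a pipe may return short reads."""
    chunks = []
    remaining = n
    while remaining:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError("stream closed in the middle of a frame")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def read_frame(stream):
    """Return (channel, data) for one frame on an output-style channel."""
    channel, length = HEADER.unpack(read_exact(stream, HEADER.size))
    return channel, read_exact(stream, length)

# Usage: a hand-built frame carrying 5 bytes on channel 'o'.
frame = HEADER.pack(b"o", 5) + b"hello"
channel, data = read_frame(io.BytesIO(frame))
```

With this framing the reader has to see the length before it can consume the
body, which is exactly the up-front knowledge a chunked scheme would avoid on
the sending side.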

>
>>
>>> <data: 1234 bytes>
>>>
>>> that is 1234 bytes sent on channel 'o', with the data following.
>>>
>>> When starting the server, it will send a new-line separated list of
>>> capabilities (on the 'o' channel), in this format:
>>>
>>> capabilities:\n
>>> capability1\n
>>> capability2\n
>>> ...
>>
>> There should probably be a blank line or something indicating that
>> there's no more data arriving?
>
> It's one string with all the capabilities being sent on the output channel.
> So the client sees this as one chunk.

Unless we're speaking this protocol over some pathological TCP link
with a really small MTU. It'd be nice to have some sort of sentinel,
as it makes the protocol a ton easier to work with.
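
To make the current behaviour concrete, here is a rough sketch of how a client
might split the capabilities chunk, assuming it has already received the whole
payload of that one 'o' frame. The function name parse_capabilities is
hypothetical, and decoding as ASCII is an assumption, since the thread only
says strings use the local encoding.

```python
def parse_capabilities(payload):
    """Split a 'capabilities:\\n...' payload into a list of capability names."""
    # Assumption: capability names are plain ASCII.
    lines = payload.decode("ascii").split("\n")
    if lines[0] != "capabilities:":
        raise ValueError("payload does not start with 'capabilities:'")
    # Drop empty entries, such as the one left by a trailing newline.
    return [line for line in lines[1:] if line]

# Usage with the placeholder names from the proposal.
caps = parse_capabilities(b"capabilities:\ncapability1\ncapability2\n")
```

Without a sentinel, the client here relies entirely on the frame length to know
where the list ends.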

>
>>
>>> At the most basic level, the server will support the 'runcommand'
>>> capability.
>>>
>>> Encoding
>>> -------------
>>> Strings are encoded in the local encoding.
>>
>> How does the client discover the encoding of the server? The answer at
>> this point may be 'the client has to set it with HGENCODING if it cares'
>> but this should be in the doc.
>>
>
> Updated, thanks.
>
>>> Channels
>>> --------------
>>> There are currently 5 channels:
>>>
>>> * o - Output channel. Most of the communication happens on this channel.
>>> When running commands, output Mercurial writes to stdout is written to this
>>> channel.
>>> * e - Error channel. When running commands, this correlates to stderr.
>>> * i - Input channel. The length field here can either be 0, telling the
>>> client to send all input, or some positive number telling the client to send
>>> at most <length> bytes.
>>> * l - Line based input channel. The client should send a single line of
>>> input (trimmed if length is not 0). This channel is used when Mercurial
>>> interacts with the user or when iterating over stdin.
>>
>> What should a client do with unexpected channel responses?
>>
>> For instance, what happens when a progress channel is added? What
>> happens if a client gets an unexpected prompt?
>
> Since progress is considered output, the client needs to consume it and can
> ignore it if it's of no interest.
>
> If the server is asking the client for input, the client must feed it
> something or the server will hang waiting. I'm not sure what you mean by
> unexpected though.

If we introduce a new channel, we should have a way to mark the
channel as either required or optional, so that old clients don't end
up causing the server to hang.

>
>>
>>> Input should be sent on stdin in the following format:
>>>
>>> length
>>> data
>>
>> The input model is interesting: it basically has the server prompting
>> the client for input. That probably makes sense, but we should probably
>> be explicit about what's required to avoid deadlock.
>>
>> For instance, if the server is both consuming input and producing
>> output, and the client is simply spooling input (ie a big patch), it
>> will eventually write enough data to the client that its write blocks.
>>
>
> Right. But technically if the server writes output while asking for input,
> for the client to know it needs to send more input, it will have to
> read the output first.
>
>>> length = 0 sent by the client is interpreted as EOF by the server.
>>>
>>> * d - Debug channel. Used when the server is started with logging to '-'.
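
A small sketch of how a client could answer a request on the 'i' channel, under
the rules quoted above: a requested length of 0 means "send all input", a
positive value means "send at most that many bytes", and a reply with length 0
is read as EOF. It assumes the reply length is a 4-byte big-endian unsigned int
like the other length fields; answer_input_request is a made-up name, and the
line-based 'l' channel is not covered.

```python
import io
import struct

def answer_input_request(requested, source):
    """Build the length-prefixed reply for one 'i' channel request."""
    # requested == 0: send everything that is left; otherwise cap the read.
    data = source.read() if requested == 0 else source.read(requested)
    # An exhausted source yields length 0, which the server treats as EOF.
    return struct.pack(">I", len(data)) + data

# Usage: the server asks for at most 4 bytes, then for everything else.
source = io.BytesIO(b"abcdef")
first = answer_input_request(4, source)
rest = answer_input_request(0, source)
eof = answer_input_request(4, source)
```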
>>>
>>> Capabilities
>>
>> This may not be the right section name?
>
> It explains what the server can do (per its capabilities). Maybe
> commands is better?
>
>>
>>> -----------------
>>> The server runs in an endless loop (until stdin is closed), waiting for
>>> commands. A command request looks like this:
>>>
>>> commandname\n
>>> <command specific request>
>>>
>>> * runcommand - Run the command specified by a list of \0-terminated strings.
>>> An unsigned int indicating the length of the arguments should be sent before
>>> the list. Example:
>>>
>>> runcommand\n
>>> 8
>>> log\0
>>> -l\0
>>> 5
>>>
>>> Which corresponds to running 'hg log -l 5'.
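
The request above could be built roughly like this. This is a sketch only:
encode_runcommand is a hypothetical helper, and it joins the arguments with \0
rather than terminating the last one, because that is what makes the length in
the example come out to 8.

```python
import struct

def encode_runcommand(args):
    """Encode a runcommand request from a list of byte-string arguments."""
    # "log" + \0 + "-l" + \0 + "5" is 3 + 1 + 2 + 1 + 1 = 8 bytes.
    payload = b"\0".join(args)
    # The length is sent as a 4-byte big-endian unsigned int before the list.
    return b"runcommand\n" + struct.pack(">I", len(payload)) + payload

# Usage: the request for 'hg log -l 5'.
request = encode_runcommand([b"log", b"-l", b"5"])
```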
>>
>> The wiki page has a piece about error codes but it's not quite clear how
>> a client distinguishes those from the output stream.
>
> Yeah. This is a problem if the server sends a \0 as part of its 'regular'
> output. The client will be misled thinking it's the end.
>
> Maybe we could use another channel here ('a'dmin?) for the server
> to tell the client that a command finished and to send its return code.
>
>> I'd like to see an
>> entire example stream from connect to disconnect. Something like:
>>
>>   client               server                        commentary
>>   <connect>
>>                        hello, my capabilities are:   wait for blank line
>>   hg sum, please                                     a valid command
>>                        <output>
>>                        <error code>
>>   hg smurf, please
>>                        <output>
>>                        <error stream>
>>                        <error code>
>>   <close>
>>                        <shutdown>
>>
>> ..except with literal bytes and commentary.
>
> I updated the wiki with 2 examples, see
> http://mercurial.selenic.com/wiki/CommandServer#Examples
> Unfortunately, after a lot of messing around I wasn't able to properly
> inline it here.