RFC: Command server protocol

Mon Jun 13 10:15:06 CDT 2011

On Mon, Jun 13, 2011 at 1:50 AM, Matt Mackall <mpm at selenic.com> wrote:
> On Sun, 2011-06-12 at 19:15 +0300, Idan Kamara wrote:
>> Here's an overview of the current protocol used by the command server (also
>> available here http://mercurial.selenic.com/wiki/CommandServer). Feedback is
>> appreciated.
>>
>> All communication with the server is done on stdin/stdout. The byte order
>> used by the server is big-endian.
>
> When is big-endian used?

In the channel header, and in all other length fields, which are unsigned ints.

>
>> Data sent from the server is channel based, meaning a (channel [character],
>> length [unsigned int]) pair is sent before the actual data. For example:
>>
>> o
>> 1234
>
> Is this '1234' in text or in binary? If it's binary, how many bytes is
> it?

It's binary: a 4-byte unsigned int, according to
http://docs.python.org/library/struct.html#format-characters
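
To make the header layout concrete, here is a minimal sketch of how a
client might decode it with Python's struct module. It assumes the header
is exactly one channel byte followed by the 4-byte big-endian unsigned int;
the names HEADER and parse_header are made up for illustration and are not
part of the server.

```python
import struct

# Assumed layout: 1 channel byte + 4-byte big-endian unsigned int.
# ">" selects big-endian with standard sizes and no padding.
HEADER = struct.Struct(">cI")


def parse_header(raw):
    """Split a raw 5-byte header into (channel, length)."""
    channel, length = HEADER.unpack(raw)
    return channel, length


# The 'o' / 1234 example from the overview, as it would look on the wire.
header = b"o" + struct.pack(">I", 1234)
print(HEADER.size, parse_header(header))  # prints: 5 (b'o', 1234)
```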

>
>> <data: 1234 bytes>
>>
>> that is 1234 bytes sent on channel 'o', with the data following.
>>
>> When starting the server, it will send a new-line separated list of
>> capabilities (on the 'o' channel), in this format:
>>
>> capabilities:\n
>> capability1\n
>> capability2\n
>> ...
>
> There should probably be a blank line or something indicating that
> there's no more data arriving?

It's one string with all the capabilities being sent on the output channel.
So the client sees this as one chunk.
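
As a rough sketch of what "one chunk" means for a client: it reads a single
header, then exactly <length> bytes, and splits the result on newlines. The
helper names (read_chunk, parse_hello) are hypothetical, and decoding the
hello message as ASCII is an assumption on my part.

```python
import io
import struct


def read_chunk(stream):
    """Read one (channel, data) pair: a 5-byte header, then `length` bytes."""
    channel, length = struct.unpack(">cI", stream.read(5))
    return channel, stream.read(length)


def parse_hello(data):
    """Turn the hello chunk into a list of capability names."""
    lines = data.decode("ascii").splitlines()
    if not lines or lines[0] != "capabilities:":
        raise ValueError("unexpected hello message")
    # Drop empty lines so a trailing newline does not produce a bogus entry.
    return [line for line in lines[1:] if line]


# Simulated server output; a real client would read from the server's stdout.
hello = b"capabilities:\nruncommand\n"
fake_stdout = io.BytesIO(b"o" + struct.pack(">I", len(hello)) + hello)
channel, data = read_chunk(fake_stdout)
print(channel, parse_hello(data))
```

Because the length is known up front, no blank-line terminator is needed.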

>
>> At the most basic level, the server will support the 'runcommand'
>> capability.
>>
>> Encoding
>> -------------
>> Strings are encoded in the local encoding.
>
> How does the client discover the encoding of the server? The answer at
> this point may be 'the client has to set it with HGENCODING if it cares'
> but this should be in the doc.
>

Updated, thanks.

>> Channels
>> --------------
>> There are currently 5 channels:
>>
>> * o - Output channel. Most of the communication happens on this channel.
>> When running commands, output Mercurial writes to stdout is written to this
>> channel.
>> * e - Error channel. When running commands, this correlates to stderr.
>> * i - Input channel. The length field here can either be 0, telling the
>> client to send all input, or some positive number telling the client to send
>> at most <length> bytes.
>> * l - Line based input channel. The client should send a single line of
>> input (trimmed if length is not 0). This channel is used when Mercurial
>> interacts with the user or when iterating over stdin.
>
> What should a client do with unexpected channel responses?
>
> For instance, what happens when a progress channel is added? What
> happens if a client gets an unexpected prompt?

Since progress is considered output, the client needs to consume it, and
can ignore it if it's of no interest.

If the server is asking the client for input, the client must feed it
something or the server will hang waiting. I'm not sure what you mean by
unexpected though.
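
A client loop along these lines would cover both cases. This is only a
sketch: it assumes that 'i' and 'l' headers carry no payload (the length
field is the request size), and that every other channel, known or not, is
output that can be read and dropped. pump, on_input and on_output are
hypothetical names, and 'p' stands in for a made-up progress channel.

```python
import io
import struct

INPUT_CHANNELS = {b"i", b"l"}


def pump(server_out, on_input, on_output):
    """Dispatch chunks until the server's output stream ends."""
    while True:
        header = server_out.read(5)
        if len(header) < 5:
            break  # stream closed
        channel, length = struct.unpack(">cI", header)
        if channel in INPUT_CHANNELS:
            # Assumption: no data follows; the server now waits for a reply.
            on_input(channel, length)
        else:
            # Always consume the payload, even for channels we do not know,
            # so the stream stays in sync.
            on_output(channel, server_out.read(length))


def show(channel, data):
    """Print 'o' and 'e' data; silently ignore everything else."""
    if channel in (b"o", b"e"):
        print(channel, data)


raw = (b"o" + struct.pack(">I", 3) + b"abc"
       + b"p" + struct.pack(">I", 2) + b"50"
       + b"l" + struct.pack(">I", 0))
pump(io.BytesIO(raw), lambda c, n: print("input wanted on", c, n), show)
```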

>
>> Input should be sent on stdin in the following format:
>>
>> length
>> data
>
> The input model is interesting: it basically has the server prompting
> the client for input. That probably makes sense, but we should probably
> be explicit about what's required to avoid deadlock.
>
> For instance, if the server is both consuming input and producing
> output, and the client is simply spooling input (ie a big patch), it
> will eventually write enough data to the client that its write blocks.
>

Right. But technically, if the server writes output while asking for input,
the client has to read that output first before it can even see that more
input is wanted.
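
For illustration, a reply to an 'i' request could look like the sketch
below: the client only writes after it has read a request, and never more
than was asked for. answer_input is a hypothetical helper, and the reply
framing (4-byte big-endian length, then the data) follows the input format
described in the overview.

```python
import io
import struct


def answer_input(source, requested, server_in):
    """Answer one 'i' request with data taken from `source`.

    requested == 0 means "send all input"; otherwise send at most
    `requested` bytes. An exhausted source yields length 0, which the
    server interprets as EOF.
    """
    data = source.read() if requested == 0 else source.read(requested)
    server_in.write(struct.pack(">I", len(data)) + data)
    return len(data)


patch = io.BytesIO(b"hello world")
fake_stdin = io.BytesIO()  # stands in for the server's stdin
sent = answer_input(patch, 5, fake_stdin)
print(sent, fake_stdin.getvalue())
```

Writing one bounded reply per request is what keeps the client from
spooling a big patch into a pipe that nobody is reading.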

>> length = 0 sent by the client is interpreted as EOF by the server.
>>
>> * d - Debug channel. Used when the server is started with logging to '-'.
>>
>> Capabilities
>
> This may not be the right section name?

It explains what the server can do (per its capabilities). Maybe
'Commands' is better?

>
>> -----------------
>> The server is running on an endless loop (until stdin is closed) waiting for
>> commands. A command request looks like this:
>>
>> commandname\n
>> <command specific request>
>>
>> * runcommand - Run the command specified by a list of \0-terminated strings.
>> An unsigned int indicating the length of the arguments should be sent before
>> the list. Example:
>>
>> runcommand\n
>> 8
>> log\0
>> -l\0
>> 5
>>
>> Which corresponds to running 'hg log -l 5'.
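
In bytes, that request could be built as in the sketch below. Note that the
length of 8 in the example matches the arguments joined by \0 with no
trailing \0 after the last one, so that is what this assumes.
encode_runcommand is a hypothetical helper name.

```python
import struct


def encode_runcommand(args):
    """Build a runcommand request from a list of byte-string arguments."""
    # Assumption: \0 separates arguments; there is none after the last one.
    payload = b"\0".join(args)
    return b"runcommand\n" + struct.pack(">I", len(payload)) + payload


request = encode_runcommand([b"log", b"-l", b"5"])
print(request)  # prints: b'runcommand\n\x00\x00\x00\x08log\x00-l\x005'
```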
>
> The wiki page has a piece about error codes but it's not quite clear how
> a client distinguishes those from the output stream.

Yeah. This is a problem if the server sends a \0 as part of its 'regular'
output. The client will be misled into thinking it's the end.

Maybe we could use another channel here ('a'dmin?) for the server
to tell the client that a command finished and to send its return code.

> I'd like to see an
> entire example stream from connect to disconnect. Something like:
>
>   client               server                        commentary
>   <connect>
>                        hello, my capabilities are:   wait for blank line
>   hg sum, please                                     a valid command
>                        <output>
>                        <error code>
>   hg smurf, please
>                        <output>
>                        <error stream>
>                        <error code>
>   <close>
>                        <shutdown>
>
> ..except with literal bytes and commentary.

I updated the wiki with 2 examples, see
http://mercurial.selenic.com/wiki/CommandServer#Examples
Unfortunately, after a lot of messing around I wasn't able to properly
inline them here.