RFC: Command server protocol

Mon Jun 13 16:41:55 CDT 2011

On Tue, 2011-06-14 at 00:10 +0300, Idan Kamara wrote:
> On Mon, Jun 13, 2011 at 11:36 PM, Matt Mackall <mpm at selenic.com> wrote:
> > On Mon, 2011-06-13 at 23:05 +0300, Idan Kamara wrote:
> >> On Mon, Jun 13, 2011 at 7:03 PM, Matt Mackall <mpm at selenic.com> wrote:
> >> > On Mon, 2011-06-13 at 18:15 +0300, Idan Kamara wrote:
> >> >> On Mon, Jun 13, 2011 at 1:50 AM, Matt Mackall <mpm at selenic.com> wrote:
> >> >> > On Sun, 2011-06-12 at 19:15 +0300, Idan Kamara wrote:
> >> >> >> Here's an overview of the current protocol used by the command server (also
> >> >> >> available here http://mercurial.selenic.com/wiki/CommandServer). Feedback is
> >> >> >> appreciated.
> >> >> >>
> >> >> >> All communication with the server is done on stdin/stdout. The byte order
> >> >> >> used by the server is big-endian.
> >> >> >
> >> >> > When is big-endian used?
> >> >>
> >> >> In the channel header. And all other length fields which are unsigned ints.
> >> >
> >> >> >
> >> >> >> Data sent from the server is channel based, meaning a (channel [character],
> >> >> >> length [unsigned int]) pair is sent before the actual data. For example:
> >> >> >>
> >> >> >> o
> >> >> >> 1234
> >> >> >
> >> >> > Is this '1234' in text or in binary? If it's binary, how many bytes is
> >> >> > it?
> >> >>
> >> >> It's binary, 4 bytes according to
> >> >> http://docs.python.org/library/struct.html#format-characters
> >> >
> >> > Did you clarify this on the wiki?
> >>
> >> Sort of: Data sent from the server is channel based, meaning a
> >> (channel [character], length [unsigned int]) pair is sent before the
> >> actual data.
> >>
> >> I'll make sure it's more clear by linking to the python docs.
> >>
> >> >
> >> >> >
> >> >> >> <data: 1234 bytes>
> >> >> >>
> >> >> >> that is 1234 bytes sent on channel 'o', with the data following.
> >> >> >>
> >> >> >> When starting the server, it will send a new-line separated list of
> >> >> >> capabilities (on the 'o' channel), in this format:
> >> >> >>
> >> >> >> capabilities:\n
> >> >> >> capability1\n
> >> >> >> capability2\n
> >> >> >> ...
> >> >> >
> >> >> > There should probably be a blank line or something indicating that
> >> >> > there's no more data arriving?
> >> >>
> >> >> It's one string with all the capabilities being sent on the output channel.
> >> >> So the client sees this as one chunk.
> >> >
> >> > Ok.
> >> >
> >> >> >> Channels
> >> >> >> --------------
> >> >> >> There are currently 5 channels:
> >> >> >>
> >> >> >> * o - Output channel. Most of the communication happens on this channel.
> >> >> >> When running commands, output Mercurial writes to stdout is written to this
> >> >> >> channel.
> >> >> >> * e - Error channel. When running commands, this correlates to stderr.
> >> >> >> * i - Input channel. The length field here can either be 0, telling the
> >> >> >> client to send all input, or some positive number telling the client to send
> >> >> >> at most <length> bytes.
> >> >> >> * l - Line based input channel. The client should send a single line of
> >> >> >> input (trimmed if length is not 0). This channel is used when Mercurial
> >> >> >> interacts with the user or when iterating over stdin.
> >> >> >
> >> >> > What should a client do with unexpected channel responses?
> >> >> >
> >> >> > For instance, what happens when a progress channel is added? What
> >> >> > happens if a client gets an unexpected prompt?
> >> >>
> >> >> Since progress is considered output, it needs to consume it and ignore it
> >> >> if it's of no interest to him.
> >> >
> >> > If a client written today encounters a progress channel tomorrow, how
> >> > does it know not to abort? It wasn't written to expect that.
> >>
> >> The client can choose what to do when he gets data on an unexpected channel.
> >> Unless we mess up with the initial design, I don't see why ignoring
> >> unexpected data shouldn't be fine.
> >> (by ignoring I mean just reading the data and doing nothing with it)
> >
> > Let's say a client library calls a command that wants input, but it
> > didn't expect it to. Simply discarding the input request from the server
> > and waiting for the command to complete won't work, as the command will
> > never complete. So the above is clearly not always correct. Some
> > channels are already known to be non-ignorable.
> 
> Yes, what I meant was that there are currently 2 non-ignorable channels,
> those are 'i' and 'l'.
> Unless we introduce more non-ignorable channels in the future, the
> client is free to ignore data from unknown channels.

When combined with a 'we will not break old clients' promise, that
strategy will make introducing new non-ignorable channels impossible.
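
For concreteness, here is a rough sketch of how a client might decode the
framing discussed above: one channel character followed by a 4-byte
big-endian unsigned length, i.e. struct format ">cI". The names read_frame,
HEADER and INPUT_CHANNELS are made up for illustration and are not part of
any existing client. It assumes a buffered binary stream, and that for the
'i' and 'l' channels the length is a request size with no payload following.

```python
import io
import struct

# 1-byte channel character + 4-byte big-endian unsigned int length.
HEADER = struct.Struct(">cI")

# Assumption: on the input channels the length field is the number of bytes
# the server is asking for, so no payload follows the header.
INPUT_CHANNELS = (b"i", b"l")


def read_frame(stream):
    """Return (channel, length, payload) for the next frame on stream."""
    header = stream.read(HEADER.size)
    if len(header) < HEADER.size:
        raise EOFError("stream ended in the middle of a header")
    channel, length = HEADER.unpack(header)
    if channel in INPUT_CHANNELS:
        return channel, length, b""
    payload = stream.read(length)
    if len(payload) < length:
        raise EOFError("stream ended in the middle of a payload")
    return channel, length, payload


# Usage with an in-memory stream standing in for the server's stdout.
stream = io.BytesIO(
    b"o" + struct.pack(">I", 5) + b"hello" + b"l" + struct.pack(">I", 80)
)
first = read_frame(stream)
second = read_frame(stream)
```

Note that a client that drops the second frame here and keeps waiting would
hang, which is the case described above.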

> Maybe Augie's suggestion can save us some headaches though:
> if we somehow identify required channels, clients will have a way of seeing
> an unknown channel and deciding to abort if it's required, otherwise
> it's safe to ignore the data.

I was hoping you would arrive at this conclusion.
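
A minimal sketch of what that could look like on the client side. How a
channel gets marked as required is not settled in this thread, so the rule
below (an uppercase channel letter means required) is purely an assumption
for illustration, as are the names is_required, dispatch and
UnknownRequiredChannel.

```python
class UnknownRequiredChannel(Exception):
    """Raised when the server uses a required channel the client lacks."""


def is_required(channel):
    # Assumption, not a decided rule: uppercase channel letter = required.
    return channel.isupper()


def dispatch(channel, payload, handlers):
    """Route one frame to its handler, or decide what to do if unknown."""
    handler = handlers.get(channel)
    if handler is not None:
        return handler(payload)
    if is_required(channel):
        # Ignoring this could leave the command waiting forever, so abort.
        raise UnknownRequiredChannel(channel)
    # Unknown but optional: the data was already read, just drop it.
    return None


# Usage: a client that only knows about the output channel.
seen = []
handlers = {b"o": seen.append}
dispatch(b"o", b"some output", handlers)   # handled
ignored = dispatch(b"p", b"50%", handlers)  # unknown, optional: ignored
try:
    dispatch(b"P", b"", handlers)           # unknown, required: abort
    aborted = False
except UnknownRequiredChannel:
    aborted = True
```

With something along these lines, a client written today can skip a future
optional channel and still fail loudly on one it must not ignore.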

-- 
Mathematics is the supreme nostalgia of our time.