cmdserver protocol questions

Fri Jul 1 05:02:58 CDT 2011

Idan Kamara <idankk86 at gmail.com> writes:

Hi Idan,

Let me start by saying that I love that we now have this cmdserver
component. I think it will be quite important in the future, especially
when we get it hooked up to a TCP port so that people can use it
remotely. Is making the cmdserver listen on a port part of your plan?

> 2011/6/30 Martin Geisler <mg at aragost.com>
>>
>> Hi guys,
>>
>> Jan is writing a Java library for the cmdserver and he had a number
>> of questions for me about the protocol. I've tried to sum them up
>> below:
>>
>> * Right now, you must always run the cmdserver in an existing
>> repository even if you want to run a commands.norepo command like 'hg
>> init'.
>>
>>  Perhaps we should have both 'hg serve' that does what it used to do
>>  and a new 'hg cmdserver' command -- the latter should be in
>>  commands.optionalrepo whereas the first cannot be there.
>
> I'm not sure it's worth the extra command.

Without a norepo command the interface becomes non-uniform. Our current
thinking is that we'll create an empty dummy repository in order to run
commands like

  hg clone http://somewhere/remote local

It would be much better if we could just launch 'hg cmdserver' and that
would then be a repository-less command server.

>> * Parsing of server's hello block: it would have been nicer if this
>> had been a standard format such as JSON -- now everyone has to
>> implement the parsing by themselves.
>
> I don't know if JSON or anything else is really needed here. The
> format is pretty simple, and I don't expect it to really go through
> major changes in the future either.
>
>>  This is of course not difficult in itself, but now implementors must
>>  decide if whitespace should be trimmed from each line?
>
> What whitespace?

Yes, that is exactly the point. What if there is whitespace here -- do
we ignore it? Can there be whitespace before the ':' and is that then
part of the field name?

With a "proper" parser we could feed the block to it and let it handle
this. If it can parse it then fine, if not, then we abort. Here we'll
have to decide what it means to be able to parse the block.

>> It is also not clearly specified what a "field" is -- is it something
>> that matches "[a-z]+:" or can there be other characters in a field?
>> The safe choice must be to split (once!) on ':', but will all
>> implementations do this?
>
> I guess that can be the field name pattern, with a space after the ':'
> (maybe with digits too).
>
> But for clients I think the best approach would be to parse the hello
> message according to the protocol docs, and ignore any fields they
> don't know.

Yes, that is a nice high-level explanation. But notice how you write
'guess', 'maybe', 'think' in the above.

>> * Input/line channels: what is the precise difference between the two
>> and why do we need both?
>
> When a client sees 'I' it should return raw data (up to length), and
> when it sees 'L' it should return a single line of data (also up to
> length). They're identical in behavior to Python's read/readline on
> file-like objects.

You write "raw data" for the 'I' channel, but I guess the data on the
'L' channel is just as raw?

> The 'L' channel lets the server to ask the client for a prompt reply,
> it's used in places such as ui.prompt(), 'for line in ui.fin', etc.

Should the line contain a final '\n' or a final '\r\n'? Or does it not
matter since the server will run str.strip on the line anyway?

The doc says that the line should be "trimmed to length". Both Java and
.NET happen to call their str.strip equivalent trim", but you mean that
one can send up to length bytes, right?

> Also, using the 'I' channel alone would introduce some complexity on
> the server side, it'll need to split on new lines and buffer the
> remainder for subsequent reads.

I guess the client libraries will have to do the buffering now.

>>  It seems to me that the 'I' channel would be enough, also for
>>  reading patches from stdin. Since the 'I' channel is not
>>  line-oriented, it does not have the 4096 byte line length maximum.
>
> The 4096 length is just a way for the server to limit the amount of
> data the client sends in one go, giving it a chance to read from the
> server to avoid a deadlock (see the protocol RFC discussion where Matt
> explained why this is needed). So it's not specific to the line
> channel, there can be lines with size >4096

Yeah, I remember this discussion. I had expected that the server could
use the 'I' channel to request 4096 bytes at a time from the client.

-- 
Martin Geisler

aragost Trifork
Professional Mercurial support
http://mercurial.aragost.com/kick-start/