cmdserver protocol questions

Fri Jul 1 07:36:12 CDT 2011

On Fri, Jul 1, 2011 at 1:02 PM, Martin Geisler <mg at aragost.com> wrote:
>
> Idan Kamara <idankk86 at gmail.com> writes:
>
> Hi Idan,
>
> Let me start by saying that I love that we now have this cmdserver
> component. I think it will be quite important in the future, especially
> when we get it hooked up to a TCP port so that people can use it
> remotely. Is making the cmdserver listen on a port part of your plan?

Initially it was TCP, but Matt explained the issues involved with a remote
approach:

mpm: I think the place to start is probably over a pipe.
mpm: The upside of this is that the lifetime of the server is very well
managed. We needn't worry about collecting tons of fat servers when clients
misbehave.
mpm: The downside is that it's a 1:1 client:server model. That's suboptimal,
but mostly not a concern.
idank: but doesn't that limit the cmd server to live on the same machine as
the client? not sure if clients will use another setup though
mpm: Once we move to tcp/ip or Unix sockets or whatever, then we need to
start thinking in terms of server startup/shutdown, permissions/security,
visibility to remote machines, authentication, etc.
mpm: idank: I really don't see that we want users across the internet using
GUI clients to execute hg commands on remote machines.
idank: mpm: well not over the internet, but for example bitbucket
mpm: idank: I think you're seriously underestimating the security issues
here.
idank: maybe they have 1 server that hosts all the repositories, with cmd
servers running
mpm: If I were running Bitbucket, I wouldn't entertain this notion for more
than half a second.
mpm: (Much like the symlinks thing, I don't KNOW that there's a hole, but
all my instincts here scream DANGER)

So for the project I'm pretty sure it won't happen.
As for the future of the cmdserver, perhaps Matt can share his vision.

>
> > 2011/6/30 Martin Geisler <mg at aragost.com>
> >>
> >> Hi guys,
> >>
> >> Jan is writing a Java library for the cmdserver and he had a number
> >> of questions for me about the protocol. I've tried to sum them up
> >> below:
> >>
> >> * Right now, you must always run the cmdserver in an existing
> >> repository even if you want to run a commands.norepo command like 'hg
> >> init'.
> >>
> >>  Perhaps we should have both 'hg serve' that does what it used to do
> >>  and a new 'hg cmdserver' command -- the latter should be in
> >>  commands.optionalrepo whereas the first cannot be there.
> >
> > I'm not sure it's worth the extra command.
>
> Without a norepo command the interface becomes non-uniform. Our current
> thinking is that we'll create an empty dummy repository in order to run
> commands like
>
>  hg clone http://somewhere/remote local
>
> It would be much better if we could just launch 'hg cmdserver' and that
> would then be a repository-less command server.

I agree that it's an inconvenience that it doesn't work for norepo commands,
we can probably add that functionality after 1.9.

But I think for now, it's not so bad if you run things like init, clone
using plain
hg, then connect to the result repo with the cmdserver.

>
> >> * Parsing of server's hello block: it would have been nicer if this
> >> had been a standard format such as JSON -- now everyone has to
> >> implement the parsing by themselves.
> >
> > I don't know if JSON or anything else is really needed here. The
> > format is pretty simple, and I don't expect it to really go through
> > major changes in the future either.
> >
> >>  This is of course not difficult in itself, but now implementors must
> >>  decide if whitespace should be trimmed from each line?
> >
> > What whitespace?
>
> Yes, that is exactly the point. What if there is whitespace here -- do
> we ignore it? Can there be whitespace before the ':' and is that then
> part of the field name?

You're right that the protocol doesn't define how fields look on the
hello msg. I think we should define it now.

It can be a line separated list of

<field name>: <field data>

where <field name> is restricted to [a-z0-9]+ and <field data> is field
specific.

>
> With a "proper" parser we could feed the block to it and let it handle
> this. If it can parse it then fine, if not, then we abort. Here we'll
> have to decide what it means to be able to parse the block.
>
> >> It is also not clearly specified what a "field" is -- is it something
> >> that matches "[a-z]+:" or can there be other characters in a field?
> >> The safe choice must be to split (once!) on ':', but will all
> >> implementations do this?
> >
> > I guess that can be the field name pattern, with a space after the ':'
> > (maybe with digits too).
> >
> > But for clients I think the best approach would be to parse the hello
> > message according to the protocol docs, and ignore any fields they
> > don't know.
>
> Yes, that is a nice high-level explanation. But notice how you write
> 'guess', 'maybe', 'think' in the above.
>
> >> * Input/line channels: what is the precise difference between the two
> >> and why do we need both?
> >
> > When a client sees 'I' it should return raw data (up to length), and
> > when it sees 'L' it should return a single line of data (also up to
> > length). They're identical in behavior to Python's read/readline on
> > file-like objects.
>
> You write "raw data" for the 'I' channel, but I guess the data on the
> 'L' channel is just as raw?
>

Well yeah. The main difference is when the server uses each one.

Without the 'L' channel, interactive commands will be difficult on the
client.
Because the decision of when the server needs one line moves to the client,
which is a lot harder for him and depends on the context (that the server
already has).

Imagine a client that runs "hg record", where the server output goes to
stdout
and input is read from stdin.

<server sends client first diff>
<server asks client for 4096 bytes at most (to decide if to accept the diff
or not)>

At this point the client needs to figure out that all the server really
needs is one line,
which will cause it to emit the next diff. But without this hindsight the
client will try to
consume 4096 bytes from stdin, which effectively means it needs to give an
answer to
all diffs in one go.

> > The 'L' channel lets the server to ask the client for a prompt reply,
> > it's used in places such as ui.prompt(), 'for line in ui.fin', etc.
>
> Should the line contain a final '\n' or a final '\r\n'? Or does it not
> matter since the server will run str.strip on the line anyway?

It doesn't matter. The server reads it as is and returns it to whoever
wanted to read a line, who will strip it (or not according to its needs).

>
> The doc says that the line should be "trimmed to length". Both Java and
> .NET happen to call their str.strip equivalent trim", but you mean that
> one can send up to length bytes, right?

Yeah, although it's more "should" than "can": the returned data by the
client must not exceed <length> bytes.

I'll change the "trim" wording on the wiki so people are not confused like
you said.

>
> > Also, using the 'I' channel alone would introduce some complexity on
> > the server side, it'll need to split on new lines and buffer the
> > remainder for subsequent reads.
>
> I guess the client libraries will have to do the buffering now.
>
> >>  It seems to me that the 'I' channel would be enough, also for
> >>  reading patches from stdin. Since the 'I' channel is not
> >>  line-oriented, it does not have the 4096 byte line length maximum.
> >
> > The 4096 length is just a way for the server to limit the amount of
> > data the client sends in one go, giving it a chance to read from the
> > server to avoid a deadlock (see the protocol RFC discussion where Matt
> > explained why this is needed). So it's not specific to the line
> > channel, there can be lines with size >4096
>
> Yeah, I remember this discussion. I had expected that the server could
> use the 'I' channel to request 4096 bytes at a time from the client.
>
> --
> Martin Geisler
>
> aragost Trifork
> Professional Mercurial support
> http://mercurial.aragost.com/kick-start/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20110701/f43840c2/attachment.htm>