C-hglib - Level 0, API proposal.

Mon Jul 15 03:34:38 CDT 2013

On 10 juil. 2013, at 23:22, Iulian Stana wrote:

> 2013/7/8 Matt Mackall <mpm at selenic.com>
>> On Thu, 2013-07-04 at 18:57 +0300, Iulian Stana wrote:
>>> /* return the handle for a connection */
>>> hg_handle* hg_open(const char *path);
>>>  
> I forgot and probably I have't seen that pointer.
>  
>> What sorts of errors might we encounter?
> 
> I don't know yet, we will discuss later. I don't have so much experience,
> and I don't know what kinds of errors can appear. In time we will find
> them.  (I don't want to throw some answers that I just read about)

It's hard to foresee all possible errors, So you need to think about possible family of errors (communication errors, arguments errors, runtime errors, etc). Its much easier to imagine what generic kind of errors might happen. As has each family of errors will need the same kind of handling you can check if your approach is robust to all possibles family.

TLDR, development is incremental, but you must check that all design decision are viable.

>>>  /* It's just sending the command to the cmdsrv.*/
>>>  int hg_rawcommand(hg_handle *handle, const char *command);
>> 
>> Consider:
>> 
>> ret = hg_rawcommand(h, "add this filename has spaces in it");
>> 
>> Will this do the right thing? If so, how?
> 
> From what I had already implement, there is no problem over here. [1]
> After each word there must be '\0'. 
> From the command server documentation:
> "Run the command specified by a list of \0-terminated strings. "
> 
> [1]: https://bitbucket.org/istana/c-hglib/commits/a6832416438b1c803300fed417c780806fce5bbe#Lhglib/client.cT62
> 
> From what I read in the execl/execv manual, the form of those commands
> are:
>        int execl(const char *path, const char *arg, ...);
>        int execv(const char *path, char *const argv[]);
> 
> The execl gets a compact string and maybe other arguments, in my 
> case I have just the compact string and in the execv function there
> is a list of arguments.

So you function takes a single bytes array to be sent as is to the command server ?

That sound dangerous and unfriendly.

Unfriendly because the user has to build the bytes array itself while he had expected the API to pass argument to looks like other API to pass argument.

Dangerous because a bytes arrays with mostly ascii char but some "\0" really looks like a string and people will get bitten if they try to used standard string function on them.

TLDR, check exec* API again, they are doing it right.

>>> a) hg init <- doesn't start with a repo
>>> In this case, I will create a new process were I will execute the init
>>> command, to create the new repo. After this, I will establish the
>>> connection through pipes, with the cmdserver.
>>> For the clone command, there I will execute the same steps.
>>> The return in those cases will be the handle for the connection.
>> 
>> How about clone? How about other commands that don't require an
>> extension like help, version, etc? How about commands we do not know
>> about yet because they are in extensions or future versions of
>> Mercurial?
> 
> For the clone command, there I will execute the same steps.
> - I will create a new process were I will execute the clone command
> - I establish the connection with the cmdserver
> - I will return to the user the "handle" 
> 
> About the help, version and related commands there is no need for a 
> connection with the command server. A system call can be enough.
> (If you really want I can create some commands )

No! It is important to be able to run those command in the command server. A command server may have specific configs, the user must be able to introspect it. Have a look at how python-hglib does it. Client without repo are allowed.

Not to speak about potential important command that extension may add.

TLDR, You need to be generic. support any commands for client with repo and client without.

>>> b) hg log <- can produce huge output
>>> I know that some repo could have a huge log, but I don't know if the user
>>> will use that huge output. My single thought right now is to set a limit
>>> for the huge mass of data.
>> 
>> I feel like I've spent hours explaining to you that this is not
>> acceptable already.
>> 
>> Here's how it ought to work:
>> 
>> - client opens a connection
>> - client sends a command
>> - client reads output until no more output is available
>> - client reads result code
>> 
>> This is not hard and I spelled this out already way back here:
>> 
>> http://mercurial.markmail.org/thread/tc6hsvl7fofdjqcl
>> 
>> Note how it never reads more than 4k at a time.
> 
> I understand that. From what you said until now it's ok to 
> get data in chunks of max 4k, and I will do the same. My biggest 
> problem here and maybe a misunderstanding, is that, that in the
> python-hglib and in the java-hglib API's there is a specific command
> called 'log' that will return to user the result without getting chucks
> of 4k.

The log command in python-hglib and java-hglib are much more higher level that what we are discussing here. In python-hglib, client.log calls client.rawcommand that calls client.runcommand. runcommands does read 4k chunks

Here runcommands is the level-0 API that you should be focussing of. The fact that higher level API in python-hglib does not implements protection agains big output is not important here.

TLDR, focus on level-0 API and make it solid

>>> c) hg import - <- wants a patch fed to it from client
>>> If patch is a list of files or a file-like, then the input for the command
>>> server will be from those files. In this case the data will be send line by
>>> line to the server.
>> 
>> A client will want to be able to do the equivalent of:
>> 
>> hg -R repo-a export | hg -R repo-b import -
>> 
>> ..without buffering the entire patch either in memory or on disk.
> 
>  
> Hmm, interesting, I don't really know how I will do this thing, right now, but I
> understand what do you mean. 
> Probably there will be a special command for this situation. I don't know
> if there is any command on the python-hglib that solve this problem. 
> (Maybe It will be on a Level 2, the implementation)

See the previous block, everything commands in the python-hglib use client.runcommand. The level-0 function *solving this problem*. (even if upper level may be less resistant than level-0, user can always fallback to level-0)

>>> d) hg merge <- has prompts
>>> By default, if there will be any prompts the merge will abort.
>> 
>> Assume this is a question designed to make you search for a non-trivial
>> answer. Suppose I'm building a Mercurial GUI that uses c-hglib. Such a
>> GUI will most assuredly want to be able to do interactive merges. How do
>> I get the prompt questions displayed to the user and get the user
>> answers back?
>  
> The cmdserver will return some prompt questions that the user will receive.
> There will be more possibility to deal with those prompts:
>   - answer, individual to the prompts, in what way he wants. 
>   - create a prototype function that will answer in a specific way depends of
> the prompt
>   - creating a default prototype that will deal with the prompts and will fail
> in silence

How is the prompting transmitted to the calling code ? what's the API to interract with it?

>  e) hg verify <- might give warnings intermixed with output
>>> Returns 0 on success, 1 if errors are encountered.
>>> In handle I will save the good output and warnings/errors.
>> 
>> Let's try this a different way: you may only call malloc once per
>> connection and you may not malloc more than 1k. Of course, you can still
>> expect commands to return gigabytes of output. Assume some commands
>> (like verify) can return more than 1k of error messages and that
>> dropping such messages may be unacceptable.
>  
> There can be a the same solution as in the log case.

Yes, this is the same problem that needs the same solution.

> About the memory of 1k, you can resize that memory or you can mapping 
> somehow the memory.

I'm confused by this answer. Solution 1 i'm totally missing its meaning. Solution 2 its plain wrong. Being limited to a single malloc of 1k is just a way to force you to think how to not just buffer everything all the time. "Resizing the memory" is not a valid answer.

-- 
Pierre-Yves David