[PATCH 2 of 3 V4 RFC] model (1): c-hglib: hg_log() level 1 function

Iulian Stana julian.stana at gmail.com
Wed Sep 4 07:07:17 CDT 2013


I will answer only where it needs. For the rest of observations you can
assume that I agree with you.



> :::: +#define CHANGESET "\\0{rev}\\n{node}\\n{tags}\\n{branch}\\n{author}\
> :::: +                        \\n{date|isodate}\\n{desc}"
>
> As I write below, I think that hg_fetch_cset_entry() is a little less
> cumbersome to implement if the '\0' is at the end of the changeset data
> instead of being at the beginning.
>
> Reason:
> in either case, one changeset of your history has to be "special cased".
> If the '\0' marker is at the beginning, the last changeset has to be
> treated
> exceptionally (where will it end?).
> If the '\0' marker is at the end, the first changeset has to be treated
> exceptionally (where will it begin?).
> But in the latter case, you know the answer, since the changeset data
> begins
> right after you issue the command 'hg log' to cmdsrv.
>
>
Yes I agree with you that I will treat changesets in the almost same way,
it the '\0' character is on the end or at the beginning. And you present
those
two cases almost perfect.

We will want to use the hg_fetch_cset_entry() function and the CHANGESET
template everywhere where it's possible and it's needed (i.e. heads,
parents,
incoming, outgoing commands).

For the last two commands there will be some problems if we use the '\0'
marker
at the end of template, because this command will deliver some outdata
before
starting delivering the csets.

e.g:
$ hg outgoing --template "--cset-begin--\n{node}\n{rev}\n--cset-end--\n"
comparing with ssh://hg@bitbucket.org/istana/c-hglib
searching for changes
--cset-begin--
2ad0741f6a6793019e6e113c41c4cb2dfe37a294
64
--cset-end--


:::: +    char *new_cset = malloc(new_cset_size + 1);
>
> :::: +    memcpy(new_cset, *cset + first_cset_size, new_cset_size + 1);
> :::: +    free(*cset);
>
> The *cset you're free'ing here was malloc'ed in the 'adding_data' function,
> as far as I can tell.
> There you where appending a chunk of the input stream to your internal
> buffer;
> this chunk could possibly contain more than one changeset segment.
>
> Please convince me that here you're not free'ing data below the "first
> changeset" boundary,
> because it looks very much like you are doing so.
>

When the program will receive this point, the cset (unparse buffer data)
will contain
some data like:
'\0aaaa\0bbbbbbbbbbb'
and what I am trying to do here is to release the memory for the first
changeset( 'a' cset)
first_cset_size will tell me how much space this cset is occupy. And in
this way I know
how much data I must save and from where.
Also when I am sending the 'first cset' to user I know how much data I must
parse and
how much space the first cset is occupy.


:::: +/* The high level log command for hglib API. */
> :::: +hg_csetstream_buffer *hg_log(hg_handle *handle, char *option[])
> :::: +{
>
> I really do not understand why hg_log() has to return a
> hg_csetstream_buffer*.
> There is no special information necessary to initialize the
> hg_csetstream_buffer*
> that is available only from inside hg_log().
> It is basically just the hg_handle, which is all over the place.
>

I think that it's more intuitive for users, to know that they need to
perform
some hg_fetch_cset_entry calls in order to call other API commands.



> And, as I write a few lines below, I think hg_csetstream_buffer* is an
> internal
> that should not be exposed to the user code.
>

I think that we must make a bigger discussion about this issue. I agree
with
you that hg_csetstream_buffer is internal and users should not change this
structure from their code. I also think that hg_header and hg_handle are not
suppose to be directly access from users.



> Looking at your signature, I see two "output parameters":
> (1) hg_csetstream_buffer *cbuf
> (2) hg_cset_entry *centry
>
> Now, the pointer cbuf look *a lot* like something
> the user could not care less about. It's an "internal".
>
> I wander if we can get rid of it, i.e. do not expose it on the function
> prototype.
> I *do understand* that the only reason we keep cbuf out of
> hg_fetch_cset_entry()
> is because we need it to be "persistent" across multiple function calls...
> I just don't like it and wander if it can be done better and cleaner.
>
> If, on the other side, I am wrong and cbuf is something the user code
> will need at some time (but I doubt it), then hg_fetch_cset_entry()
> is producing two different outputs (cbuf and centry), which is a clue
> that probably it's doing too much.
>

Like I said before hg_csetstream_buffer *cbuf, probably will be intuitive
for
user. Also we still need a store unite, I think that we cannot put all
hg_csetstream_buffer fields in hg_handle structure, will make this
structure
more harder to understand.



> :::: +    hg_header head = hg_head(cbuf->handle);
>
> :::: +    int exitcode;
> :::: +    char *get_data;
> :::: +    int read_size;
> :::: +
> :::: +    /* Erase the first cset from cset pointer.
> :::: +     * This cset was already pass to user.*/
> :::: +    if(cbuf->is_send && cbuf->buf_size){
> :::: +        cbuf->buf_size = erase_cset(&cbuf->buffer, cbuf->buf_size,
> :::: +                        cbuf->first_cset_size);
> :::: +    }
>
> You start by cleaning up. I don't get it.
> Can't you do this right after you parse the stream segment that correspond
> to a full changeset?
>

When I am passing the first cset to the user I am not creating a new memory
space where I am putting this cset I just give him some address pointer
where
 data is stored. I think that it's user duty to say "I need this data I
will store it
or I just want to verify it and I don't need to store it". Also I think
that will give
more work to user to deal with, I mean if I am creating this additional
memory
he will need to call a destructor every time.

Also I can put this data in cset_entry structure and free the memory
every time if I assume that the user will give me the same cset_entry
every time.


>  I think the field hg_csetstream_buffer.is_send (which should be spelled
> 'is_sent', btw)
> is useless and confusing.
> If you clean up right after you use the data, no need to keep track of
> what is sent or not.
>
>


> More generally, I think the whole function needs a rewrite.
> It's convoluted, difficult to follow, the intent is not clear.
> To an external reader it looks like everytime you encountered a bug
> you added an 'if-then' statement here and there.
> At a first glance, I could not even say if all conditions are covered
> (if you put an 'if-then' without an 'else', you're likely to be looking
> for troubles).
>
> I suggest you rewrite the body with something like:
>
> if (buffer contains null byte){
>         consume data;
>         return something;
> } else {
>         read until you get a null byte;
> }
>
> (the above works if the null byte '\0' is at the end of the changeset
> template.
>
>
OK. I will start changing the implementation. Also I think that will
work as well with '\0' byte at the beginning.


> :::: +            cbuf->first_cset_size = strlen(cbuf->buffer + 1) + 1;
>
> :::: +            parse_changeset(cbuf->buffer + 1, centry);
> :::: +            cbuf->is_send = 1;
> :::: +            return head.length;
>
> Here you return head.length, which makes very little sense:
> it's just the lenght of the last 'o' chunk you got from cmdsrv,
> while the parsed changeset you're returning (written in *centry)
> might have been built up by multiple 'o' chunks one sticked after the
> other.
>
> Another thing concerns me (we've discussed it over IRC, just reporting here
> for posterity): in your current hg_rawread implementation, if cmdsrv sends
> you a chunk bigger than the size you're willing to read (i.e. you read in
> 4K chunks),
> what you do is *writing* the amount of bytes still to be read into
> hg_header.length
> (see
> http://thread.gmane.org/gmane.comp.version-control.mercurial.devel/62348/focus=62351)
>
> So: either we find a way to "const-ize" all variables of type hg_header,
> either we don't refer to those values out of the blue when various
> side-effects
> are playing jokes under the hood.
>
>
Yes we must discuss, I think that we must encapsulate hg_header structure
and
hg_handle structure.



> :::: +    exitcode = hg_exitcode(cbuf->handle);
> :::: +    free(cbuf->command);
> :::: +    free(cbuf->buffer);
> :::: +    free(cbuf);
> :::: +    return exitcode;
>
> Ok this must be a leftover from previous iterations; a few lines above
> you're
> returning a byte length, here you return a status code. More discipline
> please.
> More over, hg_exitcode() will be called by user code, not by this function.
>
>
I think that it's a bit nasty to return exitcode in the end. But I think
that
users don't want to know about hg_exitcode() function. If he use level 1
he want's just to receive data, some parse data. Probably he don't
want to be forced to call hg_exitcode function in the end.

I don't really know what values is hg_exitcode returning but I assume
that he return 0 for success and 1 for error, nothing else.

We can get exitcode in hg_cset_fetch_entry() function and verify it
and also return it in other form.
e.g:

>>(*) it's bool eval-ed to 1 if there is stuff to fetch,
>>(*) it's bool eval-ed to 0 if there is nothing more to fetch,
>>(*) and it can give clues if something goes wrong.

cheers,
> GGhh
>
>
-- 
Iulian
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20130904/4097a8d1/attachment.html>


More information about the Mercurial-devel mailing list