[PATCH] V2 of experiment for a simpler path encoding for hashed paths (for "fncache2")

Thu Sep 27 04:07:41 CDT 2012

On 2012-09-27 10:43, Isaac Jurado wrote:
> On Wed, Sep 26, 2012 at 1:29 AM, Bryan O'Sullivan <bos at serpentine.com> wrote:
>> On Tue, Sep 25, 2012 at 5:53 AM, Adrian Buehlmann <adrian at cadifra.com>
>> wrote:
>>>
>>> V2 of experiment for a simpler path encoding for hashed paths (for
>>> "fncache2")
>>
>>
>> Interesting approach.
>>
>>>
>>> +                       else if (i + 2 < len) {
>>> +                               c = encodechar(src[i]);
>>> +                               c1 = encodechar(src[i + 1]);
>>> +                               c2 = encodechar(src[i + 2]);
>>> +                               if ((c == 'a' && c1 == 'u' && c2 == 'x')
>>> ||
>>> +                                   (c == 'c' && c1 == 'o' && c2 == 'n')
>>> ||
>>> +                                   (c == 'p' && c1 == 'r' && c2 == 'n')
>>> ||
>>> +                                   (c == 'n' && c1 == 'u' && c2 == 'l')
>>> ||
>>> +                                   (c == 'c' && c1 == 'o' && c2 == 'm')
>>> ||
>>> +                                   (c == 'l' && c1 == 'p' && c2 == 't'))
>>> {
>>
>>
>> This is going to do a lot of branching at the start of every path component.
>> That's something that can be avoided quite easily using a state machine.
>>
>> Otherwise, it looks decent (and now you know how tricky it is to write a
>> supposedly simple C encoding function!). Have you looked into writing a pure
>> Python version to see how complex it would be?
> 
> Just for the record.  You can safely use calls like strncmp(s, "aux",
> 3).  When one of the arguments is a constant string, the compiler (GCC
> at least) will replace the library call with an optimized comparison;
> usually an integer comparison.  The shorter the constant string the
> more chances it fits in a register for such comparison.  This way you
> minimize branching and the code is still readable.
> 
> Go ahead and see the generated assembly, you'll be surprised ;-)

I was actually indeed considering looking at the assembly (MS C here)
and comparing the two variants (state machine vs non-state machine),
also, measuring speeds is on my TODO.

I recently tried (I didn't send it to the list) using a state machine in
C for the reverse encoding of encodedir, which the current repo format
uses when loading the fncache file, and which is needed on the receiving
end at stream_in (localrepo, addchangegroup). I didn't see a speed gain
for the C state machine (vs the Python decodedir() ).

The non-state machine version of my _cutdirs has a bit higher
complexities with regards to length calculations. I made bugs in that
department during the hacking process. The state machine version of
_cutdirs always deals with a single char, which is pretty simple given
that the char-encoding used is size-preserving.