[PATCH] V2 of experiment for a simpler path encoding for hashed paths (for "fncache2")
Adrian Buehlmann
adrian at cadifra.com
Tue Sep 25 11:10:20 CDT 2012
On 2012-09-25 17:35, Adrian Buehlmann wrote:
> On 2012-09-25 14:53, Adrian Buehlmann wrote:
>> diff --git a/mercurial/pathencode.c b/mercurial/pathencode.c
>> --- a/mercurial/pathencode.c
>> +++ b/mercurial/pathencode.c
>> @@ -479,8 +479,104 @@
>> src, len, 1);
>> }
>>
>> +static char encchar[128] = "~abcdefghijklmnopqrstuvwxyz{~}~~"
>> + " !\"#$%&'()~+,-.~0123456789~;~=~~"
>> + "@abcdefghijklmnopqrstuvwxyz[~]^_"
>> + "`abcdefghijklmnopqrstuvwxyz{~}~~";
>> +
>> +/* this encoding folds */
>> +static inline char encodechar(char c)
>> +{
>> + return c ? encchar[0x7f & c] : 0;
>> +}
>> +
>> static const Py_ssize_t maxstorepathlen = 120;
>>
>
> This can be further simplified by using string token "\0", and inserting
> a const probably makes sense there (passes unit tests):
>
> static const char encchar[128] =
> 	"\0abcdefghijklmnopqrstuvwxyz{~}~~"
> 	" !\"#$%&'()~+,-.~0123456789~;~=~~"
> 	"@abcdefghijklmnopqrstuvwxyz[~]^_"
> 	"`abcdefghijklmnopqrstuvwxyz{~}~~";
>
> /* this encoding folds */
> static inline char encodechar(char c)
> {
> 	return encchar[0x7f & c];
> }
..which is a very bad idea, because it fails on "\x80": 0x7f & 0x80 is 0,
so the lookup hits encchar[0], which is now "\0", and "\x80" gets encoded
to 0 (yikes).
The earlier version nicely encodes \x80 as ~:
>>> from mercurial.parsers import cutdirs
>>> cutdirs("\x80\x81\x82..\xf8\xf9\xfa/\xfb\xfc\xfd\xfe\xff")
'~ab..xyz/{~}~~'
(I was lucky without knowing why :-). Sorry for the noise.)
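For illustration, the difference between the two variants can be reproduced outside of Mercurial. The following is only a minimal sketch and not part of the patch: the names encchar_tilde, encchar_nul, encode_checked and encode_unchecked are hypothetical, and the array sizes are left implicit (the patch uses char[128]) so that the trailing NUL of the string literal has room.

```c
#include <stddef.h>

/* Table as in the patch: index 0 holds '~'.
 * Size left implicit here; the patch declares char[128]. */
static const char encchar_tilde[] =
	"~abcdefghijklmnopqrstuvwxyz{~}~~"
	" !\"#$%&'()~+,-.~0123456789~;~=~~"
	"@abcdefghijklmnopqrstuvwxyz[~]^_"
	"`abcdefghijklmnopqrstuvwxyz{~}~~";

/* "Simplified" table: index 0 holds NUL instead of '~'. */
static const char encchar_nul[] =
	"\0abcdefghijklmnopqrstuvwxyz{~}~~"
	" !\"#$%&'()~+,-.~0123456789~;~=~~"
	"@abcdefghijklmnopqrstuvwxyz[~]^_"
	"`abcdefghijklmnopqrstuvwxyz{~}~~";

/* Earlier version: only a real NUL byte maps to 0. Any other
 * byte whose low 7 bits are zero (i.e. 0x80) reaches index 0
 * of the table and comes out as '~'. */
static char encode_checked(char c)
{
	return c ? encchar_tilde[0x7f & c] : 0;
}

/* Simplified version: no explicit check, so 0x00 and 0x80
 * both land on index 0 and both come out as 0. */
static char encode_unchecked(char c)
{
	return encchar_nul[0x7f & c];
}
```

With these definitions, encode_checked((char)0x80) yields '~', while encode_unchecked((char)0x80) yields 0, which would put a NUL byte into the encoded path.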
So I continue experimenting with:
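static const char encchar[128] =
	"~abcdefghijklmnopqrstuvwxyz{~}~~"
	" !\"#$%&'()~+,-.~0123456789~;~=~~"
	"@abcdefghijklmnopqrstuvwxyz[~]^_"
	"`abcdefghijklmnopqrstuvwxyz{~}~~";

/* this encoding folds: different input bytes can map to the same
   output byte (e.g. 'A' and 'a' both encode as 'a') */
static inline char encodechar(char c)
{
	/* only a real NUL byte maps to 0; "\x80" hits encchar[0], i.e. '~' */
	return c ? encchar[0x7f & c] : 0;
}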