[PATCH] V2 of experiment for a simpler path encoding for hashed paths (for "fncache2")

Adrian Buehlmann adrian at cadifra.com
Tue Sep 25 11:10:20 CDT 2012


On 2012-09-25 17:35, Adrian Buehlmann wrote:
> On 2012-09-25 14:53, Adrian Buehlmann wrote:
>> diff --git a/mercurial/pathencode.c b/mercurial/pathencode.c
>> --- a/mercurial/pathencode.c
>> +++ b/mercurial/pathencode.c
>> @@ -479,8 +479,104 @@
>>  		       src, len, 1);
>>  }
>>  
>> +static char encchar[128] = "~abcdefghijklmnopqrstuvwxyz{~}~~"
>> +			   " !\"#$%&'()~+,-.~0123456789~;~=~~"
>> +			   "@abcdefghijklmnopqrstuvwxyz[~]^_"
>> +			   "`abcdefghijklmnopqrstuvwxyz{~}~~";
>> +
>> +/* this encoding folds */
>> +static inline char encodechar(char c)
>> +{
>> +	return c ? encchar[0x7f & c] : 0;
>> +}
>> +
>>  static const Py_ssize_t maxstorepathlen = 120;
>>  
> 
> This can be further simplified by using string token "\0", and inserting
> a const probably makes sense there (passes unit tests):
> 
> static const char encchar[128] =
> 	"\0abcdefghijklmnopqrstuvwxyz{~}~~"
> 	" !\"#$%&'()~+,-.~0123456789~;~=~~"
> 	"@abcdefghijklmnopqrstuvwxyz[~]^_"
> 	"`abcdefghijklmnopqrstuvwxyz{~}~~";
> 
> /* this encoding folds */
> static inline char encodechar(char c)
> {
> 	return encchar[0x7f & c];
> }

..which is a very bad idea, because it fails on "\x80": 0x7f & 0x80 is 0,
so the lookup lands on the "\0" slot and \x80 gets encoded to 0 (yikes).
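
To make the failure concrete, here is a minimal sketch of the simplified
variant in isolation. The table contents are the ones quoted above; the
names encchar_nul and encodechar_nul are renamed for this illustration
only and do not appear in the patch.

```c
#include <stddef.h>

/*
 * Simplified table from the quoted mail: slot 0 holds "\0".
 * The literal is exactly 128 characters, so no terminating NUL
 * is stored (valid in C).
 */
static const char encchar_nul[128] =
	"\0abcdefghijklmnopqrstuvwxyz{~}~~"
	" !\"#$%&'()~+,-.~0123456789~;~=~~"
	"@abcdefghijklmnopqrstuvwxyz[~]^_"
	"`abcdefghijklmnopqrstuvwxyz{~}~~";

/* Lookup without the explicit "c ?" check (illustrative name). */
static char encodechar_nul(char c)
{
	/* 0x80 has only the top bit set, so the mask leaves index 0 */
	return encchar_nul[0x7f & c];
}
```

With this variant, encodechar_nul('\x80') returns 0 just like
encodechar_nul('\0') does, while '\x81' still folds to 'a'.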

The earlier version nicely encodes \x80 as ~:

  >>> from mercurial.parsers import cutdirs
  >>> cutdirs("\x80\x81\x82..\xf8\xf9\xfa/\xfb\xfc\xfd\xfe\xff")
  '~ab..xyz/{~}~~'

(I was lucky without knowing why :-). Sorry for the noise.)

So I continue experimenting with:

static const char encchar[128] =
	"~abcdefghijklmnopqrstuvwxyz{~}~~"
	" !\"#$%&'()~+,-.~0123456789~;~=~~"
	"@abcdefghijklmnopqrstuvwxyz[~]^_"
	"`abcdefghijklmnopqrstuvwxyz{~}~~";

/* this encoding folds */
static inline char encodechar(char c)
{
	return c ? encchar[0x7f & c] : 0;
}
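
As a quick sanity check of that table, here is a minimal standalone
sketch that runs a byte string through encodechar(). The table and
encodechar() are copied from above; encode_bytes() is a hypothetical
helper added only for this illustration and is not part of pathencode.c.

```c
#include <stddef.h>

/* Exactly 128 characters, so no terminating NUL is stored (valid in C). */
static const char encchar[128] =
	"~abcdefghijklmnopqrstuvwxyz{~}~~"
	" !\"#$%&'()~+,-.~0123456789~;~=~~"
	"@abcdefghijklmnopqrstuvwxyz[~]^_"
	"`abcdefghijklmnopqrstuvwxyz{~}~~";

/* this encoding folds */
static inline char encodechar(char c)
{
	return c ? encchar[0x7f & c] : 0;
}

/*
 * Hypothetical helper (not in the patch): encode len bytes from src
 * into dst and NUL-terminate. dst must have room for len + 1 bytes.
 */
static void encode_bytes(const char *src, size_t len, char *dst)
{
	size_t i;

	for (i = 0; i < len; i++)
		dst[i] = encodechar(src[i]);
	dst[len] = '\0';
}
```

Under these assumptions, the bytes \x80\x81\x82..\xf8\xf9\xfa come out as
~ab..xyz and \xfb\xfc\xfd\xfe\xff as {~}~~, matching the two halves of the
cutdirs output shown earlier. The '/' is left out on purpose: the table
itself maps '/' to ~, while the session above shows it preserved, so it is
presumably handled outside this lookup.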

