Possibly changing the path encoding format

Adrian Buehlmann adrian at cadifra.com
Fri Sep 21 19:10:22 CDT 2012


On 2012-09-22 01:16, Adrian Buehlmann wrote:
> On 2012-09-21 21:03, Bryan O'Sullivan wrote:
>> On Fri, Sep 21, 2012 at 11:16 AM, Adrian Buehlmann <adrian at cadifra.com> wrote:
>> By having one code path for both cases, we have less complexity, less
>> stuff to audit, and less code to go wrong.
> 
> Wouldn't it be pretty simple to have a variant of
> 
> 
> static Py_ssize_t basicencode(char *dest, size_t destsize,
> 			      const char *src, Py_ssize_t len)
> {
> 	static const uint32_t twobytes[8] = { 0, 0, 0x87fffffe };
> 
> 	static const uint32_t onebyte[8] = {
> 		1, 0x2bff3bfa, 0x68000001, 0x2fffffff,
> 	};
> 
> 	Py_ssize_t destlen = 0;
> 
> 	if (len < 5 || memcmp(src, "data/", 5) != 0) {
> 		memcopy(dest, &destlen, destsize, src, len);
> 		return destlen;
> 	}
> 
> 	memcopy(dest, &destlen, destsize, "data/", 5);
> 
> 	return _encode(twobytes, onebyte, dest, destlen, destsize,
> 		       src + 5, len - 5, 1);
> }
> 
> 
> which does the same as that, but lowercases everything, thus at least
> avoiding the X -> _x thing? ("basicencodelow").
> 
> I'm not much of an expert on this code, but I think it might be
> possible by just tweaking "twobytes" and "onebyte".
> 
> If I'm right, then the two code paths (basicencode and basicencodelow)
> could call the same _encode function.
> 
> Of course we then couldn't amortize the basicencode call, but I think we
> would only have to do a first length scan with
> 
>     basicencode(NULL, 0, path, len + 1)
> 
> and if that returns > maxstorepathlen + 1, we would do a basicencodelow.
> 
> I think the X -> _x encoding is the most pointless part of basicencode
> for paths that in fact need hashing.
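
To make the idea above a bit more concrete, here is a rough, self-contained
sketch (not the actual pathencode.c code). It assumes the bitmaps are
256-bit sets indexed by byte value, and that passing dest == NULL only
counts output bytes. All names ending in _sketch are made up for
illustration, the toy encoder copies every other byte unchanged instead of
escaping anything, and "maxlen" stands in for maxstorepathlen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Test whether byte c is a member of a 256-bit set stored as 8 words. */
static int inset_sketch(const uint32_t bitset[8], char c)
{
	uint8_t b = (uint8_t)c;

	return (bitset[b >> 5] >> (b & 31)) & 1;
}

/* Always count one output byte; store it only if a buffer was given. */
static void put_sketch(char *dest, size_t *destlen, size_t destsize, char c)
{
	if (dest) {
		assert(*destlen < destsize);
		dest[*destlen] = c;
	}
	(*destlen)++;
}

/* Same bit pattern as "twobytes" in the quoted code: 'A'..'Z' and '_'. */
static const uint32_t upper_or_us[8] = { 0, 0, 0x87fffffe };

/*
 * Toy encoder. With lower == 0, 'X' becomes "_x" and '_' becomes "__".
 * With lower != 0, 'X' becomes "x" and '_' stays "_".
 * Every other byte is copied unchanged (an assumption of this sketch).
 */
static size_t encode_sketch(char *dest, size_t destsize,
			    const char *src, size_t len, int lower)
{
	size_t destlen = 0, i;

	for (i = 0; i < len; i++) {
		char c = src[i];

		if (!inset_sketch(upper_or_us, c)) {
			put_sketch(dest, &destlen, destsize, c);
			continue;
		}
		if (!lower)
			put_sketch(dest, &destlen, destsize, '_');
		put_sketch(dest, &destlen, destsize,
			   c == '_' ? '_' : (char)(c + 32));
	}
	return destlen;
}

/* Length-only pass: would the normal encoding exceed the limit? */
static int needs_lower_sketch(const char *src, size_t len, size_t maxlen)
{
	return encode_sketch(NULL, 0, src, len, 0) > maxlen;
}
```

In this sketch both variants share one loop and differ only in a flag. The
length-only pass costs one extra scan of the path, which is the amortization
we would give up.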

More interestingly, it looks like calling

   _encode(twobytes, onebyte, dest, destlen, destsize, src, len, 0);

instead of

   _encode(twobytes, onebyte, dest, destlen, destsize, src, len, 1);

does *no* direncoding at all (dropping direncoding would be another
candidate variant for basicencodelow).
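
For reference, here is a small stand-alone sketch of what such a flag
toggles, as far as I understand direncoding: ".hg" gets appended to every
directory component that ends in ".d", ".i" or ".hg". This is not the
_encode code itself; direncode_sketch and its helpers are hypothetical
names, and the real function does this in the same pass as the character
encoding.

```c
#include <stddef.h>
#include <string.h>

/* Nonzero if the n bytes at s end with the given suffix. */
static int ends_with_sketch(const char *s, size_t n, const char *suffix)
{
	size_t k = strlen(suffix);

	return n >= k && memcmp(s + n - k, suffix, k) == 0;
}

/* Always count one output byte; store it only if there is room. */
static void emit_sketch(char *dest, size_t *destlen, size_t destsize, char c)
{
	if (dest && *destlen < destsize)
		dest[*destlen] = c;
	(*destlen)++;
}

/*
 * Copy src to dest. If encodedir is nonzero, append ".hg" to each
 * directory component (the part before a '/') that ends in ".d", ".i"
 * or ".hg". Returns the number of bytes the output needs.
 */
static size_t direncode_sketch(char *dest, size_t destsize,
			       const char *src, size_t len, int encodedir)
{
	size_t destlen = 0, start = 0, i;

	for (i = 0; i < len; i++) {
		if (src[i] == '/') {
			const char *comp = src + start;
			size_t n = i - start;

			if (encodedir &&
			    (ends_with_sketch(comp, n, ".d") ||
			     ends_with_sketch(comp, n, ".i") ||
			     ends_with_sketch(comp, n, ".hg"))) {
				emit_sketch(dest, &destlen, destsize, '.');
				emit_sketch(dest, &destlen, destsize, 'h');
				emit_sketch(dest, &destlen, destsize, 'g');
			}
			start = i + 1;
		}
		emit_sketch(dest, &destlen, destsize, src[i]);
	}
	return destlen;
}
```

With encodedir set to 0 the loop is a plain copy, which is the behavior I
would expect from the first call above.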


