Possibly changing the path encoding format

Adrian Buehlmann adrian at cadifra.com
Sat Sep 22 02:34:43 CDT 2012


On 2012-09-21 20:16, Adrian Buehlmann wrote:
> What's more, as I've already posted, there are possibly simpler ways to
> encode directories named aux & friends.

I'd like to stress here that the encoding for hashed paths can be
non-reversible and may fold multiple input paths into the same result
(apart from the sha-1 hash part), as the sha-1 hash will be distinctive
enough to avoid collisions.

So the aux-encoding we could choose for hashed paths could be very simple.

For example, the aux-encoding for hashed paths could simply check if
any shortened directory name starts with any of

  aux, con, prn, nul, com, lpt

and if it does, for example, replace the third character with %.

It would be possible to collect and assemble the shortened directories
*before* aux-encoding them (i.e. reading the unencoded path), as that
encoding would be size preserving.

This simple aux-encoding would be a bit too greedy, as, for example, it
also needlessly encodes the directory names

  auxiliary -> au%iliary ("auxiliary" is not a reserved word)
  com0 -> co%0 ("com0" is not reserved, only com1..com9)
  command -> co%mand ("command" is not reserved)

but I'd say this price might be worth the simplicity we would get.
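
To make the idea concrete, here is a minimal sketch of such a greedy
aux-encoding. The name simple_auxencode is made up for illustration and
is not an existing Mercurial function; the sketch also assumes the
directory name is compared exactly as given (no case folding).

```python
# Hypothetical helper for illustration only, not Mercurial's actual code.
_PREFIXES = ("aux", "con", "prn", "nul", "com", "lpt")


def simple_auxencode(dirname):
    """Greedy aux-encoding of a single directory name.

    If the name starts with one of the reserved prefixes, the third
    character is replaced with '%'. The length never changes.
    """
    if dirname[:3] in _PREFIXES:
        return dirname[:2] + "%" + dirname[3:]
    return dirname


for name in ("aux", "auxiliary", "com0", "command", "data"):
    print(name, "->", simple_auxencode(name))
```

Because the output always has the same length as the input, the
shortened directory names can be cut from the unencoded path first and
encoded afterwards.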

Likewise, instead of doing the same as store._encodefname, we could
simply replace any of the Windows reserved characters ( \:*?"<>| ) with
% and drop the most significant bit of every other character to make it
fit into the ASCII code range, e.g.:

  hi>world -> hi%world
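
A rough sketch of that character replacement, working on bytes. The
name simple_encode_chars is hypothetical. The order of the two steps is
my assumption: the high bit is cleared first and the reserved check is
done afterwards, so that a byte like 0xbe (which becomes '>') cannot
slip through. Control characters are not treated here.

```python
# Hypothetical helper for illustration only, not Mercurial's actual code.
_RESERVED = frozenset(b'\\:*?"<>|')  # byte values of the reserved characters


def simple_encode_chars(name):
    """Lossy, size-preserving encoding of a bytes path component.

    Each byte has its most significant bit cleared; if the result is a
    Windows reserved character it is replaced with '%'.
    """
    out = bytearray()
    for b in name:
        b &= 0x7F  # assumption: clear the high bit before the check
        out.append(ord("%") if b in _RESERVED else b)
    return bytes(out)


print(simple_encode_chars(b"hi>world"))
```

This folds many different inputs into the same output, which is fine
for hashed paths since the sha-1 part keeps the results distinct.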


