[PATCH 1 of 6 foldmap-in-C] encoding: define an enum that specifies what normcase does to ASCII strings

Siddharth Agarwal sid at less-broken.com
Thu Apr 2 11:40:16 CDT 2015


On 04/02/2015 01:14 AM, Adrian Buehlmann wrote:
> On 2015-04-02 04:48, Siddharth Agarwal wrote:
>> # HG changeset patch
>> # User Siddharth Agarwal <sid0 at fb.com>
>> # Date 1427872870 25200
>> #      Wed Apr 01 00:21:10 2015 -0700
>> # Node ID 42a1040af0c362b38ce45fc71e065d1769902c79
>> # Parent  37a2b446985f2ef77b9690a0548c8630828b7412
>> encoding: define an enum that specifies what normcase does to ASCII strings
>>
>> For C code we don't want to pay the cost of calling into a Python function for
>> the common case of ASCII filenames. However, while on most POSIX platforms we
>> prefer to normalize filenames by lowercasing them, on Windows we uppercase
>> them. We define an enum here indicating the direction in which filenames
>> should be normalized. Some platforms (notably Cygwin) have more complicated
>> normalization behavior -- we add a case for that too.
>>
>> In upcoming patches we'll also define a fallback function that is called if the
>> string has non-ASCII bytes.
>>
>> This enum will be replicated in the C code to make foldmaps. There's
>> unfortunately no nice way to avoid that -- we can't have encoding import
>> parsers because of import cycles. One way might be to have parsers import
>> encoding, but accessing Python modules from C code is just awkward.
>>
>> The name 'normcaseasciispecs' was chosen to indicate that this is merely
>> an integer that specifies a behavior, not a function. The name was pluralized
>> since in upcoming patches we'll introduce 'normcaseasciispec' which will be one
>> of these values.
>>
>> diff --git a/mercurial/encoding.py b/mercurial/encoding.py
>> --- a/mercurial/encoding.py
>> +++ b/mercurial/encoding.py
>> @@ -354,6 +354,19 @@ def upper(s):
>>      except LookupError, k:
>>          raise error.Abort(k, hint="please check your locale settings")
>>  
>> +class normcaseasciispecs(object):
>> +    '''what a platform's normcase does to ASCII strings
>> +
>> +    This is specified per platform, and should be consistent with what normcase
>> +    on that platform actually does.
>> +
>> +    lower: normcase lowercases ASCII strings
>> +    upper: normcase uppercases ASCII strings
>> +    other: the fallback function should always be called'''
>> +    lower = -1
>> +    upper = 1
>> +    other = 0
>> +
>>  _jsonmap = {}
>>  
>>  def jsonescape(s):
> Ugh, this sounds ugly.

I don't disagree.

> I guess there is not much chance of eliminating this surprising
> difference, where Mercurial's util.normcase function uppercases when run
> on Windows and lowercases when run on other platforms.
>
> We originally used os.path.normcase on Windows, which happens to
> lowercase the string there, but this was changed to uppercase by foozy
> on 2011-12-16 with 3c5e818ac679 ("windows: use upper() instead of
> lower() or os.path.normcase()").
>
> I guess there is no chance to turn back 3c5e818ac679, given the problems
> foozy had to deal with there.

Not that I'm aware of. For correctness we really need to uppercase on
NTFS (per Michael Kaplan's blog post) and lowercase on HFS+ (per Apple
tech note 1150). There's just no way around that.
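
To make the intent of the enum concrete, here is a rough Python sketch of how
a caller might consume it. The real fast path in this series is meant to live
in C, so treat this purely as an illustration: the names isasciibytes and
normcasewithspec, and the fallback argument, are hypothetical and not part of
this patch.

```python
class normcaseasciispecs(object):
    '''what a platform's normcase does to ASCII strings (as in the patch)'''
    lower = -1
    upper = 1
    other = 0

def isasciibytes(s):
    # Hypothetical helper: bytearray() yields ints on Python 2 and 3 alike.
    return all(b < 0x80 for b in bytearray(s))

def normcasewithspec(s, spec, fallback):
    '''Hypothetical illustration of dispatching on a normcaseasciispecs value.

    s is a byte string, spec is one of the normcaseasciispecs values, and
    fallback is the (assumed) slow-path function that handles everything
    the ASCII fast path cannot.
    '''
    if spec != normcaseasciispecs.other and isasciibytes(s):
        # Fast path: pure-ASCII input and a simple direction.
        if spec == normcaseasciispecs.lower:
            return s.lower()
        return s.upper()
    # Non-ASCII input, or a platform with more complicated rules
    # (the 'other' case): always defer to the fallback.
    return fallback(s)
```

With spec set to lower or upper, an ASCII filename never reaches fallback;
with other, or with any non-ASCII byte in the name, it always does.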

- Siddharth

