[PATCH 1 of 6 foldmap-in-C] encoding: define an enum that specifies what normcase does to ASCII strings
Siddharth Agarwal
sid at less-broken.com
Thu Apr 2 11:40:16 CDT 2015
On 04/02/2015 01:14 AM, Adrian Buehlmann wrote:
> On 2015-04-02 04:48, Siddharth Agarwal wrote:
>> # HG changeset patch
>> # User Siddharth Agarwal <sid0 at fb.com>
>> # Date 1427872870 25200
>> # Wed Apr 01 00:21:10 2015 -0700
>> # Node ID 42a1040af0c362b38ce45fc71e065d1769902c79
>> # Parent 37a2b446985f2ef77b9690a0548c8630828b7412
>> encoding: define an enum that specifies what normcase does to ASCII strings
>>
>> For C code we don't want to pay the cost of calling into a Python function for
>> the common case of ASCII filenames. However, while on most POSIX platforms we
>> prefer to normalize filenames by lowercasing them, on Windows we uppercase
>> them. We define an enum here indicating the direction that filenames should be
>> normalized as. Some platforms (notably Cygwin) have more complicated
>> normalization behavior -- we add a case for that too.
>>
>> In upcoming patches we'll also define a fallback function that is called if the
>> string has non-ASCII bytes.
>>
>> This enum will be replicated in the C code to make foldmaps. There's
>> unfortunately no nice way to avoid that -- we can't have encoding import
>> parsers because of import cycles. One way might be to have parsers import
>> encoding, but accessing Python modules from C code is just awkward.
>>
>> The name 'normcaseasciispecs' was chosen to indicate that this is merely
>> an integer that specifies a behavior, not a function. The name was pluralized
>> since in upcoming patches we'll introduce 'normcaseasciispec' which will be one
>> of these values.
>>
>> diff --git a/mercurial/encoding.py b/mercurial/encoding.py
>> --- a/mercurial/encoding.py
>> +++ b/mercurial/encoding.py
>> @@ -354,6 +354,19 @@ def upper(s):
>> except LookupError, k:
>> raise error.Abort(k, hint="please check your locale settings")
>>
>> +class normcaseasciispecs(object):
>> + '''what a platform's normcase does to ASCII strings
>> +
>> + This is specified per platform, and should be consistent with what normcase
>> + on that platform actually does.
>> +
>> + lower: normcase lowercases ASCII strings
>> + upper: normcase uppercases ASCII strings
>> + other: the fallback function should always be called'''
>> + lower = -1
>> + upper = 1
>> + other = 0
>> +
>> _jsonmap = {}
>>
>> def jsonescape(s):
> Ugh, this sounds ugly.
I don't disagree.
> I guess there is not much chance this surprising difference between
> Mercurial's util.normcase function doing uppercase when run on Windows
> and lowercase when run on other platforms could be eliminated.
>
> We originally used os.path.normcase on Windows, which happens to
> lowercase the string there, but this was changed to uppercase by foozy
> on 2011-12-16 with 3c5e818ac679 ("windows: use upper() instead of
> lower() or os.path.normcase()").
>
> I guess there is no chance to turn back 3c5e818ac679, given the problems
> foozy had to deal with there.
Not that I'm aware of. For correctness we really need to uppercase on
NTFS (per Michael Kaplan's blog post) and lowercase on HFS+ (per Apple
tech note 1150). There's just no way around that.
- Siddharth
More information about the Mercurial-devel
mailing list