[PATCH 1 of 6 foldmap-in-C] encoding: define an enum that specifies what normcase does to ASCII strings

Thu Apr 2 15:50:32 CDT 2015

On Wed, 2015-04-01 at 19:48 -0700, Siddharth Agarwal wrote:
> # HG changeset patch
> # User Siddharth Agarwal <sid0 at fb.com>
> # Date 1427872870 25200
> #      Wed Apr 01 00:21:10 2015 -0700
> # Node ID 42a1040af0c362b38ce45fc71e065d1769902c79
> # Parent  37a2b446985f2ef77b9690a0548c8630828b7412
> encoding: define an enum that specifies what normcase does to ASCII strings
> 
> For C code we don't want to pay the cost of calling into a Python function for
> the common case of ASCII filenames. However, while on most POSIX platforms we
> prefer to normalize filenames by lowercasing them, on Windows we uppercase
> them. We define an enum here indicating the direction that filenames should be
> normalized as. Some platforms (notably Cygwin) have more complicated
> normalization behavior -- we add a case for that too.
> 
> In upcoming patches we'll also define a fallback function that is called if the
> string has non-ASCII bytes.
> 
> This enum will be replicated in the C code to make foldmaps. There's
> unfortunately no nice way to avoid that -- we can't have encoding import
> parsers because of import cycles. One way might be to have parsers import
> encoding, but accessing Python modules from C code is just awkward.
> 
> The name 'normcaseasciispecs' was chosen to indicate that this is merely
> an integer that specifies a behavior, not a function. The name was pluralized
> since in upcoming patches we'll introduce 'normcaseasciispec' which will be one
> of these values.
> 
> diff --git a/mercurial/encoding.py b/mercurial/encoding.py
> --- a/mercurial/encoding.py
> +++ b/mercurial/encoding.py
> @@ -354,6 +354,19 @@ def upper(s):
>      except LookupError, k:
>          raise error.Abort(k, hint="please check your locale settings")
>  
> +class normcaseasciispecs(object):
> +    '''what a platform's normcase does to ASCII strings
> +
> +    This is specified per platform, and should be consistent with what normcase
> +    on that platform actually does.
> +
> +    lower: normcase lowercases ASCII strings
> +    upper: normcase uppercases ASCII strings
> +    other: the fallback function should always be called'''
> +    lower = -1
> +    upper = 1
> +    other = 0

This looks ok, but I think you've gotten hung up on 7-bit strings when
naming this. The direction here applies to what normcase does with
Unicode strings too, no? I think this should simply be normcasespec(s)?

Also, as long as you're poking around here, I'll mention that fast
isascii and isutf8 functions would be handy.

-- 
Mathematics is the supreme nostalgia of our time.