[PATCH 1 of 6 foldmap-in-C] encoding: define an enum that specifies what normcase does to ASCII strings

Thu Apr 2 13:22:57 CDT 2015

On Thu, 2015-04-02 at 10:14 +0200, Adrian Buehlmann wrote:
> On 2015-04-02 04:48, Siddharth Agarwal wrote:
> > # HG changeset patch
> > # User Siddharth Agarwal <sid0 at fb.com>
> > # Date 1427872870 25200
> > #      Wed Apr 01 00:21:10 2015 -0700
> > # Node ID 42a1040af0c362b38ce45fc71e065d1769902c79
> > # Parent  37a2b446985f2ef77b9690a0548c8630828b7412
> > encoding: define an enum that specifies what normcase does to ASCII strings
> > 
> > For C code we don't want to pay the cost of calling into a Python function for
> > the common case of ASCII filenames. However, while on most POSIX platforms we
> > prefer to normalize filenames by lowercasing them, on Windows we uppercase
> > them. We define an enum here indicating the direction in which filenames should
> > be normalized. Some platforms (notably Cygwin) have more complicated
> > normalization behavior -- we add a case for that too.
> > 
> > In upcoming patches we'll also define a fallback function that is called if the
> > string has non-ASCII bytes.
> > 
> > This enum will be replicated in the C code to make foldmaps. There's
> > unfortunately no nice way to avoid that -- we can't have encoding import
> > parsers because of import cycles. One way might be to have parsers import
> > encoding, but accessing Python modules from C code is just awkward.
> > 
> > The name 'normcaseasciispecs' was chosen to indicate that this is merely
> > an integer that specifies a behavior, not a function. The name was pluralized
> > since in upcoming patches we'll introduce 'normcaseasciispec' which will be one
> > of these values.
> > 
> > diff --git a/mercurial/encoding.py b/mercurial/encoding.py
> > --- a/mercurial/encoding.py
> > +++ b/mercurial/encoding.py
> > @@ -354,6 +354,19 @@ def upper(s):
> >      except LookupError, k:
> >          raise error.Abort(k, hint="please check your locale settings")
> >  
> > +class normcaseasciispecs(object):
> > +    '''what a platform's normcase does to ASCII strings
> > +
> > +    This is specified per platform, and should be consistent with what normcase
> > +    on that platform actually does.
> > +
> > +    lower: normcase lowercases ASCII strings
> > +    upper: normcase uppercases ASCII strings
> > +    other: the fallback function should always be called'''
> > +    lower = -1
> > +    upper = 1
> > +    other = 0
> > +
> >  _jsonmap = {}
> >  
> >  def jsonescape(s):
> 
> Ugh, this sounds ugly.
> 
> I guess there is not much chance this surprising difference between
> Mercurial's util.normcase function doing uppercase when run on Windows
> and lowercase when run on other platforms could be eliminated.

No, it's a difference present in the underlying filesystems (NTFS
compares via upper(), HFS+ via lower()). And since a number of scripts
don't have a 1:1 mapping between upper and lower case, the two
approaches give different results. There's even a script with one
uppercase letter that has two lowercase forms, so the mapping is fully
1:2.
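
To make that concrete, here is a rough Python 3 sketch (the patch itself
is Python 2) of how the spec values could drive an ASCII fast path. Only
the normcaseasciispecs class comes from the patch. normcase_sketch and
utf8_fallback are hypothetical names made up for illustration, and the
fallback assumes UTF-8 filenames. Greek sigma is used here as one
well-known case where two lowercase letters share a single uppercase
letter.

```python
class normcaseasciispecs(object):
    '''mirror of the class added in the patch'''
    lower = -1
    upper = 1
    other = 0


def utf8_fallback(name):
    # Hypothetical fallback: assumes UTF-8 bytes and folds to uppercase.
    return name.decode("utf-8").upper().encode("utf-8")


def normcase_sketch(name, spec, fallback=utf8_fallback):
    # Hypothetical helper, not Mercurial's normcase: fold pure-ASCII
    # names directly and hand everything else to the fallback.
    is_ascii = all(byte < 0x80 for byte in name)
    if is_ascii and spec == normcaseasciispecs.lower:
        return name.lower()
    if is_ascii and spec == normcaseasciispecs.upper:
        return name.upper()
    # 'other' platforms and non-ASCII names always take the slow path.
    return fallback(name)


# The same ASCII name folds in opposite directions depending on the spec.
lowered = normcase_sketch(b"README.txt", normcaseasciispecs.lower)
uppered = normcase_sketch(b"README.txt", normcaseasciispecs.upper)

# U+03C3 (small sigma) and U+03C2 (small final sigma) both uppercase to
# U+03A3, so folding via upper() conflates them; folding via lower()
# keeps them distinct.
small_sigma = "\u03c3"
final_sigma = "\u03c2"
same_when_uppered = small_sigma.upper() == final_sigma.upper()
same_when_lowered = small_sigma.lower() == final_sigma.lower()
```

With these definitions same_when_uppered is True and same_when_lowered
is False, so the direction of folding changes which names collide.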

-- 
Mathematics is the supreme nostalgia of our time.