[PATCH 2 of 2] dirstate: normalize on case insensitive filesystems on Mac (issue1663)

Matt Mackall mpm at selenic.com
Fri Jul 24 14:44:41 CDT 2009

On Fri, 2009-07-24 at 19:38 +0200, Dan Villiom Podlaski Christiansen wrote:
> On 22/07/2009, at 21.04, Matt Mackall wrote:
> > You're going to have to be more clever than lower(), I'm afraid.
> > Consider a file named 'Ä' and the possibility that your local character
> > set might be set to MacRoman. There's also the whole issue of Unicode
> > normalization.
> >
> > I think we need to have a more general facility for dealing with all
> > forms of folding (ie any non-direct filename matching/mangling) that
> > allows us to deal with all the stupid Windowsisms and Macisms.
> > Case-folding is just the most commonplace form of it.
> I've attached a proof-of-concept patch below which tries to solve this.
> The patch contains a test case: a single revision with a single file,
> ‘Å’. The test case was created on FreeBSD and the file name, in bytes,
> is ‘\xc3\x85’, but on Mac OS X, this is transformed by the kernel to
> ‘A\xcc\x8a’. This causes ‘hg status’ to report an unknown file when the
> repository is checked out.
> After a bit of digging around, the only reliable way to get the  
> ‘proper’ name of a file appeared to be a roundtrip to Carbon. I  
> implemented two versions of it, depending on whether direct toolbox or  
> ctypes bindings are used.
> The code may not be pretty, and probably not fast either, but it  
> mostly works. I'm not sure about symbolic links, though…
> --
> Dan Villiom Podlaski Christiansen
> danchr at gmail.com
> ================================================
> # HG changeset patch
> # User Dan Villiom Podlaski Christiansen <danchr at gmail.com>
> # Date 1248453300 -7200
> # Node ID ec820e35ba877efc9b93b98881c1f2fff2bb6a02
> # Parent  d98cef25b5afed5d8aa325ef87f98789367d8b6e
> util: add normalizepath() for getting the 'true' path on Mac OS X.
> diff --git a/mercurial/posix.py b/mercurial/posix.py
> --- a/mercurial/posix.py
> +++ b/mercurial/posix.py

Posix it ain't. It might be time for a mac.py.

Have you considered unicodedata.normalize("NFD", f).lower()?
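
A rough sketch of what that suggestion amounts to, using the two byte
strings from the test case quoted above. It is written for Python 3, where
the name has to be decoded before it can be normalized; the helper name
foldname and the assumption that names are UTF-8 are illustrative only and
not part of the patch. HFS+ uses its own variant of decomposition, so plain
NFD may not match the kernel's result for every character.

```python
import unicodedata

def foldname(name_bytes, encoding="utf-8"):
    """Hypothetical helper: decode, decompose to NFD, then lowercase."""
    text = name_bytes.decode(encoding)  # assumes the name really is UTF-8
    return unicodedata.normalize("NFD", text).lower()

composed = b"\xc3\x85"     # precomposed U+00C5, as committed on FreeBSD
decomposed = b"A\xcc\x8a"  # 'A' + combining ring U+030A, as seen on Mac OS X

# The raw bytes differ, which is why 'hg status' sees an unknown file.
assert composed != decomposed

# After NFD + lower() both spellings fold to the same key.
assert foldname(composed) == foldname(decomposed)
print(foldname(composed).encode("utf-8"))  # b'a\xcc\x8a'
```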

(there are other hairy issues here, like filenames in Latin1)
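
To illustrate the Latin1 problem: the same 'Å' stored as Latin1 is the
single byte \xc5, which is not valid UTF-8, so any normalization step that
assumes UTF-8 fails before it starts. The sketch below (Python 3; the helper
decode_name and its fallback order are made up for illustration) shows one
naive way to guess. Latin1 accepts every byte sequence, so a fallback like
this never fails but can silently pick the wrong characters.

```python
def decode_name(name_bytes, encodings=("utf-8", "latin-1")):
    """Hypothetical helper: try each encoding in turn.

    Returns (text, encoding_used). With the default list the final
    ValueError is unreachable, because latin-1 decodes any bytes.
    """
    for enc in encodings:
        try:
            return name_bytes.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("undecodable filename: %r" % (name_bytes,))

latin1_name = b"\xc5"    # 'Å' encoded as Latin1
utf8_name = b"\xc3\x85"  # 'Å' encoded as UTF-8

print(decode_name(latin1_name))  # ('Å', 'latin-1')
print(decode_name(utf8_name))    # ('Å', 'utf-8')
```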

http://selenic.com : development and support for Mercurial and Linux
