[PATCH 2 of 2] dirstate: normalize on case insensitive filesystems on Mac (issue1663)

Matt Mackall mpm at selenic.com
Thu Jul 23 12:12:17 CDT 2009


On Thu, 2009-07-23 at 18:49 +0200, Adrian Buehlmann wrote:
> On 23.07.2009 17:14, Matt Mackall wrote:
> > On Thu, 2009-07-23 at 12:08 +0200, Simon Heimberg wrote:
> >> On Wed, 2009-07-22 at 14:04 -0500, Matt Mackall wrote:
> >>> On Wed, 2009-07-22 at 15:22 +0200, Simon Heimberg wrote:
> >>>> # HG changeset patch
> >>>> # User Simon Heimberg <simohe at besonet.ch>
> >>>> # Date 1248264291 -7200
> >>>> # Node ID f812e62a12b68c035b1aef3b3732f8486c376373
> >>>> # Parent  ca876099803a9e71497d9deaee3c9fb7ff47ee81
> >>>> dirstate: normalize on case insensitive filesystems on Mac (issue1663)
> >>>>
> >>>> os.path.normcase does not change the path on Mac OS X (it uses the posixpath module)
> >>>>
> >>>> diff -r ca876099803a -r f812e62a12b6 mercurial/dirstate.py
> >>>> --- a/mercurial/dirstate.py	Wed Jul 22 12:52:02 2009 +0200
> >>>> +++ b/mercurial/dirstate.py	Wed Jul 22 14:04:51 2009 +0200
> >>>> @@ -351,8 +351,14 @@
> >>>>          except KeyError:
> >>>>              self._ui.warn(_("not in dirstate: %s\n") % f)
> >>>>  
> >>>> +    _usenormcase = os.path.normcase("A") == "a"
> >>>> +
> >>>>      def _normalize(self, path, knownpath):
> >>>> -        norm_path = os.path.normcase(path)
> >>>> +        if self._usenormcase:
> >>>> +            norm_path = os.path.normcase(path)
> >>>> +        else:
> >>>> +            #case insensitive filesystem on Mac OS X
> >>>> +            norm_path = path.lower()
> >>> You're going to have to be more clever than lower(), I'm afraid.
> >>> Consider a file named 'Ä' and the possibility that your local character
> >>> set might be set to MacRoman. There's also the whole issue of Unicode
> >>> normalization.
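To make the objection concrete, here is a small Python 3 sketch (not Mercurial code). It shows why the patch's `_usenormcase` test is False on Mac OS X, why a byte-wise lower() misses 'Ä' when the local charset is MacRoman, and why the precomposed (NFC) and decomposed (NFD) spellings of the same name compare unequal. The helper `fold_key` is hypothetical. It applies Unicode canonical caseless matching, NFD(casefold(NFD(s))), which only approximates HFS+, since HFS+ uses its own decomposition and case-folding tables.

```python
import ntpath
import posixpath
import unicodedata

# Why the patch's test is False on Mac OS X: os.path is posixpath there,
# and posixpath.normcase() returns the path unchanged.
assert ntpath.normcase("A") == "a"
assert posixpath.normcase("A") == "A"


def fold_key(name: bytes, encoding: str) -> str:
    """Hypothetical helper: decode with the local charset, then apply
    canonical caseless matching, NFD(casefold(NFD(s)))."""
    text = name.decode(encoding)
    return unicodedata.normalize(
        "NFD", unicodedata.normalize("NFD", text).casefold())


upper_a_umlaut = "Ä".encode("mac_roman")  # one byte, 0x80
lower_a_umlaut = "ä".encode("mac_roman")  # one byte, 0x8A

# bytes.lower() only touches ASCII letters, so it misses this pair.
print(upper_a_umlaut.lower() == lower_a_umlaut)  # False

# Decoding first lets case folding see the real characters.
print(fold_key(upper_a_umlaut, "mac_roman")
      == fold_key(lower_a_umlaut, "mac_roman"))  # True

# Precomposed (NFC) and decomposed (NFD) UTF-8 spellings of the same name.
nfc = "\u00c4".encode("utf-8")
nfd = "A\u0308".encode("utf-8")
print(nfc.lower() == nfd.lower())                        # False
print(fold_key(nfc, "utf-8") == fold_key(nfd, "utf-8"))  # True
```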
> >>>
> >> This is as clever as os.path.normcase on Windows and classic Mac OS
> >> (not OS X); see ntpath.py and macpath.py in Python. We could use
> >> macpath.normcase so that we benefit automatically if Python improves it.
> >>
> >>> I think we need to have a more general facility for dealing with all
> >>> forms of folding (i.e. any non-direct filename matching/mangling) that
> >>> allows us to deal with all the stupid Windowsisms and Macisms.
> >>> Case-folding is just the most commonplace form of it.
> >>>
> >> Maybe this could also happen on Linux or Unix when a special filesystem
> >> is mounted, or with certain mount options.
> > 
> > Perhaps.
> > 
> >> Something more general would be good, but I have no idea how to do it.
> >> Any hints? What other cases of folding are there?
> > 
> > Windows:
> > 
> > foo. -> foo
> > (there might be some magic with trailing spaces too)
> 
> Quoting [1]:
> 
> "Basic Naming Conventions
> 
> The following fundamental rules enable applications to create and process valid
> names for files and directories, regardless of the file system:
> ...
> Do not end a file or directory name with a trailing space or a period. Although
> the underlying file system may support such names, the operating system does not.
> ..."
> 
> Not sure if that can be subsumed under "folding" or "mangling".

I'm taking 'folding' to be any operation that maps more than one input
to one output. In other words, one portion of the input space is
projected onto the same part of the output space as another portion,
creating an overlap region where collisions are possible.
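Under that definition, a repository is at risk whenever two tracked names share a folded form. A minimal Python sketch of finding those collisions, assuming the fold is passed in as a plain function (`find_collisions` and `case_fold` are hypothetical names, not Mercurial APIs):

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def find_collisions(names: Iterable[str],
                    fold: Callable[[str], str]) -> Dict[str, List[str]]:
    """Group names by their folded form and return only the groups with
    more than one member, i.e. the names that land in the overlap region."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        groups[fold(name)].append(name)
    return {key: sorted(members)
            for key, members in groups.items() if len(members) > 1}


def case_fold(name: str) -> str:
    # Hypothetical fold for a case-insensitive filesystem.
    return name.casefold()


tracked = ["README", "readme", "Makefile", "src/main.c"]
print(find_collisions(tracked, case_fold))  # {'readme': ['README', 'readme']}
```

Taking the fold as a parameter keeps the collision check independent of which kind of folding a given filesystem performs.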

There are in fact two different kinds of folding for our purposes:
folding on file creation (i.e. I ask for a new file 'foo.' and I get a
file 'foo') or on file lookup (i.e. I ask to open file 'FOO' and I get the
contents of file 'foo'). Unicode normalization is in the former
category. Case-preserving but case-insensitive filesystems are in the
latter.
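The two kinds of folding can be modelled with a toy in-memory filesystem. This is a hypothetical Python sketch: `ToyFS`, `fold_on_create` and `fold_on_lookup` are illustrative names, and the rules chosen (strip trailing periods and spaces, store NFD, match case-insensitively) are assumptions loosely based on Windows and HFS+ behaviour, not exact reproductions of either.

```python
import unicodedata
from typing import Dict, Optional


def _caseless(name: str) -> str:
    # Canonical caseless form: NFD(casefold(NFD(name))).
    return unicodedata.normalize(
        "NFD", unicodedata.normalize("NFD", name).casefold())


class ToyFS:
    """Hypothetical in-memory filesystem with both kinds of folding."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}  # stored name -> contents

    @staticmethod
    def fold_on_create(name: str) -> str:
        # Creation folding (assumed rules): drop trailing periods/spaces
        # as Windows does, and store the decomposed (NFD) form like HFS+.
        return unicodedata.normalize("NFD", name.rstrip(". "))

    @staticmethod
    def fold_on_lookup(name: str) -> str:
        # Lookup folding: case-insensitive comparison of names.
        return _caseless(name)

    def _find(self, name: str) -> Optional[str]:
        wanted = self.fold_on_lookup(self.fold_on_create(name))
        for stored in self._files:
            if self.fold_on_lookup(stored) == wanted:
                return stored
        return None

    def create(self, name: str, data: bytes) -> str:
        # Case-preserving: an existing entry keeps its original spelling.
        stored = self._find(name)
        if stored is None:
            stored = self.fold_on_create(name)
        self._files[stored] = data
        return stored

    def open(self, name: str) -> Optional[bytes]:
        stored = self._find(name)
        return None if stored is None else self._files[stored]


fs = ToyFS()
print(fs.create("foo.", b"one"))  # foo     (creation folding)
print(fs.open("FOO"))             # b'one'  (lookup folding)
print(fs.create("FOO", b"two"))   # foo     (name preserved, contents replaced)
print(fs.create("\u00c4", b"x") == "A\u0308")  # True (stored decomposed)
print(fs.open("\u00e4"))          # b'x'
```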

There may also be some interesting non-folding forms of mangling but
none spring to mind.

> Maybe file names ending in period or space(s) should be treated as not
> being valid names for files on Windows. That is, Mercurial should probably
> not try writing or accepting such names and maybe abort (on Windows).

Maybe. But if we're ever going to try to handle stuff like HFS
normalization, we might as well handle that too.
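For the simpler route suggested above (refusing such names on Windows rather than folding them), a check might look like the following hypothetical Python helper. It assumes '/'-separated repository paths and only flags the trailing period or space case; it does not cover other Windows restrictions such as reserved device names.

```python
from typing import Optional


def windows_name_problem(path: str) -> Optional[str]:
    """Hypothetical check: return a message if any component of a
    '/'-separated repository path ends in '.' or ' ', which Windows
    would silently strip when creating the file; otherwise None."""
    for component in path.split("/"):
        if component in ("", ".", ".."):
            continue  # empty or relative components are not file names
        if component != component.rstrip(". "):
            return "%r ends with a period or space" % component
    return None


print(windows_name_problem("docs/notes."))     # 'notes.' ends with a period or space
print(windows_name_problem("docs/notes.txt"))  # None
```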

-- 
http://selenic.com : development and support for Mercurial and Linux




More information about the Mercurial-devel mailing list