[PATCH 0 of 1 v4] win32lfn: allow manipulating files with long names on Windows

Matt Mackall mpm at selenic.com
Tue Jan 25 12:39:19 CST 2011


On Mon, 2011-01-24 at 23:55 -0500, Aaron Cohen wrote:
> I found a post you made a while ago saying that the fundamentals are
> there to handle the UTF normalization problems in a similar way to the
> way Windows' case insensitivity is being done. Do you happen to have
> time to describe how that would look?

The essentials are that there is some function F that maps names from
their original form to a form in which they can be compared for
equality. In other words, Foo and foo are the same because F(Foo) and
F(foo) are the same.

In our dirstate code, we build a table called the _foldmap, which is
basically:

foldmap = {}
for name in files:
    foldmap[F(name)] = name

Once we have that mapping, we can stop being confused.
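
To make the lookup side concrete, here is a minimal sketch in Python 3 of
how such a table might be used. It is not the actual dirstate code: the
helper names build_foldmap and canonical are made up for illustration, and
F is plain lowercasing as described in this mail.

```python
def F(name):
    # Stand-in folding function: plain lowercasing (an assumption for
    # this sketch, fine for ASCII names only).
    return name.lower()


def build_foldmap(files):
    # Map each folded name back to the spelling that was recorded.
    foldmap = {}
    for name in files:
        foldmap[F(name)] = name
    return foldmap


def canonical(foldmap, name):
    # Return the recorded spelling if the folded name is known,
    # otherwise hand the name back unchanged.
    return foldmap.get(F(name), name)


foldmap = build_foldmap(["README", "src/Foo.py"])
print(canonical(foldmap, "readme"))
print(canonical(foldmap, "SRC/foo.PY"))
print(canonical(foldmap, "new.txt"))
```

Any spelling the filesystem hands back is folded and looked up, so the
tracked spelling wins whenever the two differ only in ways that F erases.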

So it all comes down to having the right F. Right now, our F is simply
string.lower(), which works nicely for ASCII names. But Windows'
internal F is actually much more complex than that (and not particularly
well-documented!). It does case-folding on Unicode and (at least in some
situations) tells you that A = Ä.

The same applies for OS X, by the way. We just need to supply an F that
matches the non-standard under-documented Unicode-mangling that it's
doing, and things will be good.
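
As a rough illustration of what swapping in a different F could look like,
here is a sketch in Python 3 that combines Unicode normalization with case
folding. The name fold_unicode is hypothetical, and this is not claimed to
match what Windows or OS X actually do; it only shows the shape of the
change.

```python
import unicodedata


def fold_unicode(name):
    # Decompose, case-fold, then decompose again, since case folding
    # can produce characters that are no longer in normalized form.
    # Assumption: NFD plus str.casefold() is a stand-in here, not the
    # rule any particular filesystem really applies.
    decomposed = unicodedata.normalize("NFD", name)
    return unicodedata.normalize("NFD", decomposed.casefold())


precomposed = "\u00c4.txt"   # "Ä" as a single code point
combining = "A\u0308.txt"    # "A" followed by a combining diaeresis

print(fold_unicode(precomposed) == fold_unicode(combining))
print(fold_unicode("Foo") == fold_unicode("foo"))
print(fold_unicode("A.txt") == fold_unicode(precomposed))
```

With this F the two encodings of Ä compare equal, but A and Ä still do
not, so an F meant to mirror a platform that equates them would have to
do more (for example, strip combining marks in the cases where that
platform does).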

-- 
Mathematics is the supreme nostalgia of our time.



