Initial support of Unicode filenames

Victor Stinner victor.stinner at haypocalc.com
Wed Nov 2 19:41:54 CDT 2011


On Thursday 3 November 2011 01:29:24, Victor Stinner wrote:
> If we store filenames as UTF-8, you would be able to share a repository on
> a USB key between two Windows setups using different ANSI code pages (e.g.
> cp1252 and cp932). You would also be able to use the full Unicode range on
> Windows, not only a small subset (the ANSI code page). For example, cp1252
> contains 256 code points vs 1,114,112 for Unicode 6.0.

Oh, I forgot to mention the main "issue" with such a feature: you cannot clone a
repository containing non-ASCII filenames if your filesystem encoding (= locale
encoding) cannot encode them. For example, é (U+00E9) cannot be encoded to
ASCII.
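
To make the failure concrete, here is a minimal Python 3 sketch. The helper
name can_encode is hypothetical (it is not part of Mercurial or Python); it
only checks whether a filename can be represented in a given encoding using
the strict error handler.

```python
def can_encode(name, encoding):
    """Return True if `name` can be encoded to `encoding` without error."""
    try:
        name.encode(encoding)  # str.encode() uses errors="strict" by default
        return True
    except UnicodeEncodeError:
        return False


# "caf\u00e9.txt" contains é (U+00E9); "\u65e5.txt" contains 日 (U+65E5).
for filename in ("caf\u00e9.txt", "\u65e5.txt"):
    for encoding in ("ascii", "cp1252", "cp932", "utf-8"):
        print(ascii(filename), encoding, can_encode(filename, encoding))
```

A filename that fails this check for the local filesystem encoding is one
that could not be created when cloning on that machine.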

Mercurial and Python cannot do anything about this issue: it's a problem in
the user's setup. If you work with users in a heterogeneous environment, you
have to limit yourself to ASCII. Well, I don't think that Mercurial 1.9
behaves any better in such a situation anyway...

Mac OS X and most Linux distributions now use UTF-8 as the default locale
encoding, so slowly everybody is moving to a fully Unicode-compliant encoding
(with an internal UTF-16 encoding on Windows).
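
To see which encodings a given machine actually uses, a short Python 3 sketch
like the following may help. It only calls standard library functions; the
printed values depend on the platform, the locale settings and the Python
version, so no particular output is assumed here.

```python
import locale
import sys

# Encoding Python uses to convert Unicode filenames to OS-level filenames.
fs_encoding = sys.getfilesystemencoding()

# Encoding preferred by the user's locale (False: do not change the locale).
locale_encoding = locale.getpreferredencoding(False)

print("filesystem encoding:", fs_encoding)
print("locale encoding:", locale_encoding)
```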

--

Python 3 has a similar "issue": it allows non-ASCII identifiers, but not all
users are able to display all Unicode characters. So you may get source code
which doesn't display correctly on your screen. But Python should not limit
users; users have to limit themselves to work nicely with their environment
(and other users!). Non-ASCII identifiers are a nice feature for teaching
Python in your native language.
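
As a small illustration of non-ASCII identifiers in Python 3, here is a
sketch with French names. The function and parameter names are made up for
this example, and the file is assumed to be saved as UTF-8 (the Python 3
default source encoding).

```python
def aire_du_carré(côté):
    """Return the area of a square whose side length is `côté`."""
    return côté * côté


# str.isidentifier() reports whether a string is a valid identifier.
print("côté".isidentifier())
print(aire_du_carré(3))
```

Whether these names display correctly depends on the reader's terminal and
font, which is exactly the trade-off described above.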

Victor


More information about the Mercurial-devel mailing list