Initial support of Unicode filenames
Victor Stinner
victor.stinner at haypocalc.com
Fri Oct 28 17:28:43 CDT 2011
Hi,
On Windows, filenames are stored as Unicode. A bytes API exists for backward
compatibility, but it should not be used: if a filename cannot be encoded to
the ANSI code page, you get an invalid filename in which the unencodable
characters are replaced by question marks (?).
The attached patch uses Unicode filenames to avoid encoding issues on Windows.
The change to ui.py uses the backslashreplace error handler to escape
unencodable characters when writing filenames to the console, so that output
does not fail if a character cannot be encoded to the console code page.
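A minimal sketch of the difference between the two error handlers, written for
Python 3. The cp1252 code page and the helper name encode_for_console are
assumptions for illustration only; they are not taken from the patch.

```python
def encode_for_console(filename, encoding="cp1252"):
    """Encode a filename for display, escaping unencodable characters.

    Hypothetical helper: cp1252 merely stands in for the console code page.
    """
    return filename.encode(encoding, "backslashreplace")


name = "\ufd20.txt"

# "replace" loses information: the character becomes a question mark.
lossy = name.encode("cp1252", "replace")

# "backslashreplace" keeps the code point visible as an escape sequence.
escaped = encode_for_console(name)

print(lossy)    # b'?.txt'
print(escaped)  # b'\\ufd20.txt'
```

With backslashreplace the output is still not a usable filename, but the user
can see which character was involved instead of an ambiguous "?".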
To try my patch, create a file with an unencodable name. Example in Python:

    open(u"\uFD20.txt", "w").close()

Then run "hg status".
--
On Python 3, you can use Unicode filenames on Windows and UNIX. Thanks to the
surrogateescape error handler, undecodable bytes are stored as lone
surrogates, so invalid filenames are supported correctly on UNIX: decoding a
filename cannot raise a Unicode error.
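As an illustration of the round trip, here is a short Python 3 sketch. The
sample byte string and the choice of UTF-8 as the filesystem encoding are
assumptions made for the example.

```python
raw = b"caf\xe9.txt"  # Latin-1 bytes, not valid UTF-8

# Strict decoding fails on the invalid byte...
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# ...while surrogateescape maps it to a lone surrogate (U+DC00 + byte value).
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = name.encode("utf-8", "surrogateescape")

print(strict_ok)        # False
print(ascii(name))      # 'caf\udce9.txt'
print(restored == raw)  # True
```

Python 3 applies this handler itself when it decodes filenames on UNIX, and
os.fsencode() and os.fsdecode() expose the same conversion.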
On Python 2, os.listdir(unicode) returns undecodable filenames unchanged (as
byte strings), and Python 2 doesn't have the surrogateescape error handler.
That's why I prefer to leave Mercurial unchanged on UNIX with Python 2.
My patch uses Unicode filenames only on Windows (any Python version) and on
Python 3 (any platform).
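The rule can be written as a small predicate. This is only a sketch: the
function name use_unicode_filenames is hypothetical and the patch may express
the check differently.

```python
import os
import sys


def use_unicode_filenames():
    """Return True where Unicode filenames are safe to use.

    Hypothetical helper: Windows stores names as Unicode natively, and
    Python 3 has surrogateescape for undecodable names on UNIX.
    """
    return os.name == "nt" or sys.version_info[0] >= 3


print(use_unicode_filenames())
```

The only combination left on byte strings is UNIX with Python 2.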
TODO: also patch osutil.c. I can do it if you like.
Victor
PS: Please CC me on your replies; I am not subscribed to the mailing list.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mercurial_unicode.patch
Type: text/x-patch
Size: 3726 bytes
Desc: not available
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20111029/0fe8de1c/attachment.bin>