Initial support of Unicode filenames

Victor Stinner victor.stinner at haypocalc.com
Fri Oct 28 17:28:43 CDT 2011


Hi,

On Windows, filenames are stored as Unicode. There is a bytes API that provides 
backward compatibility, but it should not be used: you may get an invalid 
filename (with question marks, "?") if a filename is not encodable to the ANSI 
code page.
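A rough Python 3 sketch of the failure mode (not code from the patch): encoding a name containing U+FD20 to an ANSI code page with the "replace" error handler turns the character into "?". cp1252 is an assumed code page and ansi_view() is a hypothetical helper. The real Windows conversion may also apply best-fit mappings, which this does not model.

```python
# Simulate what a bytes (ANSI) filename API receives for a Unicode name.
# ASSUMPTION: cp1252 stands in for the ANSI code page; Windows best-fit
# mappings are not modelled here.
ANSI_CODE_PAGE = "cp1252"


def ansi_view(filename: str, code_page: str = ANSI_CODE_PAGE) -> bytes:
    """Return the bytes a code-page-based API would see for filename."""
    return filename.encode(code_page, errors="replace")


name = "\uFD20.txt"
lossy = ansi_view(name)
print(lossy)                         # b'?.txt'
print(lossy.decode(ANSI_CODE_PAGE))  # ?.txt  (no longer names the real file)
```

The lossy name cannot be mapped back to the original, so any later operation done through the bytes API targets the wrong file.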

The attached patch uses Unicode filenames to avoid encoding issues on Windows. 
The change to ui.py uses the backslashreplace error handler to escape 
unencodable characters when writing filenames to the console, so that hg does 
not fail if a character cannot be encoded to the console code page.
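A minimal sketch of the backslashreplace idea in Python 3. This is not the ui.py change itself: write_filename() is a hypothetical helper, and cp437 is an assumed console code page.

```python
import io


def write_filename(stream, filename: str, encoding: str) -> None:
    """Write filename to a binary stream, escaping characters that the
    console encoding cannot represent instead of raising UnicodeEncodeError."""
    stream.write(filename.encode(encoding, errors="backslashreplace"))
    stream.write(b"\n")


out = io.BytesIO()
write_filename(out, "\uFD20.txt", "cp437")  # ASSUMPTION: cp437 console
print(out.getvalue())  # b'\\ufd20.txt\n'
```

The user sees an escaped but readable name (\ufd20.txt) and the command keeps running, whereas errors="strict" would abort on the first unencodable filename.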

To try my patch, create a filename that is not encodable to the ANSI code page, 
for example in Python: open(u"\uFD20.txt", "w").close(). Then run "hg status".
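The same reproduction step as a small Python 3 script. make_unencodable_file() is a hypothetical helper. The demo uses a throwaway directory. For the real test, pass the working directory of a Mercurial repository and then run "hg status" there.

```python
import os
import tempfile


def make_unencodable_file(workdir: str) -> str:
    """Create an empty file named U+FD20 + ".txt" in workdir and return its path.

    U+FD20 has no mapping in common ANSI code pages such as cp1252, so this
    name cannot round-trip through the Windows bytes API.
    """
    path = os.path.join(workdir, "\uFD20.txt")
    with open(path, "w"):
        pass  # an empty file is enough to trigger the problem
    return path


# Demo only: a temporary directory, not a Mercurial working copy.
with tempfile.TemporaryDirectory() as demo_dir:
    created = make_unencodable_file(demo_dir)
    print(os.path.basename(created) in os.listdir(demo_dir))
```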

--

On Python 3, you can use Unicode filenames on both Windows and UNIX. Thanks to 
the surrogateescape error handler, undecodable bytes are stored as surrogate 
characters, so invalid filenames are handled correctly on UNIX (decoding a 
filename never raises a Unicode error).
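A short Python 3 illustration of the surrogateescape round trip. UTF-8 is assumed as the filesystem encoding. On UNIX, os.fsdecode() and os.fsencode() apply this handler with the actual filesystem encoding.

```python
raw = b"caf\xe9.txt"  # Latin-1 bytes; 0xE9 alone is invalid UTF-8

# Strict decoding fails on the invalid byte...
try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    print("strict decode fails")

# ...while surrogateescape maps the byte 0xE9 to the lone surrogate U+DCE9.
name = raw.decode("utf-8", errors="surrogateescape")
print(ascii(name))  # 'caf\udce9.txt'

# Encoding with the same handler gives back the exact original bytes,
# so the file can still be opened, renamed or deleted.
assert name.encode("utf-8", errors="surrogateescape") == raw
```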

On Python 2, os.listdir(unicode) returns undecodable filenames unchanged (as 
byte strings), and Python 2 does not have the surrogateescape error handler. 
That is why I prefer to leave Mercurial unchanged on UNIX with Python 2.

My patch only uses Unicode filenames on Windows and on Python 3.
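The rule above could be expressed as a single predicate. This is a sketch only: use_unicode_filenames() is a hypothetical name, not something taken from the patch.

```python
import sys


def use_unicode_filenames(platform: str = sys.platform,
                          major: int = sys.version_info[0]) -> bool:
    """Return True when filenames should be handled as Unicode strings.

    Unicode on Windows (any Python) and on Python 3 (any platform);
    bytes otherwise, i.e. Python 2 on UNIX stays unchanged.
    """
    return platform == "win32" or major >= 3


print(use_unicode_filenames("win32", 2))   # True
print(use_unicode_filenames("linux2", 2))  # False
```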

TODO: also patch osutil.c. I can do it if you like.

Victor

PS: Please CC me on your replies; I am not subscribed to the mailing list.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mercurial_unicode.patch
Type: text/x-patch
Size: 3726 bytes
Desc: not available
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20111029/0fe8de1c/attachment.bin>


More information about the Mercurial-devel mailing list