Initial support of Unicode filenames

Victor Stinner victor.stinner at haypocalc.com
Fri Oct 28 18:42:52 CDT 2011


On Saturday 29 October 2011 00:58:46, Matt Mackall wrote:
> On Sat, 2011-10-29 at 00:28 +0200, Victor Stinner wrote:
> > Hi,
> > 
> > On Windows, filenames are stored as Unicode. There is a bytes API
> > providing backward compatibility, but it should not be used, because
> > you may get an invalid filename (with question marks, ?) if a filename
> > is not encodable to the ANSI code page.
> > 
> > The attached patch uses Unicode filenames to avoid encoding issues on
> > Windows. The patch to ui.py uses backslashreplace to escape unencodable
> > characters when writing filenames to the console (so that it does not
> > fail if a character is not encodable to the console code page).
> 
> I'm afraid I've already vetoed about a dozen variants of this suggestion
> over the years. For starters, it is not backward-compatible with
> existing Windows users.

The goal of the patch is not to provide full Unicode support. It's just a 
step forward to improve Mercurial. In my case, I just want to fix "hg st" when 
the directory contains an unencodable filename. It shouldn't change how 
filenames are stored in Mercurial.
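
The backslashreplace escaping described in the quoted patch summary can be 
sketched roughly as follows. This is only an illustration in Python 3, not the 
code from the patch: the helper name console_safe and the default code page 
cp1252 are assumptions made for the example.

```python
def console_safe(name, encoding="cp1252"):
    """Escape characters that `encoding` cannot represent.

    Hypothetical helper: `encoding` stands in for the console code page.
    Unencodable characters become \\uXXXX escapes instead of raising
    UnicodeEncodeError.
    """
    # Encode with backslashreplace, then decode again so the result is
    # still a text string that can be written to the console safely.
    return name.encode(encoding, "backslashreplace").decode(encoding)


# A Cyrillic name is not representable in cp1252, so it is escaped.
print(console_safe("\u0444\u0430\u0439\u043b.txt"))
```

A name that the code page can represent passes through unchanged, so only the 
problematic filenames look different in the output.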

> Suggested reading:
> 
> http://mercurial.selenic.com/wiki/EncodingStrategy
> http://markmail.org/message/a7k7jhvjr6h3gjc3

If I understood correctly, filenames are stored as bytes in Mercurial, in the 
locale encoding (so the ANSI code page on Windows). I understand that 
migrating from bytes to Unicode is not trivial in this case.
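
To illustrate why a filename stored as bytes in a code page can be lossy, here 
is a minimal Python 3 sketch. The helper name to_ansi and the code page cp1252 
are assumptions for the example, and the "replace" error handler only roughly 
imitates the question marks returned by the Windows bytes API.

```python
def to_ansi(name, codepage="cp1252"):
    """Encode a Unicode filename to a code page.

    Hypothetical helper. Returns (encoded_bytes, lossless), where
    lossless is False if some character had to be replaced by b"?".
    """
    try:
        return name.encode(codepage), True
    except UnicodeEncodeError:
        # The name is not representable: fall back to "?" substitution,
        # which cannot be decoded back to the original name.
        return name.encode(codepage, "replace"), False


encoded, lossless = to_ansi("\u0444\u0430\u0439\u043b.txt")
print(encoded, lossless)
```

Once a name has been reduced to question marks, the bytes no longer identify 
the file on disk, which is the failure mode described for "hg st".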

I will have to read a little bit more to understand the situation correctly 
;-)

Victor

