Initial support of Unicode filenames

Martin Geisler mg at aragost.com
Thu Nov 3 07:19:04 CDT 2011


Victor Stinner <victor.stinner at haypocalc.com> writes:

> On Thursday 3 November 2011 10:31:28, Martin Geisler wrote:
>
>> Today, a Windows user can commit a file named "Sweet crêpe
>> recipe.txt" and I can check out the file on my Linux machine. I won't
>> get a "ê" in my filename, but I'll get a file I can modify and commit
>> changes to anyway.
>
> If we store filenames as UTF-8 (+ surrogateescape), you will get
> "Sweet crêpe recipe.txt" on Windows and Linux. I'm just saying that if
> your locale encoding is ASCII, the checkout will fail.

Yes, and making a checkout fail is a serious regression compared to
today. Also, I'll see "Sweet crÃªpe recipe.txt" on my Latin-1 system.
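
To make the two failure modes concrete, here is a minimal Python 3
sketch. It assumes the filename is stored in the repository as UTF-8
bytes; the variable names are purely illustrative and are not Mercurial
internals.

```python
# Hypothetical illustration only; none of this is Mercurial code.
name = "Sweet cr\u00eape recipe.txt"
stored = name.encode("utf-8")  # assumed on-disk form in the repository

# A Latin-1 system that decodes the stored bytes byte-by-byte shows
# two characters where the "e with circumflex" used to be.
latin1_view = stored.decode("latin-1")

# An ASCII locale cannot represent the name at all.
try:
    name.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# surrogateescape lets bytes that are not valid UTF-8 round-trip
# unchanged, here a legacy name that was committed as Latin-1.
raw = b"cr\xeape.txt"
escaped = raw.decode("utf-8", "surrogateescape")
roundtrip = escaped.encode("utf-8", "surrogateescape")

print(repr(latin1_view), ascii_ok, roundtrip == raw)
```

The surrogateescape part only guarantees that the bytes survive a
decode/encode cycle; it says nothing about how the name is displayed.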

> If this issue really does matter, we may add workarounds like encoding
> the unencodable characters to something encodable, e.g. replacing "ê"
> (U+00EA) by "%EA" (3 characters encodable to ASCII). Mac OS X and
> Gnome use this trick somewhere (I am not sure).

We'll need to recognize the file again for 'hg status' purposes, so it's
probably no good to encode the "ê" as "%EA" unless we also start
decoding every "%EA" into an "ê" character. That would again be a
serious change compared to what we do today.
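
The ambiguity can be sketched as follows. The helpers escape_name and
unescape_name are hypothetical, and the %XX scheme is just the one
suggested above, limited here to U+0080..U+00FF; nothing like this
exists in Mercurial as far as this sketch is concerned.

```python
import re


def escape_name(name):
    """Replace each character in U+0080..U+00FF by %XX (hypothetical scheme)."""
    out = []
    for ch in name:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFF:
            out.append("%{:02X}".format(cp))
        else:
            raise ValueError("this sketch only handles U+0080..U+00FF")
    return "".join(out)


def unescape_name(name):
    """Turn every %XX back into the character with that code point."""
    return re.sub(r"%([0-9A-F]{2})",
                  lambda m: chr(int(m.group(1), 16)), name)


unicode_name = "cr\u00eape.txt"  # tracked name containing U+00EA
literal_name = "cr%EApe.txt"     # a different file that really contains "%EA"

# Both tracked names end up with the same name in the working copy ...
collision = escape_name(unicode_name) == escape_name(literal_name)

# ... and decoding cannot tell which of the two was meant.
decoded = unescape_name(literal_name)

print(collision, decoded == unicode_name)
```

Any such scheme would also have to escape the "%" itself to stay
reversible, which changes the names of files that work fine today.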

So all in all I'm trying to say that I think we have a fairly good grasp
on the possibilities and that there are some difficult tradeoffs to be
made here.

But please don't be scared away :-) I would really like to see Mercurial
do transcoding of filenames. I've deployed Mercurial at Swiss customers
and they immediately ran into problems with their umlauts.

Since you know a lot about how Unicode works in Python and on different
platforms, I think it's great that you're taking a look at how to
solve this problem in Mercurial. Just be aware that we have a lot of
constraints because of backwards compatibility.

-- 
Martin Geisler

aragost Trifork
Professional Mercurial support
http://mercurial.aragost.com/kick-start/

