Initial support of Unicode filenames
Martin Geisler
mg at aragost.com
Thu Nov 3 07:19:04 CDT 2011
Victor Stinner <victor.stinner at haypocalc.com> writes:
> On Thursday 3 November 2011 10:31:28, Martin Geisler wrote:
>
>> Today, a Windows user can commit a file named "Sweet crêpe
>> recipe.txt" and I can checkout the file on my Linux machine. I won't
>> get a "ê" in my filename, but I'll get a file I can modify and commit
>> changes to anyway.
>
> If we store filenames as UTF-8 (+ surrogateescape), you will get
> "Sweet crêpe recipe.txt" on Windows and Linux. I'm just saying that if
> your locale encoding is ASCII, the checkout will fail.
Yes, and making a checkout fail is a serious regression compared to
today. Also, I'll see "Sweet crêpe recipe.txt" on my Latin-1 system.
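To make the failure mode concrete, here is a minimal sketch, assuming Python 3 and a repository that stores names as UTF-8 and decodes them with surrogateescape. The helper encode_for_locale is hypothetical and not part of Mercurial; it only illustrates that the same stored name can be written on a Latin-1 system but not on an ASCII one.

```python
# Sketch only: assumes Python 3 and names stored in the repository as UTF-8.
# "\u00ea" is the e-circumflex from the example filename.
stored = "Sweet cr\u00eape recipe.txt".encode("utf-8")

# Decode once to text; surrogateescape keeps any undecodable bytes round-trippable.
name = stored.decode("utf-8", "surrogateescape")


def encode_for_locale(name, encoding):
    """Hypothetical helper: bytes for the local filesystem, or None if impossible."""
    try:
        return name.encode(encoding, "surrogateescape")
    except UnicodeEncodeError:
        # surrogateescape only restores escaped bytes; a real character that
        # the target encoding lacks still raises.
        return None


latin1_bytes = encode_for_locale(name, "latin-1")  # representable in Latin-1
ascii_bytes = encode_for_locale(name, "ascii")     # not representable in ASCII
```

In this sketch the ASCII case has no bytes to write at all, which is the point at which a checkout would have to fail or fall back to some workaround.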
> If this issue really matters, we may add workarounds like encoding
> the unencodable characters to something encodable. E.g. replace "ê"
> (U+00EA) by "%EA" (3 characters encodable to ASCII). Mac OS X and
> GNOME use this trick somewhere (I am not sure).
We'll need to recognize the file again for 'hg status' purposes. So it's
probably no good to encode the "ê" as "%EA" unless we also start
decoding all "%EA" into "ê" characters. That would again be a serious
change compared to what we do today.
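A rough sketch of why the round trip is a problem, assuming Python 3. The function escape_unencodable is a hypothetical illustration of the "%EA" trick and not an existing Mercurial function. It shows that a tracked file containing "ê" and a different file whose name literally contains "%EA" end up with the same name on disk.

```python
def escape_unencodable(name, encoding="ascii"):
    """Hypothetical helper: write each unencodable character as %XX (code point in hex)."""
    out = []
    for ch in name:
        try:
            ch.encode(encoding)
            out.append(ch)
        except UnicodeEncodeError:
            out.append("%%%02X" % ord(ch))
    return "".join(out)


tracked = "cr\u00eape.txt"   # name in the repository, with an e-circumflex
literal = "cr%EApe.txt"      # a different file whose real name contains "%EA"

on_disk_tracked = escape_unencodable(tracked)
on_disk_literal = escape_unencodable(literal)

# Both names collide on disk, so decoding "%EA" cannot tell them apart.
collision = on_disk_tracked == on_disk_literal
```

Under these assumptions, 'hg status' would see "cr%EApe.txt" on disk and could not know which of the two repository names it stands for unless every "%EA" were always decoded.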
So all in all I'm trying to say that I think we have a fairly good grasp
on the possibilities and that there are some difficult tradeoffs to be
made here.
But please don't be scared away :-) I would really like to see Mercurial
do transcoding of filenames. I've deployed Mercurial at Swiss customers
and they immediately ran into problems with their umlauts.
Since you know a lot about how Unicode works in Python and on different
platforms, I think it's great that you're taking a look at how to
solve this problem in Mercurial. Just be aware that we have a lot of
constraints because of backwards compatibility.
--
Martin Geisler
aragost Trifork
Professional Mercurial support
http://mercurial.aragost.com/kick-start/