Unicode support for non-unicode locales

Tue Oct 9 02:08:56 CDT 2007

Matt Mackall wrote:
> On Tue, Oct 09, 2007 at 01:59:52AM +0900, Shun-ichi GOTO wrote:
>> 2007/10/9, Shun-ichi GOTO <shunichi.goto at gmail.com>:
>>> If we treat filenames as raw byte data, some filenames might be broken
>>> by path operations. So the Python code should handle filenames as unicode
>>> characters by decoding them.
>> In fact, current mercurial cannot manage some filenames.
>> For example, the filename "正規表現.txt" is such a case.
>> The 4 characters "正規表現" are Japanese for "regular expression",
>> and the 2nd byte of the 3rd character is '\' (0x5c).
>> So, hg ci -Am "test" fails on adding this file.
>>
>> {{{
>> [c:\temp\test]hg ci -Am initial
>> adding 正規・現.txt
>> removing 正規・現.txt
>> dir1/正規・現.txt not tracked!
>> 正規・現.txt not tracked!
>> nothing changed
>> }}}
> 
> Yes, Mercurial will be unhappy with wide character sets in various
> situations. It's either that or be unhappy with single byte character
> sets much more often.
> 

(for reference, the above-mentioned example works fine on linux with an 
utf-8 locale. i assume it works well everywhere if you keep the same 
locale (filesystem-encoding) everywhere)
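
a minimal python sketch of why the example breaks on a japanese windows locale but not under utf-8. assumptions: the garbled filename in the example is 正規表現.txt (japanese for "regular expression"), and python's cp932 codec stands in for the windows japanese encoding. the split on backslash is only an illustration of byte-wise path handling, not mercurial's actual code.

```python
# the filename from the example above ("regular expression" in japanese)
name = "正規表現.txt"

# assumption: cp932 stands in for the japanese windows locale of the report
sjis = name.encode("cp932")
utf8 = name.encode("utf-8")

# in cp932 the third character encodes to two bytes, the second being 0x5c,
# which is also the ascii backslash (the windows path separator)
third_char = name[2].encode("cp932")

has_backslash_sjis = b"\\" in sjis
# utf-8 never uses bytes below 0x80 inside a multi-byte character
has_backslash_utf8 = b"\\" in utf8

# byte-wise path handling that splits on backslash cuts the character in half
parts = sjis.split(b"\\")

print(third_char, has_backslash_sjis, has_backslash_utf8, len(parts))
```

decoding the bytes to unicode before doing any path operation, as suggested in the quoted mail, avoids the false separator.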

but i think there are problems even when you check out code on a unicode 
locale, if that locale is different from the one where the file was 
added.

for example, add a file with a non-ascii name to a mercurial repository on 
a utf-8 locale (most linux systems), then check it out on windows nt 
(afaik utf-16), and you get garbled filenames.
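
a small python sketch of that garbling, under stated assumptions: the filename is hypothetical, the repository is taken to record the raw filename bytes without decoding them, and cp1252 is just one example of an 8-bit code page the checkout side might use to interpret them.

```python
# hypothetical non-ascii filename, added on a utf-8 system
name = "gábor.txt"

# the raw bytes that would be recorded if filenames are stored undecoded
stored = name.encode("utf-8")

# assumption: the checkout side interprets those bytes in its own 8-bit
# code page; cp1252 is only an example
garbled = stored.decode("cp1252")

# decoding with the encoding that was in use at add time recovers the name
recovered = stored.decode("utf-8")

print(garbled)
print(recovered)
```

the name only survives if both sides agree on the encoding of the stored bytes, which is the point of the question below.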

do i understand correctly, that this is intentional, and there are no 
plans to fix this?

is there at least a workaround for this? (other than 
do-not-use-non-ascii-filenames :)

gabor