fixutf8 module

Martin Geisler mg at daimi.au.dk
Sun Feb 1 13:20:30 CST 2009


Stefan Rusek <stefan at rusek.org> writes:

> Martin,
>
> On Sun, Feb 1, 2009 at 7:05 PM, Martin Geisler <mg at daimi.au.dk> wrote:
>>
>> How about status and other commands which include filenames in their
>> output? Here I create a file called 'pærer.txt' ('pears.txt' in
>> English) and get UTF-8 encoded output in my Latin-1 terminal under
>> Linux:
>>
>>  % hg init
>>  % touch pærer.txt
>>  % hg stat
>>  ? pærer.txt
>
> I actually had that working but removed it before I submitted the
> patch, because on some systems it gives ugly results both ways.

Ugly in which way?

> I've re-added ui stuff. The way I do it, if you redirect the output it
> will write utf8 anyway. I am not sure if that is desirable.

Not on my system -- I would expect all output to be encoded in
util._encoding. It is then up to the user to configure that correctly,
maybe by setting HGENCODING if locale.getpreferredencoding() does not
return what the user wants.
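
A rough sketch of that selection order, as an illustration only and not
Mercurial's actual code (the function name pick_encoding and the final
'ascii' default are assumptions made for this example):

```python
import locale


def pick_encoding(environ):
    """Return the encoding to use for output (hypothetical helper)."""
    # An explicit HGENCODING wins; otherwise ask the locale.
    # The 'ascii' default is an assumption for the case where both are empty.
    return environ.get("HGENCODING") or locale.getpreferredencoding() or "ascii"


# A user whose locale reports the wrong encoding can force Latin-1:
forced = pick_encoding({"HGENCODING": "latin-1"})
```

In real use one would pass os.environ; a plain dict is used here so the
example does not depend on the environment it runs in.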

The i18n code takes care to convert the output like this. The da.po file
with the Danish translation is encoded in UTF-8, but hg correctly
recoded the strings as Latin-1 for my terminal.

There may be some strange interaction there: i18n.gettext returns byte
strings encoded in util._encoding. As they are merged with other byte
strings we may end up with a mess where filenames are UTF-8 byte strings
and the rest of the text is Latin-1 byte strings.

I just tried out the extension as attached, and this is what I'm talking
about:

  % hg stat
  ? pærer.txt

So far so good. But here is a problem:

  % hg add
  tilføjer pærer.txt

The final line is the 'adding %s' with a UTF-8 filename inserted.

The effect of wrapping ui.write in util.tolocal is that util.tolocal is
fed the byte string "tilføjer pærer.txt" which it tries to decode with
UTF-8. It fails on the "ø" and uses the fallback encoding instead.

The fallback is Latin-1 which is happy about "tilføjer" and also about
"pærer.txt" since those are valid characters in Latin-1. So the end
result is that tolocal does nothing and returns the original string.
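
The effect can be reproduced with a small stand-alone sketch. This is not
the real util.tolocal: the function tolocal_sketch and its parameters are
hypothetical, and it only mimics the "try UTF-8, then the fallback"
behaviour described above, assuming Latin-1 as both the local and the
fallback encoding:

```python
def tolocal_sketch(s, encoding="latin-1", fallback="latin-1"):
    """Recode the byte string s to the local encoding (illustration only)."""
    for candidate in ("utf-8", fallback):
        try:
            # The first candidate that decodes cleanly decides the result.
            return s.decode(candidate).encode(encoding, "replace")
        except UnicodeDecodeError:
            continue
    # Last resort: decode as UTF-8 and replace undecodable bytes.
    return s.decode("utf-8", "replace").encode(encoding, "replace")


# The translated message is already Latin-1, the filename is still UTF-8.
message = "tilføjer ".encode("latin-1")
filename = "pærer.txt".encode("utf-8")
mixed = message + filename

# UTF-8 decoding fails on the Latin-1 "ø", Latin-1 accepts every byte,
# so the mixed string comes back byte for byte unchanged.
unchanged = tolocal_sketch(mixed) == mixed

# A pure UTF-8 filename on its own is recoded as intended.
recoded = tolocal_sketch(filename)
```

Here unchanged is True, while recoded equals "pærer.txt" encoded as
Latin-1. So recoding only works if it happens before the filename is
merged into a message that is already in the local encoding.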


By the way, it would be easier to update the extension if you put it in
a normal repository instead of an MQ patch queue (not a fork of
Mercurial, but an otherwise empty repository). That way people can just
clone the repository and get the extension directly.

-- 
Martin Geisler

VIFF (Virtual Ideal Functionality Framework) brings easy and efficient
SMPC (Secure Multiparty Computation) to Python. See: http://viff.dk/.

More information about the Mercurial-devel mailing list