Current py3k stage and next steps

Mon Jun 28 20:51:31 CDT 2010

2010/6/29 Matt Mackall <mpm at selenic.com>:
> On Mon, 2010-06-28 at 23:05 +0200, Martin Geisler wrote:
>> Matt Mackall <mpm at selenic.com> writes:
>>
>> > On Mon, 2010-06-28 at 21:51 +0200, Martin Geisler wrote:
>> >> The thing I want to make explode is
>> >>
>> >>   _("foo %s bar") % rawbytes
>> >
>> > But we do this.. everywhere a filename is mentioned, for starters.
>>
>> There you go -- in my view, you've just pointed out a source of bugs.
>> You know I'm in the "we should decode filenames to unicode on input and
>> encode them to local encoding on output"-camp.
>
> I strongly recommend you not try to sink the Py3k port effort by tying
> this particular boat anchor to it. It's not helpful.
>

What kind of solution do _you_ foresee for the encoding problems?

It seems that the only major problem is to find a decent solution for f)?

What about extending _, or wrapping it, so that it also receives the
format arguments?

Something like

def __(gettextkey, *args):
    def encodeifunicode(arg):
        if isinstance(arg, unicode):
            return arg.encode("utf8")
        else:
            # by default, str objects we have around should be
            # utf8-encoded strings
            return arg

    # assumption: _() returns utf8-encoded strings
    return _(gettextkey) % tuple(map(encodeifunicode, args))

So we would use
   __("foo %s bar %s", rawbytes, mayberawbytes)
instead of
   _("foo %s bar %s") % (rawbytes, mayberawbytes)

At the end of such a function, we have a (consistent?) utf8 bytestring.
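
For reference, here is a minimal sketch of the same wrapper in Python 3
terms, where the str/unicode split becomes bytes/str. It assumes, as
above, that _() returns UTF-8 encoded bytes. fake_gettext and
format_message are hypothetical names used only for illustration, and
%-formatting on bytes needs Python 3.5 or later.

```python
def fake_gettext(key):
    # Hypothetical stand-in for _(): assumed to return UTF-8 encoded bytes.
    # A real catalog lookup would go here.
    return key


def format_message(key, *args):
    """Format a translated bytes template, encoding text arguments first."""

    def encode_if_text(arg):
        if isinstance(arg, str):
            return arg.encode("utf-8")
        # Assumption: bytes arguments are passed through untouched,
        # whether or not they are valid UTF-8.
        return arg

    return fake_gettext(key) % tuple(encode_if_text(a) for a in args)


message = format_message(b"foo %s bar %s", "caf\u00e9", b"\xff")
```

Note that the raw b"\xff" argument survives unchanged, so the result is
only valid UTF-8 if every bytes argument already was.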

What problems do we have here then?

I think that two behaviours for ui.write are then possible:

W1) if the message is meant for ui purposes and does not contain raw
data, we can be nice and try to transcode it to the local encoding,
falling back to W2) if we can't. (There are two reasons why we might
not be able to transcode: #1 the strings that we did not touch in __
were not valid utf8 strings; #2 the local encoding cannot encode some
of the characters.)
W2) if it contains raw stuff, we say "screw you terminal, what matters
more to us is to avoid altering byte data". In other words, we output
the bytes without thinking too much, and if the terminal can't display
them, we don't care. This should be the current behaviour.

The "can we try to transcode it if possible" question, i.e. the choice
between W1 and W2, could be answered by a flag in ui.write, by a new
ui.writeraw, or by something similar. The "how" does not matter much:
I just suggest asking developers to think about what will be in the
message and whether its byte content may be altered or not, and giving
them the tools to make that choice.
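
To make the W1/W2 choice concrete, here is a rough Python 3 sketch of
what the flag variant might look like. It is not Mercurial's ui.write:
prepare_output, its raw flag and the local_encoding parameter are all
hypothetical, and the function only returns the bytes that would be
written.

```python
def prepare_output(data, local_encoding, raw=False):
    """Return the bytes to emit for a UTF-8 encoded message."""
    if raw:
        # W2: the message may carry raw data, never alter the bytes.
        return data
    try:
        # W1: try to transcode from UTF-8 to the local encoding.
        return data.decode("utf-8").encode(local_encoding)
    except UnicodeError:
        # Either the input was not valid UTF-8 (#1) or the local
        # encoding cannot represent it (#2): fall back to W2.
        return data


utf8_text = "caf\u00e9".encode("utf-8")
transcoded = prepare_output(utf8_text, "latin-1")
untouched = prepare_output(utf8_text, "latin-1", raw=True)
```

Both failure cases are covered by the single except clause, since
UnicodeDecodeError and UnicodeEncodeError are subclasses of
UnicodeError.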

What's not working with that?

I mean... let's try to list the possibilities we have, and their
weaknesses, so we can _really_ try to find solutions =)

Regards,
-- 
Nicolas Dumazet — NicDumZ