Current py3k stage and next steps

Mon Jun 28 12:06:05 CDT 2010

On Mon, 2010-06-28 at 18:05 +0200, Martin Geisler wrote:
> (I'm sorry about the top post... my phone insists)
> 
> I think your example f) shows the main point: you cannot mix bytes and
> text.
> 
> This is already broken today - you must not output raw bytes in the
> middle of a string encoded in the local encoding. It is a bug to do so
> since you can end up producing a byte stream with a mixed encoding
> such as Latin-1 inside UTF-8.

Sorry, that's just a rule you invented to make the world a less scary
place. But there is no such rule.

If someone creates a file like this:

$ echo "This is what Latin1 looks like: blah blah blah" > latin1.txt
<switch to UTF-8>
$ echo "And this is what UTF-8 looks like: blah blah blah" > utf-8.txt
$ hg ci -Am"encoding examples"
$ hg export tip > example.patch

...that last line will work today without complaint, and it will generate
a patch that patch(1) understands and uses to recreate files with the same
byte contents. -That- is the rule, and anything else is wrong.
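
For the py3k angle, here is a minimal sketch of the same point in Python 3.
It is an illustration only, not Mercurial code: the sample text and the
helper names roundtrip() and decodes_as_utf8() are made up for this example.
It builds a byte stream holding a Latin-1 line and a UTF-8 line, shows that
the bytes survive a write and read unchanged, and shows that the trouble only
starts once you insist on decoding the whole stream with a single encoding.

```python
import os
import tempfile

# Hypothetical sample text; any string with a non-ASCII character would do.
text = "caf\u00e9"

latin1_line = b"Latin-1: " + text.encode("latin-1") + b"\n"
utf8_line = b"UTF-8: " + text.encode("utf-8") + b"\n"

# A patch-like byte stream that contains both encodings at once.
mixed = latin1_line + utf8_line


def roundtrip(data: bytes) -> bytes:
    """Write bytes to a temporary file in binary mode and read them back."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


def decodes_as_utf8(data: bytes) -> bool:
    """Report whether the whole byte string is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # Treated as bytes, the mixed stream is preserved exactly.
    print(roundtrip(mixed) == mixed)
    # Treated as text in one encoding, it is rejected.
    print(decodes_as_utf8(mixed))
```

As long as the data stays bytes, nothing is lost. The decode step is the only
place where the mixed encodings matter.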

Really, Martin, it's high time you wrapped your head around the idea of
being encoding-agnostic: we only care about the encoding of data when we
need to. It's how Unix works and it's how Mercurial works and it's a
perfectly valid alternative (if not bloody obviously superior!) to the
"everything is characters in some knowable and consistent encoding"
approach on Windows.

-- 
Mathematics is the supreme nostalgia of our time.