[PATCH 4 of 8] encode all output in stdio encoding

Matt Mackall mpm at selenic.com
Mon Nov 20 13:15:54 CST 2006


On Mon, Nov 20, 2006 at 04:34:20PM -0200, Alexis S. L. Carvalho wrote:
> Thus spake Alexis S. L. Carvalho:
> > Thus spake Andrey:
> > > I should have written something like 'encode all output in stdio
> > > encoding, if not already encoded' in the commit message. :) That ui.ui.encode()
> > > function leaves all non-Unicode strings untouched, so hg cat works as
> > > expected.
> > > 
> > 
> > It prints a traceback for hg log --patch on a revision that changes
> > the encoding of a file.
> 
> Hmm... OK, it doesn't even get to ui.write: the current log code puts
> all strings in a list and does a ui.write("".join(strings)).  This
> patchset turns some of these strings from str into unicode objects, so
> the "".join() raises an exception when it fails to convert the patch to
> unicode.

This is a great example of why having a mix of Unicode and regular
strings travelling the same paths through an app is generally Not A
Good Idea, especially since one of our primary concerns as an SCM is to
pass all data through the system unmangled.

Operations on regular (byte) strings never throw encoding exceptions.
Functions that were written to work on regular strings will explode in
unexpected places when passed Unicode strings, as soon as an implicit
conversion hits a non-ASCII byte. That's bad. And retrofitting code to
accept both is complicated.
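To make the log failure concrete, here is a minimal sketch in Python 3, where byte strings are bytes and Unicode strings are str. It assumes output fragments are collected in a list and joined once, as the log code described above does; render_output, header and patch are hypothetical names, not Mercurial code. In the Python 2 code under discussion the same mix made "".join() raise UnicodeDecodeError; in Python 3 the join raises TypeError instead.

```python
# Python 3 analogue of the log failure. render_output, header and patch
# are hypothetical names for illustration, not Mercurial code.

def render_output(parts):
    """Assemble output fragments with a single join, like the log code."""
    return b"".join(parts)

header = "changeset:   1\n"          # text (str), e.g. a decoded description
patch = b"-caf\xe9\n+caf\xc3\xa9\n"  # raw bytes: a Latin-1 line becomes UTF-8

try:
    render_output([header, patch])
    mixed_ok = True
except TypeError as exc:             # one str in the list breaks the whole join
    mixed_ok = False
    print("join failed:", exc)

# Keeping every fragment as bytes passes the patch through unmangled.
all_bytes = render_output([header.encode("ascii"), patch])
print(all_bytes)
```

A single stray text fragment is enough to sink the whole output, which is why keeping everything as bytes until the final write is the simpler invariant.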

This is especially hard given that we generally _don't_ know the
encoding of the data we're manipulating. As far as I know, Unicode
doesn't have an encoding that says "I don't know what this is, it might
be binary for all I know, don't complain, and when you encode it back to
8-bit, it must be exactly identical."

Going the other way, manipulations on regular encoded strings will
generally work. The operations that misbehave are things like upper(),
lower(), grep with mismatched encodings, and truncation that happens to
chop inside a character, and they misbehave relatively harmlessly: they
give imperfect results rather than raising. For instance, about the
only significant user of lower() is log -k, which will continue to work
roughly as advertised.
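A rough Python 3 illustration, assuming UTF-8 data held as bytes: nothing raises, but case folding only touches ASCII letters and slicing can split a multibyte character. keyword_match is a hypothetical stand-in for a case-insensitive keyword search like log -k, not Mercurial's implementation.

```python
# Sketch: byte-string operations on UTF-8 data do not raise; they just
# ignore or damage non-ASCII characters. keyword_match is hypothetical.

word = "Café".encode("utf-8")        # b"Caf\xc3\xa9"

lowered = word.lower()               # only ASCII letters are folded
uppered = word.upper()               # the two bytes of é are left unchanged
truncated = word[:4]                 # cuts between the two bytes of é

def keyword_match(text, keyword):
    """Case-insensitive substring test on raw bytes (ASCII folding only)."""
    return keyword.lower() in text.lower()

desc = "CAFÉ au lait".encode("utf-8")
print(lowered, uppered, truncated)
print(keyword_match(desc, b"Lait"))                # ASCII keyword matches
print(keyword_match(desc, "é".encode("utf-8")))   # é does not fold to É
```

ASCII keywords still match regardless of case, while a lowercase non-ASCII keyword misses its uppercase form: a quiet degradation rather than a traceback.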

-- 
Mathematics is the supreme nostalgia of our time.


More information about the Mercurial-devel mailing list