xml style doesn't generate valid xml

Haszlakiewicz, Eric EHASZLA at transunion.com
Wed Nov 24 13:17:55 CST 2010


>-----Original Message-----
>From: Matt Mackall [mailto:mpm at selenic.com]
>
>On Tue, 2010-11-23 at 23:51 +0000, Haszlakiewicz, Eric wrote:
>> >-----Original Message-----
>> >From: Matt Mackall [mailto:mpm at selenic.com]
>> >
>> >The thing is: no one knows what's in the extra field. It could be
>> >binary, it could be ASCII, it could be UTF-8, it could even be a mix.
>>
>> huh?  Since the output I got for the extra field didn't change when I
>> switched my locale around I figured that it was being treated as a
>> binary field.  How does mercurial decide whether to treat it as binary
>> or ASCII or UTF-8?
>
>extra["transplant_whatever"] -> binary
>extra["branch"] -> UTF-8.
>
>The log code doesn't know anything about that and assumes a) everything
>is potentially binary and b) it's ok to print it anyway, because that's
>what the user asked for.

   Except that, from what I can tell, Mercurial does not treat extra["branch"] as UTF-8.  Regardless of what I set the encoding to, the output of hg log has the _same_ value for extra["branch"]; no conversion to the local encoding happens.  To reproduce:

hg init x && cd x
xchar=$(perl -e 'printf "%c", 205;')
HGENCODING=cp1251 hg branch "branch${xchar}"
HGENCODING=cp1251 hg ci -m "log msg ${xchar}"
HGENCODING=cp1251 hg log --style xml --debug > log1
HGENCODING=utf8 hg log --style xml --debug > log2
diff log1 log2

There are differences in the "msg" element and the "branch" element, but not in the <extra key="branch"> element.
In other words, the "extra" elements, regardless of which key within extra you are talking about, are a low-level, un-encoded view of the stored data, so it isn't appropriate to use fromlocal() on them.
msg and branch, on the other hand, do go through the convert-to-local-encoding step, so for XML output in UTF-8 it *is* appropriate to convert them back to UTF-8.
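
To make the byte-level difference concrete, here is a small sketch that does not involve hg at all. It only shows what the test character (byte 205, i.e. 0xCD) looks like raw versus after conversion from cp1251 to UTF-8. The to_hex helper is a name I made up for illustration, and the script assumes od and iconv are installed; this is not how Mercurial itself does the conversion.

```shell
#!/bin/sh
# Illustration only: compare the raw test byte with its UTF-8 form.

# to_hex (hypothetical helper): print stdin as a run of hex digits.
to_hex() {
    od -An -tx1 | tr -d ' \n'
}

# Byte 205 (octal 315), the same value the perl one-liner produces.
raw_hex=$(printf '\315' | to_hex)

# The same byte interpreted as cp1251 and re-encoded as UTF-8.
utf8_hex=$(printf '\315' | iconv -f CP1251 -t UTF-8 | to_hex)

echo "raw:   $raw_hex"
echo "utf-8: $utf8_hex"

# A lone 0xCD is not a complete UTF-8 sequence, so iconv should reject it
# when asked to read it as UTF-8.
if printf '\315' | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
    raw_is_valid_utf8=yes
else
    raw_is_valid_utf8=no
fi
echo "raw byte valid as UTF-8: $raw_is_valid_utf8"
```

The raw form is the single byte cd, while the converted form is the two bytes d0 9d.  A value that is emitted without conversion, as the extra elements are, keeps the first form no matter what HGENCODING says.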

eric

