xml style doesn't generate valid xml

Haszlakiewicz, Eric EHASZLA at transunion.com
Mon Nov 22 17:31:48 CST 2010


>-----Original Message-----
>From: Matt Mackall [mailto:mpm at selenic.com]
>
>On Mon, 2010-11-22 at 22:49 +0000, Haszlakiewicz, Eric wrote:
>> I'm not going to paste the actual output into an email because it's
>> not plain text.  However, here's a sample of what "less log.out"
>> displays, minus the terminal dependent highlighting to distinguish
>> between specially displayed characters and actual angle brackets:
>> <extra key="transplant_source"><CD><B9><AA>Q<FF><A3>}<AC>JI<D5><8E><D8>zWL,<FB>;<F7></extra>
>
>Thanks. This tells me about 10 times more than your original message.
>Perhaps a CDATA section is appropriate here:
>
>diff -r 77aa74fe0e0b mercurial/templates/map-cmdline.xml
>--- a/mercurial/templates/map-cmdline.xml	Mon Nov 22 13:11:46 2010 -0600
>+++ b/mercurial/templates/map-cmdline.xml	Mon Nov 22 17:20:47 2010 -0600
>@@ -16,4 +16,4 @@
> parent = '<parent revision="{rev}" node="{node}" />\n'
> branch = '<branch>{branch|xmlescape}</branch>\n'
> tag = '<tag>{tag|xmlescape}</tag>\n'
>-extra = '<extra key="{key|xmlescape}">{value|xmlescape}</extra>\n'
>+extra = '<extra key="{key|xmlescape}"><![CDATA[{value}]]></extra>\n'

Yeah, that's not going to help.  Nothing prevents the value from containing the sequence "]]>", which would end the CDATA section early.  It also does nothing about the encoding issue.  A CDATA section only tells the XML parser not to treat its contents as markup; those contents still have to be valid characters in the document's encoding in the first place.
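Both objections are easy to check by building the proposed `extra` line by hand and feeding it to a real XML parser. The sketch below uses Python's `xml.etree.ElementTree`. `cdata_extra` is a hypothetical helper that mimics the proposed template (value pasted raw inside CDATA, key assumed to be plain ASCII), and `SAMPLE` is the 20 bytes from the output quoted above.

```python
import xml.etree.ElementTree as ET

# The 20 raw bytes from the transplant_source output quoted above.
SAMPLE = b"\xcd\xb9\xaaQ\xff\xa3}\xacJI\xd5\x8e\xd8zWL,\xfb;\xf7"

def cdata_extra(key: str, value: bytes) -> bytes:
    """Hypothetical helper mimicking the proposed template: value pasted raw into CDATA."""
    return (b'<extra key="' + key.encode("ascii") + b'"><![CDATA['
            + value + b"]]></extra>")

def is_well_formed(doc: bytes) -> bool:
    """True if the parser accepts the document (no XML declaration, so UTF-8 is assumed)."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(cdata_extra("branch", b"default")))         # True
# A value containing "]]>" closes the CDATA section early; the rest is parsed as markup.
print(is_well_formed(cdata_extra("note", b"a]]><b>")))            # False
# Bytes that aren't valid UTF-8 are rejected even inside CDATA.
print(is_well_formed(cdata_extra("transplant_source", SAMPLE)))  # False
```

Splitting each `]]>` across two sections (`]]]]><![CDATA[>`) is the usual workaround for the first failure; nothing done inside CDATA helps with the second.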

It seems like xmlescape isn't escaping everything it needs to.  I'll bet you'll run into the same problem in other places, such as filenames and log messages: anywhere you can have bytes that aren't valid UTF-8.
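One way to make escaping total is to guarantee legal XML 1.0 output whatever the input bytes are. The Python 3 sketch below is hypothetical: it is not Mercurial's `xmlescape`, and the name `xml_escape_bytes` is made up. It decodes as UTF-8, turns undecodable bytes into visible `\xNN` escapes, rewrites code points XML 1.0 forbids outright, and only then escapes markup characters.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Code points XML 1.0 forbids everywhere, even as &#...; references:
# C0 controls other than tab/LF/CR, plus U+FFFE and U+FFFF. (Surrogates are
# also forbidden, but strict UTF-8 decoding never produces them.)
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")

def xml_escape_bytes(raw: bytes) -> str:
    """Hypothetical stricter xmlescape: returns legal XML 1.0 text for any bytes."""
    # Bytes that aren't valid UTF-8 become visible \xNN escapes.
    text = raw.decode("utf-8", errors="backslashreplace")
    # Forbidden code points become \xNN / \uNNNN escapes as well.
    text = _INVALID_XML_CHARS.sub(
        lambda m: m.group().encode("unicode_escape").decode("ascii"), text)
    # Escape &, <, > and both quote characters so the result is also safe in attributes.
    return escape(text, {'"': "&quot;", "'": "&apos;"})

# Usage: the transplant_source bytes from above now yield a parseable element.
SAMPLE = b"\xcd\xb9\xaaQ\xff\xa3}\xacJI\xd5\x8e\xd8zWL,\xfb;\xf7"
root = ET.fromstring('<extra key="transplant_source">' + xml_escape_bytes(SAMPLE) + "</extra>")
print(root.get("key"))  # transplant_source
```

The trade-off is ambiguity: a value that literally contains the text `\xff` comes out the same as the raw byte 0xff. For a field that is really binary, as `transplant_source` appears to be from the sample above, hex-encoding the whole value would be lossless instead.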

eric

