xml style doesn't generate valid xml
Matt Mackall
mpm at selenic.com
Tue Nov 23 13:05:09 CST 2010
On Tue, 2010-11-23 at 16:57 +0000, Haszlakiewicz, Eric wrote:
> >-----Original Message-----
> >From: Matt Mackall [mailto:mpm at selenic.com]
> >On Mon, 2010-11-22 at 23:31 +0000, Haszlakiewicz, Eric wrote:
> >> It seems like xmlescape isn't escaping everything it needs to. I'll
> >> bet you'll run into the same problem in other places, such as
> >> filenames, log messages, etc.: anywhere you could have bytes
> >> that aren't necessarily UTF-8 encoded.
> >
> >You raise an interesting point here. By the time a commit message
> >reaches xmlescape, it's already been converted from the UTF-8 we stored
> >it in to the local encoding. And if that encoding isn't UTF-8,
> >converting it back to UTF-8 will be lossy.
>
> Oh, commit messages have to be in UTF-8? I didn't realize that. Sure
> enough, trying to enter arbitrary binary data in a commit message
> results in a "codec can't decode byte" error from mercurial. That
> seems quite sensible. (re-encoding back and forth, not so much)
>
> The same restriction does not appear to apply to filenames.
Indeed. See here: http://mercurial.selenic.com/wiki/EncodingStrategy
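To make the lossy round trip concrete, here is a minimal sketch in plain Python (not Mercurial's code). It assumes a Latin-1 local encoding and assumes the conversion to the local encoding replaces unrepresentable characters with `?`; the sample message is made up for illustration.

```python
# Commit message as stored in the repository (UTF-8 bytes).
stored = "naïve café ☃".encode("utf-8")

# Convert to an assumed Latin-1 local encoding for display.
# U+2603 (snowman) has no Latin-1 form, so it becomes '?'.
local = stored.decode("utf-8").encode("latin-1", errors="replace")

# Converting back to UTF-8 (e.g. for XML output) cannot recover it.
roundtrip = local.decode("latin-1").encode("utf-8")

print(roundtrip.decode("utf-8"))
print(roundtrip == stored)
```

The accented letters survive because Latin-1 can represent them; only the character outside the local encoding is lost, which is exactly the information a later re-encoding step can never get back.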
> However, for the issue with xmlescape turning things into spaces, I
> think that's because there's an explicit line of code in xmlescape
> that does that! In templatefilters.py, the last line of xmlescape is:
> return re.sub('[\x00-\x08\x0B\x0C\x0E-\x1F]', ' ', text)
Ugh, who ordered that.
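For reference, a minimal sketch of what that substitution does. XML 1.0 does not allow the C0 control characters other than TAB, LF and CR anywhere in a document, which is presumably why the line was added. The function name `xmlescape_sketch` is hypothetical, and the entity replacements are an assumption modelled on a typical escaper rather than a copy of templatefilters.py; only the final `re.sub` pattern is taken from the quoted line.

```python
import re

# Characters forbidden by XML 1.0: everything below U+0020 except
# TAB (0x09), LF (0x0A) and CR (0x0D). Same class as the quoted line.
_INVALID_XML_CHARS = re.compile(r'[\x00-\x08\x0B\x0C\x0E-\x1F]')

def xmlescape_sketch(text):
    """Hypothetical escaper: entity-escape markup, then blank out controls."""
    text = (text.replace('&', '&amp;')   # must come first
                .replace('<', '&lt;')
                .replace('>', '&gt;')
                .replace('"', '&quot;')
                .replace("'", '&#39;'))
    # Replace each forbidden control character with a space, as in the
    # line quoted from templatefilters.py.
    return _INVALID_XML_CHARS.sub(' ', text)

print(repr(xmlescape_sketch('a\x01b <c>\tok')))
```

This keeps the output well-formed, but it silently turns any such byte in the input into a space, which is the behaviour being reported here.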
--
Mathematics is the supreme nostalgia of our time.