xml style doesn't generate valid xml

Tue Nov 23 15:19:34 CST 2010

>-----Original Message-----
>From: Matt Mackall [mailto:mpm at selenic.com]
>
>> However, for the issue with xmlescape turning things into spaces, I
>> think that's because there's an explicit line of code in xmlescape
>> that does that!  In templatefilters.py, the last line of xmlescape is:
>>     return re.sub('[\x00-\x08\x0B\x0C\x0E-\x1F]', ' ', text)
>
>Ugh, who ordered that.
>

Well, I poked around a bit and came up with this as a better implementation of xmlescape.  What do you think:

def xmlescape(text):
    text = (text
            .replace('&', '&amp;')
            .replace('<', '&lt;')
            .replace('>', '&gt;')
            .replace('"', '&quot;')
            .replace("'", '&#39;')) # &apos; invalid in HTML
    moretoencode = True
    new_s = ""
    while moretoencode:
        try:
            text.decode("UTF-8", "strict")
            new_s += text
            moretoencode = False
        except UnicodeDecodeError, inst:
            preerror = text[0:inst.start]
            new_s += preerror
            escaped = "&#" + "%d" % ord(text[inst.start]) + ";"
            new_s += escaped
            text = text[inst.start+1:]
    def fixupcontrols(matchobj):
        return "&#" + "%d" % ord(matchobj.group(0)) + ";"
    return re.sub('[\x00-\x08\x0B\x0C\x0E-\x1F]', fixupcontrols, new_s)

eric