[issue2162] BOM (byte order mark) support for Mercurial.ini
Mads Kiilerich
mads at kiilerich.com
Tue Apr 27 10:04:59 CDT 2010
Alexander Belchenko wrote, On 04/27/2010 04:15 PM:
> Yuya Nishihara wrote:
>> New submission from Yuya Nishihara <yuya at tcha.org>:
>>
>> Some text editors, like Notepad.exe, insert BOM (byte order mark)
>> silently if you save Mercurial.ini as UTF-8.
>>
>> IMHO, they shouldn't insert BOM for UTF-8, but it's really hard to
>> debug because BOM isn't visible. So it seems reasonable to
>> skip/recognize BOM before reading Mercurial.ini.
>
> I was under the impression that UTF-8 might have an optional BOM marker, and
> Python even has this constant defined:
>
> In [1]: import codecs
>
> In [2]: codecs.BOM
> codecs.BOM codecs.BOM_BE codecs.BOM_UTF32
> codecs.BOM32_BE codecs.BOM_LE codecs.BOM_UTF32_BE
> codecs.BOM32_LE codecs.BOM_UTF16 codecs.BOM_UTF32_LE
> codecs.BOM64_BE codecs.BOM_UTF16_BE codecs.BOM_UTF8
> codecs.BOM64_LE codecs.BOM_UTF16_LE
>
> In [2]: codecs.BOM_UTF8
> Out[2]: '\xef\xbb\xbf'
>
> So, why do you say it "shouldn't"?
Because it is optional, has no benefit, and is "never" used?
Mercurial is not particularly encoding-aware but very
encoding-transparent. Encoding Mercurial.ini in any ASCII superset is
fine, and BOMs could probably be removed or ignored when parsed, but in
that case the BOM should probably be prepended to all value strings too
... and that would cause other strange issues.
FWIW I'm -0 on special handling of BOM - but a strip on the config file
content before parsing should do no harm.
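
A minimal sketch of such a strip, assuming the config file has been read
as raw bytes. The helper name strip_utf8_bom is hypothetical and is not an
existing Mercurial function:

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    # codecs.BOM_UTF8 is the three-byte sequence EF BB BF.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Example: content as an editor might save it, with a leading BOM.
raw = codecs.BOM_UTF8 + b"[ui]\nusername = example\n"
cleaned = strip_utf8_bom(raw)
```

Only a BOM at the very start is removed; everything else stays
byte-for-byte identical, which keeps the parsing encoding-transparent.
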
Perhaps we could warn if any non-7-bit characters are found before the
first # or =?
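
Such a check could look roughly like this, again assuming raw bytes as
input. The function name has_early_non_ascii is made up for illustration,
and how the warning itself would be reported is left open:

```python
def has_early_non_ascii(data: bytes) -> bool:
    """Return True if a byte >= 0x80 appears before the first '#' or '='."""
    for byte in data:
        # Stop at the first comment marker or assignment; anything
        # after that point may legitimately contain non-ASCII values.
        if byte in b"#=":
            return False
        if byte >= 0x80:
            return True
    return False
```

A file starting with a BOM would trigger it, while a non-ASCII user name
after the first = would not.
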
/Mads