[issue2162] BOM (byte order mark) support for Mercurial.ini

Yuya Nishihara yuya at tcha.org
Tue Apr 27 10:53:27 CDT 2010


Mads Kiilerich wrote:
> Alexander Belchenko wrote, On 04/27/2010 04:15 PM:
> > Yuya Nishihara wrote:
> >> New submission from Yuya Nishihara <yuya at tcha.org>:
> >>
> >> Some text editors, like Notepad.exe, silently insert a BOM (byte
> >> order mark) if you save Mercurial.ini as UTF-8.
> >>
> >> IMHO, they shouldn't insert a BOM for UTF-8, but it's really hard to
> >> debug because the BOM isn't visible. So it seems reasonable to
> >> skip/recognize the BOM before reading Mercurial.ini.
> >
> > I was under the impression that UTF-8 may have an optional BOM, and
> > Python even has this constant defined:
> >
> > In [1]: import codecs
> >
> > In [2]: codecs.BOM
> > codecs.BOM          codecs.BOM_BE       codecs.BOM_UTF32
> > codecs.BOM32_BE     codecs.BOM_LE       codecs.BOM_UTF32_BE
> > codecs.BOM32_LE     codecs.BOM_UTF16    codecs.BOM_UTF32_LE
> > codecs.BOM64_BE     codecs.BOM_UTF16_BE codecs.BOM_UTF8
> > codecs.BOM64_LE     codecs.BOM_UTF16_LE
> >
> > In [2]: codecs.BOM_UTF8
> > Out[2]: '\xef\xbb\xbf'
> >
> > So, why do you say it "shouldn't"?
> 
> Because it is optional, has no benefit, and is "never" used?

I heard it can be used to detect the character encoding,
but it seems silly to lose ASCII compatibility just for that reason.
UTF-8 exists precisely for ASCII transparency.
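
To illustrate the ASCII-transparency point, here is a minimal sketch in
Python. It is not Mercurial code, and the sample config contents are made
up for illustration. It relies only on the standard "utf-8-sig" codec,
which prepends a BOM when encoding, much as Notepad does when saving.

```python
import codecs

# Hypothetical Mercurial.ini contents, pure ASCII.
text = "[ui]\nusername = yuya\n"

plain = text.encode("utf-8")         # UTF-8 without a BOM
with_bom = text.encode("utf-8-sig")  # UTF-8 with a leading BOM

# Without a BOM, the UTF-8 bytes are identical to the ASCII bytes.
assert plain == text.encode("ascii")

# With a BOM, three non-ASCII bytes are prepended to the same payload.
assert with_bom == codecs.BOM_UTF8 + plain


def is_ascii(data):
    """Return True if every byte in data is 7-bit ASCII."""
    try:
        data.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False
```

Here is_ascii(plain) holds while is_ascii(with_bom) does not, so a tool
that expects plain ASCII sees three unexpected bytes ahead of the first
section header.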

> Mercurial is not particularly encoding-aware but very
> encoding-transparent. Encoding Mercurial.ini in any ascii-superset is 
> fine, and BOMs could probably be removed or ignored when parsed, but in 
> that case the BOM should probably be prepended to all value strings too 
> ... and that would cause other strange issues.
> 
> FWIW I'm -0 on special handling of BOM - but a strip on the config file 
> content before parsing should do no harm.
> 
> Perhaps we could warn if any non-7-bit characters are found before the
> first # or =?

That seems good to me. Stripping the BOM is simple enough, but because
Mercurial doesn't care about encoding, the warning can come afterwards.
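
Both ideas could be sketched roughly as follows. This is only an
illustration, not a patch against Mercurial's config parser: the function
names strip_bom and has_early_non_ascii are hypothetical, and the sample
data in the usage note is made up.

```python
import codecs


def strip_bom(data):
    """Return data (bytes) without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def has_early_non_ascii(data):
    """Return True if a non-7-bit byte occurs before the first '#' or '='.

    Hypothetical check for the proposed warning; it scans raw bytes and
    makes no assumption about the file's encoding.
    """
    for byte in bytearray(data):
        if byte in (ord("#"), ord("=")):
            return False
        if byte >= 0x80:
            return True
    return False
```

For example, with raw = codecs.BOM_UTF8 + b"[ui]\nusername = yuya\n",
has_early_non_ascii(raw) is true, and it becomes false once the content
has been passed through strip_bom(). Working on bytes keeps the check
encoding-transparent.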

Yuya,

