UTF-8 Byte order marks inserted by hg merge

Ian Lewis ianmlewis at gmail.com
Tue Jul 1 03:36:23 CDT 2008


James,

2008/7/1 James Talbut <James.Talbut at omg3d.com>:
> Brian,
>
> Just out of interest why don't you want the BOM there?
> You may want the files to be 7 bit only, but you are telling your XML
> parser that they are UTF-8.

BOMs are really annoying in a lot of ways. For instance, if a PHP file
starts with a BOM, the PHP interpreter will emit the BOM as output
because it comes before the <?php opening tag. There are so many
programs that just don't deal with BOMs properly that, from experience,
they are best avoided.
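
As a rough illustration of the PHP case, here is a minimal Python
sketch that checks for a leading UTF-8 BOM (the bytes EF BB BF) and
strips it. The function name strip_utf8_bom and the sample PHP source
are made up for the example; only codecs.BOM_UTF8 comes from the
standard library.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical file contents: a BOM sits in front of the PHP opening tag,
# so PHP would treat those three bytes as literal output.
php_source = codecs.BOM_UTF8 + b"<?php echo 'hi'; ?>"
cleaned = strip_utf8_bom(php_source)
```

Something like this could be run over a working copy before commit, but
it only helps for files you already know are meant to be UTF-8.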

> If a non-ASCII character is put in the file by accident a text editor
> that is unaware of the XML declaration could save it in an unparsable
> encoding.

Editors that don't deal with encoding issues properly will save your
file in an unparsable encoding no matter what. A BOM would only solve
the problem if your data were in one of the UTF encodings. You'd be out
of luck in any other encoding besides ASCII. That's why things like the
encoding attribute of XML declarations and the special comments that set
the file encoding in Python and Ruby source files exist. A BOM is much
more an irritant than any kind of solution.
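
To show what I mean by the special comments, here is a small sketch
using Python's own tokenize.detect_encoding, which reads the coding
comment from the first two lines of a source file. The helper name
declared_encoding and the sample source are assumptions made up for
the example.

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the encoding Python would use for this source, based on
    a BOM or a coding comment in the first two lines."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


# Hypothetical Latin-1 source file: no BOM could describe this encoding,
# but the coding comment can.
latin1_source = b"# -*- coding: latin-1 -*-\nname = 'caf\xe9'\n"
enc = declared_encoding(latin1_source)
text = latin1_source.decode(enc)
```

The declaration works for any encoding the parser knows about, which is
exactly what a BOM can't do.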

Ian

