[issue2162] BOM (byte order mark) support for Mercurial.ini

Alexander Belchenko bialix at ukr.net
Wed Apr 28 02:29:15 CDT 2010


Yuya Nishihara wrote:
> Mads Kiilerich wrote:
>> Alexander Belchenko wrote, On 04/27/2010 04:15 PM:
>>> Yuya Nishihara wrote:
>>>> New submission from Yuya Nishihara <yuya at tcha.org>:
>>>>
>>>> Some text editors, like Notepad.exe, silently insert a BOM (byte order
>>>> mark) if you save Mercurial.ini as UTF-8.
>>>>
>>>> IMHO, they shouldn't insert a BOM for UTF-8, but it's really hard to
>>>> debug because the BOM isn't visible. So it seems reasonable to
>>>> recognize and skip the BOM before reading Mercurial.ini.
>>> I was under the impression that UTF-8 may have an optional BOM, and
>>> Python even has a constant defined for it:
>>>
>>> In [1]: import codecs
>>>
>>> In [2]: codecs.BOM
>>> codecs.BOM          codecs.BOM_BE       codecs.BOM_UTF32
>>> codecs.BOM32_BE     codecs.BOM_LE       codecs.BOM_UTF32_BE
>>> codecs.BOM32_LE     codecs.BOM_UTF16    codecs.BOM_UTF32_LE
>>> codecs.BOM64_BE     codecs.BOM_UTF16_BE codecs.BOM_UTF8
>>> codecs.BOM64_LE     codecs.BOM_UTF16_LE
>>>
>>> In [2]: codecs.BOM_UTF8
>>> Out[2]: '\xef\xbb\xbf'
>>>
>>> So why do you say it "shouldn't"?
>> Because it is optional, has no benefit, and is "never" used?
> 
> I heard it can be used to detect the character encoding,
> but it seems silly to lose ASCII compatibility just for that reason.
> UTF-8 exists for ASCII transparency.

I don't understand what "ASCII transparency" means here. When somebody
talks about "ASCII" seriously, to me it sounds the same as pretending we
live on a flat world that stands on the back of a big turtle.
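
For reference, the BOM skipping proposed in the original report could look
roughly like the sketch below. It is only an illustration and not how
Mercurial reads its config: the helper name strip_utf8_bom and the sample
file contents are made up, and it is written for Python 3, where
codecs.BOM_UTF8 is a bytes object. It also shows what "ASCII transparency"
refers to in the quoted text: ASCII-only content has the same bytes in
ASCII and in UTF-8, and the BOM prefix is what breaks that.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present.

    Hypothetical helper for illustration; not part of Mercurial.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Made-up config contents, as saved without and with a BOM.
without_bom = b"[ui]\nusername = Example User\n"
with_bom = codecs.BOM_UTF8 + without_bom

# The BOM is removed when present; other input is returned unchanged.
assert strip_utf8_bom(with_bom) == without_bom
assert strip_utf8_bom(without_bom) == without_bom

# ASCII-only text encodes to the same bytes in ASCII and in UTF-8.
assert "[ui]".encode("utf-8") == "[ui]".encode("ascii")


def is_ascii(data: bytes) -> bool:
    """Report whether data decodes as plain ASCII."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


# The BOM bytes are outside the ASCII range, so the prefixed file is no
# longer readable as plain ASCII.
assert is_ascii(without_bom)
assert not is_ascii(with_bom)

# The standard "utf-8-sig" codec drops a leading BOM while decoding.
assert with_bom.decode("utf-8-sig") == without_bom.decode("utf-8")
```

Working on raw bytes keeps the check independent of whatever encoding the
rest of the file uses; when the file is known to be UTF-8, decoding with
"utf-8-sig" gives the same result with less code.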


