UTF-8 Byte order marks inserted by hg merge

James Talbut James.Talbut at omg3d.com
Tue Jul 1 02:45:17 CDT 2008


Brian,

Just out of interest, why don't you want the BOM there?
You may want the files to be 7-bit only, but you are telling your XML
parser that they are UTF-8.
If a non-ASCII character is put in the file by accident, a text editor
that is unaware of the XML declaration could save it in an encoding
that the parser cannot read.
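
To make that failure mode concrete, here is a minimal Python sketch. The
sample text and the helper name decodes_as_utf8 are made up for
illustration; Latin-1 simply stands in for whatever encoding an unaware
editor might pick.

```python
# Hypothetical file content: an XML declaration claiming UTF-8, plus one
# non-ASCII character (the copyright sign, U+00A9).
text = '<?xml version="1.0" encoding="utf-8" ?>\n<note>\u00a9 2008</note>\n'

as_utf8 = text.encode("utf-8")      # what the declaration promises
as_latin1 = text.encode("latin-1")  # what an unaware editor might write


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(decodes_as_utf8(as_utf8))    # True
print(decodes_as_utf8(as_latin1))  # False: a lone 0xA9 byte is not valid UTF-8
```

A parser that trusts the declaration would hit the same decoding error on
the Latin-1 bytes.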

I'm just trying to solicit views on BOMs.

Thanks.

Jim

> -----Original Message-----
> From: mercurial-bounces at selenic.com 
> [mailto:mercurial-bounces at selenic.com] On Behalf Of Brian Wallis
> Sent: 01 July 2008 05:51
> To: Mercurial List
> Subject: Re: UTF-8 Byte order marks inserted by hg merge
> 
> 
> On 30/06/2008, at 10:50 PM, Adrian Buehlmann wrote:
> > Known use case with notepad:
> >
> > Create a file containing the following byte sequence: c2 a9 0d 0a
> >
> > Open that file with notepad and add an additional empty line by
> > hitting return (i.e. another 0d 0a). Then save. File now starts with
> > the BOM.
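
The byte sequence in Adrian's use case can be reproduced in a few lines of
Python. This is only a sketch: the after_save value is an assumption about
what Notepad writes (BOM, original bytes, one extra CR LF), not captured
output, and bytes.hex with a separator needs Python 3.8 or later.

```python
import codecs

original = bytes.fromhex("c2 a9 0d 0a")  # U+00A9 (copyright sign) + CR LF

# Assumed result of the Notepad save: BOM, original content, one more CR LF.
after_save = codecs.BOM_UTF8 + original + b"\r\n"

print(original.decode("utf-8") == "\u00a9\r\n")  # True
print(after_save.hex(" "))                       # ef bb bf c2 a9 0d 0a 0d 0a
print(after_save.startswith(codecs.BOM_UTF8))    # True
```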
> >
> >> The file is not UTF-8 in particular, it is simple ASCII, 7 bit clean,
> >> an XML file.
> >
> > And it doesn't happen to start with
> >
> > <?xml version="1.0" encoding="utf-8" ?>
> 
> Yes, it turns out that the files in question did start with
> that. At first I thought more files were involved, but that was a
> different problem; in the end three XML files (Eclipse
> .classpath) were affected.
> 
> The merge tool was the culprit. Our Windows users are using
> Beyond Compare version 3 beta, and it is adding the BOMs to the files.
> 
> As this is a problem that will not go away and the BOM in a 
> UTF-8 file is just noise, I am going to use a slightly 
> modified version of the win32text extension to detect and 
> filter out these three bytes. This will ensure that our 
> repository never has them committed to it. I will use the 
> dumbencode/decode so I can control exactly which files are 
> converted rather than trust it to never touch a binary file. 
> I will also put a forbid hook on our main repository to 
> disallow files with BOMs from being pushed.
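
The byte-level part of that plan (detect and drop the three BOM bytes, and
list offending files) could look roughly like the Python sketch below. It is
not the win32text code and it calls no Mercurial API: strip_bom and
files_with_bom are hypothetical helpers, and wiring them into
encode/decode filters or a hook is left out.

```python
import codecs
from typing import Dict, List

BOM = codecs.BOM_UTF8  # the three bytes ef bb bf


def strip_bom(data: bytes) -> bytes:
    """Drop one leading UTF-8 BOM, if present; leave all other bytes alone."""
    return data[len(BOM):] if data.startswith(BOM) else data


def files_with_bom(contents: Dict[str, bytes]) -> List[str]:
    """Return the sorted names whose content starts with a UTF-8 BOM."""
    return sorted(name for name, data in contents.items()
                  if data.startswith(BOM))


# Usage with made-up file names and contents:
sample = {
    ".classpath": BOM + b'<?xml version="1.0" encoding="utf-8" ?>\r\n',
    "README.txt": b"plain ASCII\r\n",
}
print(files_with_bom(sample))                # ['.classpath']
print(strip_bom(sample[".classpath"])[:5])   # b'<?xml'
```

A forbid-style check would reject a push when files_with_bom returns a
non-empty list; restricting the filter to named files, as Brian describes,
avoids touching binary files that happen to start with the same bytes.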
> 
> Thanks for the help on resolving this.
> 
> regards,
> Brian Wallis
> InfoMedix
> p: 3 8615 4553 | f: 3 8615 4501 | e: brian.wallis at infomedix.com.au
> Level 5, 451 Little Bourke Street, Melbourne VIC 3000
