UTF-8 Byte order marks inserted by hg merge

Adrian Buehlmann adrian at cadifra.com
Mon Jun 30 04:21:40 CDT 2008


On 30.06.2008 10:26, Brian Wallis wrote:
> First I should state that I am still investigating this problem and am  
> still a little unsure as to what happened.
> 
> We have a user on Linux (Suse 10.3) running Mercurial 1.0.1 and
> another on Windows Vista running TortoiseHg 0.4, each of whom was
> working on some changes on a branch. When it came time to merge, the
> user on Windows pulled the changes from the other repository and  
> merged the two heads. The merged result seemed to be slightly  
> corrupted in that there were three extra characters added to the front  
> of a few files. These were (in hex) EF BB BF, which is the byte order
> mark for UTF-8.
> 
> Two of the files in question were merged without conflict and the  
> third required a conflict resolution for which Beyond Compare 3 was  
> configured and used. This leads me to think that it was Mercurial
> that inserted the marks, not some other Windows tool.
> 
> Looking at the two parent revisions of the files shows no Byte Order  
> Marks in the files. I am going to attempt to reproduce the problem  
> tomorrow in another similarly configured Windows Vista machine.
> 
> Any suggestions would be appreciated.

This shouldn't have anything to do with Mercurial. I bet this
was Notepad.

I have a file here on Windows that we want stored
as UTF-8 *without* BOM (byte order mark).

I've been bitten numerous times in the past (long before we
switched to Mercurial) by Notepad inserting such a BOM into that file.

My solution is to not use stupid Notepad on that file.

WordPad seems to be better; I currently mostly use the free Notepad++.

So don't use Notepad to edit "UTF-8 files without BOM", or it will
insert a BOM behind your back.
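
If you want to check which files picked up a BOM (the EF BB BF bytes
mentioned above) and strip it again, something like the following Python
sketch might help. The function names are made up for illustration and
are not part of Mercurial or TortoiseHg; it only assumes the file fits
in memory.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if one is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_file(path: str) -> bool:
    """Hypothetical helper: rewrite the file without its BOM.

    Returns True if a BOM was found and removed, False if the file
    was left untouched.
    """
    with open(path, "rb") as f:
        data = f.read()
    if not has_utf8_bom(data):
        return False
    with open(path, "wb") as f:
        f.write(strip_utf8_bom(data))
    return True
```

Running has_utf8_bom() on the contents of the two parent revisions and
on the merge result should show at which step the BOM appeared.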

