[PATCH] Don't consider UTF-16 and UTF-32 files as binary (issue1975) (version 2)

Mads Kiilerich mads at kiilerich.com
Mon Feb 8 09:20:48 CST 2010


On 02/08/2010 04:03 PM, Dirkjan Ochtman wrote:
> On Mon, Feb 8, 2010 at 15:43, Ollivier Robert <roberto at keltia.net> wrote:
>> Hmmm, technically, you don't need a BOM in UTF-8 so checking for it seems wrong to me.
>
> I disagree. We want UTF-8 to not be treated as binary, so we want to
> check for any BOMs people might include, even if it's optional for
> UTF-8.

Sure. But "valid" UTF-8 text does not contain any zero bytes (a zero byte
can only encode an actual NUL character) and will thus never be considered
binary anyway.

So the question is just how invalid UTF-8 files should be handled.
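
To illustrate the zero-byte argument, here is a minimal sketch. It assumes
the binary check amounts to "does the data contain a NUL byte", which is
what this thread implies; looks_binary is a hypothetical stand-in, not the
actual Mercurial function, and the sample text is made up.

```python
import codecs


def looks_binary(data: bytes) -> bool:
    # Hypothetical stand-in for the check discussed in this thread:
    # data counts as binary if it contains a NUL byte.
    return b"\0" in data


# Sample text with a couple of non-ASCII characters (e-acute, o-umlaut).
text = "h\u00e9llo w\u00f6rld\n"

samples = {
    "utf-8": text.encode("utf-8"),
    "utf-8 with BOM": codecs.BOM_UTF8 + text.encode("utf-8"),
    "utf-16": text.encode("utf-16"),  # Python prepends a BOM here
    "utf-32": text.encode("utf-32"),  # Python prepends a BOM here
}

results = {name: looks_binary(data) for name, data in samples.items()}
for name, flagged in results.items():
    print(f"{name}: binary={flagged}")
```

With or without a BOM, the UTF-8 samples are not flagged, while the UTF-16
and UTF-32 samples are, since every ASCII character is padded with zero
bytes in those encodings.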

/Mads


More information about the Mercurial-devel mailing list