[issue1975] UTF-16 text-files are treated as binary-files

Boris bugs at mercurial.selenic.com
Thu Jan 7 20:12:01 UTC 2010


New submission from Boris <50f80b736bf9bcbf281d61569daa1fe51230d0f6 at nurfuerspam.de>:

Today I had to add a UTF-16 encoded file to a Mercurial repository. Sadly
it was not recognized as text, but treated as a binary file.

------------------------------------
Steps to reproduce:
------------------------------------
$ hg init test
$ cd test
$ echo -en "\xFF\xFE\xe4\x00\x0a\x00" > temp # create a UTF-16 file with BOM
$ iconv -f utf16 < temp # this is only to check that it's valid UTF-16
ä
$ hg add temp
$ hg diff
diff -r 000000000000 temp
Binary file temp has changed
------------------------------------

I don't know if this is considered a bug (as I don't know if this is
intended), but I think this is a feature that's worth implementing.

I guess it's the NUL bytes that make hg treat the file as binary, since a
UTF-8 file with BOM is accepted as text.
A possible implementation could check for a valid BOM at the beginning of a
file to determine the type.
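
The BOM check suggested above could look roughly like the following Python
sketch. It assumes the existing test is a plain "contains a NUL byte"
heuristic, which is only my guess. The names detect_bom and looks_binary are
made up for illustration and are not Mercurial functions.

```python
import codecs

# Hypothetical helpers for illustration; not part of Mercurial's API.
# UTF-32 BOMs come first because BOM_UTF32_LE begins with the same two
# bytes as BOM_UTF16_LE. A UTF-16-LE file whose first character is U+0000
# would be misread as UTF-32-LE; this sketch accepts that corner case.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data):
    """Return (encoding, bom_length), or (None, 0) if no BOM is present."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


def looks_binary(data):
    """Assumed NUL-byte heuristic, with a BOM check in front of it."""
    encoding, skip = detect_bom(data)
    if encoding is not None:
        try:
            # A BOM alone is not proof; the rest must decode cleanly.
            data[skip:].decode(encoding)
            return False
        except UnicodeDecodeError:
            return True
    # No BOM: fall back to the assumed NUL-byte test.
    return b"\0" in data
```

With the bytes from the steps above (FF FE E4 00 0A 00), detect_bom reports
utf-16-le and looks_binary returns False, while data without a BOM that
contains NUL bytes is still reported as binary.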

----------
messages: 11407
nosy: e3a2955
priority: feature
status: unread
title: UTF-16 text-files are treated as binary-files

____________________________________________________
Mercurial issue tracker <bugs at mercurial.selenic.com>
<http://mercurial.selenic.com/bts/issue1975>
____________________________________________________

