[PATCH] Don't consider UTF-16 and UTF-32 files as binary (issue1975) (version2 -- resend)

Martin Geisler mg at lazybytes.net
Mon Mar 1 15:30:23 CST 2010


Martin Geisler <mg at lazybytes.net> writes:

> Benoît Allard <benoit at aeteurope.nl> writes:
>
>> Ben wrote:
>>> The generated patches work fine with GNU diff. I mainly need that to
>>> review my commits on Windows where some tools generate config files
>>> automatically in UTF-16 ...
>>>
>>> # HG changeset patch
>>> # User Benoit Allard <benoit at aeteurope.nl>
>>> # Date 1265306735 -3600
>>> Don't consider UTF-16 and UTF-32 files as binary (issue1975)
>
> I'm looking at the patch and will push it later tonight.

I looked some more and I think this needs more consideration.

By changing util.binary to accept UTF-16 and UTF-32 encoded files, we
will begin producing output from 'hg diff' that is a mixture of ASCII
and UTF-16/32. We don't know how the Windows command prompt reacts to
this garbled output.
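
To make the concern concrete, here is a minimal sketch in Python. It
assumes util.binary is essentially a NUL-byte test; looks_binary, the
file name config.ini, and the hand-built hunk are illustrative stand-ins,
not Mercurial code or real 'hg diff' output.

```python
import codecs


def looks_binary(data: bytes) -> bool:
    """NUL-byte heuristic, assumed here to approximate util.binary."""
    return bool(data) and b"\0" in data


text = "key=value\n"
ascii_bytes = text.encode("ascii")
# What a Windows tool writing UTF-16 with a BOM would store on disk.
utf16_bytes = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Hand-built hunk: ASCII diff headers followed by a UTF-16 payload line.
hunk = (
    b"--- a/config.ini\n"
    b"+++ b/config.ini\n"
    b"@@ -0,0 +1 @@\n"
    b"+" + utf16_bytes
)

if __name__ == "__main__":
    print(looks_binary(ascii_bytes))   # ASCII text has no NUL bytes
    print(looks_binary(utf16_bytes))   # UTF-16 text is full of NUL bytes
    print(hunk)                        # ASCII headers mixed with UTF-16 payload
```

The resulting byte stream is neither valid ASCII nor valid UTF-16 as a
whole, which is the mixture a terminal would have to cope with.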

GNU patch is okay with the extra NUL bytes -- it seems to split on
newlines only. But if a UTF-16/32 encoded string contains 0x0A bytes
that are not line breaks, then we'll split on those too.
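
A small Python illustration of the splitting problem, assuming a
byte-wise split on LF similar to what a line-oriented tool does. The
sample strings are made up for the example: U+010A encodes to the bytes
0A 01 in UTF-16-LE, so it contains an LF byte without being a newline.

```python
# A real newline in UTF-16-LE is the two bytes 0A 00, so a byte-wise
# split leaves the trailing 00 at the start of the next "line".
two_lines = "a\nb".encode("utf-16-le")      # b"a\x00\n\x00b\x00"
pieces_newline = two_lines.split(b"\n")     # code units are now misaligned

# U+010A (LATIN CAPITAL LETTER C WITH DOT ABOVE) is one character with
# no newline at all, yet its low byte is 0x0A.
no_newline = "\u010a".encode("utf-16-le")   # b"\n\x01"
pieces_spurious = no_newline.split(b"\n")   # split in the middle of a character

if __name__ == "__main__":
    print(pieces_newline)
    print(pieces_spurious)
```

In the first case the second piece has an odd number of bytes, so it can
no longer be decoded as UTF-16 on its own; in the second case a single
character is cut in half.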

We talked about it on IRC and I think a tiny extension that overrides
util.binary is best here. If it turns out to be a great idea, then
perhaps we can include it in the core later.
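
A rough sketch of what such an override could look like, in Python. To
keep it runnable without Mercurial installed, util below is a stand-in
namespace and wrapfunction is a hypothetical helper; in a real extension
the wrapping would be applied to mercurial.util, for example through the
extension API. Detecting UTF-16/32 by BOM is an assumption made for this
sketch and is not necessarily what the posted patch does.

```python
import codecs
import types

# Stand-in for mercurial.util; the NUL test is an assumed approximation.
util = types.SimpleNamespace(binary=lambda data: bool(data) and b"\0" in data)

# Byte order marks that identify UTF-16 and UTF-32 files.
_BOMS = (
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF32_BE,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF16_BE,
)


def bom_aware_binary(orig, data):
    """Treat BOM-prefixed UTF-16/32 data as text, else defer to orig."""
    if data and data.startswith(_BOMS):
        return False
    return orig(data)


def wrapfunction(container, name, wrapper):
    """Hypothetical helper: replace container.name with a wrapped version."""
    orig = getattr(container, name)

    def wrapped(*args, **kwargs):
        return wrapper(orig, *args, **kwargs)

    setattr(container, name, wrapped)
    return orig


# Install the override, as an extension would do at setup time.
wrapfunction(util, "binary", bom_aware_binary)

if __name__ == "__main__":
    sample = codecs.BOM_UTF16_LE + "key=value\n".encode("utf-16-le")
    print(util.binary(sample))        # BOM found, treated as text
    print(util.binary(b"\x00\x01"))   # no BOM, falls back to the NUL test
```

Files written as UTF-16 without a BOM would still be reported as binary
under this approach, and the mixed-encoding output concern above applies
to anyone who enables it.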

-- 
Martin Geisler

Fast and powerful revision control: http://mercurial.selenic.com/