UTF-16 in Mercurial

Benoît Allard benoit at aeteurope.nl
Tue Mar 2 08:04:26 CST 2010


Hi there,

I've been experimenting on Windows with some UTF-16 config files (what 
Windows calls "Unicode"; registry exports, to be precise) and the 
attached, very small extension, which tries to make UTF-16 (or UTF-32) 
files be treated as text rather than binary.
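
To make the idea concrete, here is a minimal sketch of BOM-based 
detection. It is not the attached BOM.py: the names BOMS, detect_bom 
and looks_binary are hypothetical, and the NUL-byte test only stands in 
for the kind of heuristic that flags UTF-16 content as binary.

```python
import codecs

# UTF-32 BOMs must be tested first: the UTF-32-LE BOM (FF FE 00 00)
# starts with the UTF-16-LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def looks_binary(data):
    """Hypothetical check: NUL bytes mean binary unless a BOM is present."""
    if detect_bom(data) is not None:
        return False
    return b"\0" in data
```

A file without a BOM still falls through to the NUL-byte test, so 
BOM-less UTF-16 would keep being reported as binary under this sketch.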

It has the drawback of generating inconsistent patches: the body of 
the patch is in the encoding of the file, while the metadata (@@, +++, 
...) is in ANSI.
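
The mismatch can be reproduced with a few bytes. The sketch below is an 
illustration only: the hunk is built by hand (hypothetical content, not 
real hg output), with an ASCII header followed by a UTF-16-LE body, and 
decodes_as is a hypothetical helper.

```python
# ASCII metadata, as a byte-oriented diff tool would emit it.
header = b"@@ -1 +1 @@\n"

# One added line taken from a UTF-16-LE file: two bytes per character.
body = "a=2\n".encode("utf-16-le")

# The patch is a plain concatenation of both, with an ASCII '+' marker.
patch = header + b"+" + body


def decodes_as(data, encoding):
    """Return True if data decodes cleanly with the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Read as ASCII, the body carries a NUL byte after every character.
nul_count = patch.decode("ascii").count("\0")

# Read as UTF-16-LE, the single-byte markers shift the alignment and
# the data ends in the middle of a code unit.
valid_utf16 = decodes_as(patch, "utf-16-le")
```

Neither reading gives a clean text file: one is full of NUL bytes, the 
other is misaligned from the first byte of the header.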

In a word, it's a dead end. Let me explain:

My first tests (at home) on my Mac were quite promising: the patch 
looked good, and GNU patch was happy with it.

On my Windows workstation (WinXP), hg diff throws garbage at the 
terminal, whether you use Cygwin or the genuine cmd.exe: the terminal 
shows the first lines of the patch, then unexpectedly stops displaying 
at some point and returns control to bash (or whatever shell is 
running).

I have not been able to test GNU patch on Windows, as it is not 
installed on my system. hg import applied the diff without complaining, 
but performed a completely different operation from the one the diff 
described (a different part of the file was modified).

As for TortoiseHg, it seems to display the diff line by line or chunk 
by chunk, depending on the view, but it consistently stops at the first 
<NUL> byte, either the first of the line or the first of the chunk, and 
so displays no useful information.
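
That symptom is what one would expect from a NUL-terminated string API; 
whether TortoiseHg actually goes through one is an assumption. A minimal 
sketch, in which c_string_view is a hypothetical helper mimicking that 
truncation and the line content is made up:

```python
def c_string_view(data):
    """Return the bytes a NUL-terminated string API would see."""
    return data.split(b"\0", 1)[0]


# One added diff line: an ASCII '+' marker followed by UTF-16 content.
line_le = b"+" + "Version=5".encode("utf-16-le")
line_be = b"+" + "Version=5".encode("utf-16-be")

# Little-endian: the low byte of 'V' comes first, then its NUL high byte.
visible_le = c_string_view(line_le)

# Big-endian: the NUL high byte comes first, so only the marker survives.
visible_be = c_string_view(line_be)
```

For ASCII-range text every second byte is NUL, so at most one character 
of each line or chunk gets through.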

I guess we need (at least I do) a way to handle UTF-16 files in our 
diffs (export, terminal, thg, ...). So at this point, I'm asking whether 
anyone has an idea of how we could proceed.

As a first step, I could turn to the thg people, but I think this is 
something the whole Mercurial community could benefit from.

Regards,
Benoît
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BOM.py
Type: text/x-python
Size: 426 bytes
Desc: not available
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20100302/e32e38d6/attachment.py>