Consequences for use of hg for other applications than SCM was Re: German umlauts in file names

Mads Kiilerich mads at kiilerich.com
Mon Jun 23 17:25:51 CDT 2008


Hans Meine wrote, On 06/23/2008 03:15 PM:
> As Matt wrote, hg does *not* use the unicode API (which is also available in 
> Python, see the link I posted above), but uses only 8-bit functions.  This 
> way, unicode filenames cannot be preserved.  IMO this qualifies as a bug - 
> OK, call it a documented, clean, but for certain users unexpected (and 
> undesired) behavior which cannot be changed.
>
> However, I think this should not be hard to fix for people like Marko.  

Isn't that almost what the win32mbcs extension already does? It looks 
like a few-line change to turn it into a "win32utf8repo" extension, which 
would assume that the repo stores file names as UTF-8 and that the 
filesystem takes raw Unicode names. win32mbcs seems to be a special case 
of that. Except for existing Windows-only repos, I think UTF-8 as the 
repo encoding is a fair assumption.
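
A rough sketch of the idea, in present-day Python rather than the 
extension's actual code: wrap each filesystem function so that byte 
strings coming from the repo are decoded as UTF-8 before the call, and 
unicode results are encoded back afterwards. The names REPO_ENCODING, 
to_fs, to_repo and unicode_fs are made up for illustration; this is not 
how win32mbcs itself is written, and the UTF-8 repo encoding is the 
assumption discussed above.

```python
import functools

# Assumption: file names inside the repo are UTF-8 encoded byte strings.
REPO_ENCODING = "utf-8"


def to_fs(value):
    """Decode repo-side bytes to unicode, recursing into lists and tuples."""
    if isinstance(value, bytes):
        return value.decode(REPO_ENCODING)
    if type(value) in (list, tuple):
        return type(value)(to_fs(v) for v in value)
    return value


def to_repo(value):
    """Encode unicode results back to repo-side bytes, recursing likewise."""
    if isinstance(value, str):
        return value.encode(REPO_ENCODING)
    if type(value) in (list, tuple):
        return type(value)(to_repo(v) for v in value)
    return value


def unicode_fs(func):
    """Wrap func so it is called with unicode names but returns bytes."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        fs_args = to_fs(args)
        fs_kwargs = {key: to_fs(val) for key, val in kwargs.items()}
        return to_repo(func(*fs_args, **fs_kwargs))
    return wrapper
```

With this, something like unicode_fs(os.listdir) would accept a UTF-8 
byte path, hand the decoded unicode path to the operating system, and 
return the directory entries as UTF-8 bytes again. A real extension would 
also have to deal with names that are not valid UTF-8, which this sketch 
simply lets fail with a UnicodeDecodeError.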

Disclaimer: I haven't tried win32mbcs, so I don't know how well it 
works or whether it will collide with the ongoing work on better 
support for Windows filesystems.

/Mads

