Consequences for use of hg for other applications than SCM (was Re: German umlauts in file names)
Matt Mackall
mpm at selenic.com
Fri Jun 20 12:43:23 CDT 2008
On Fri, 2008-06-20 at 17:43 +0200, Marko Kaening wrote:
> Hi (Matt),
>
> In the case of file names containing umlauts, Mercurial or TortoiseHg does
> NOT set the file name the way SVN or TortoiseSVN would when the file
> originates from a system using a different charset. The file name is not
> adapted to the current charset, as SVN would do.
>
> Bearing in mind what Matt wrote earlier in this thread, it looks as if this
> behaviour is actually intended and not a bug:
> ==============================
> <cite Matt>
> >
> > Mercurial by design does absolutely no encoding on filenames, as
> > filenames very often have to byte-for-byte agree with their
> > representation in other files such as makefiles, etc.
> >
> <cite/>
>
> BUT, I believe that it is not what the user really wants in some cases.
Users want a lot of things they don't fully understand the implications
of.
> echo "umlauts added in utf-8 on linux box: öäü" > file-öäü.txt
This is a perfect example of one of the pitfalls of encoding. There are
no umlauts in the above in any standard encoding:
00001460: 2275 6d6c 6175 7473 2061 6464 6564 2069 "umlauts added i
00001470: 6e20 7574 662d 3820 6f6e 206c 696e 7578 n utf-8 on linux
00001480: 2062 6f78 3a20 c383 c2b6 c383 c2a4 c383 box: ..........
00001490: c2bc 2220 3e20 6669 6c65 2dc3 83c2 b6c3 .." > file-.....
000014a0: 83c2 a4c3 83c2 bc2e 7478 740a 0a73 766e ........txt..svn
Because your editor and your mail client tried to be smart about
encoding and/or were misconfigured, the original -bytes- of your message
are now lost.
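
The byte pattern in the dump is what you get when UTF-8 text is misread as
a single-byte charset and then encoded as UTF-8 a second time. A minimal
Python sketch of that follows; that the misreading used Latin-1 is an
assumption (cp1252 would give the same bytes for these three characters),
and nothing here identifies which program actually did it.

```python
# Sketch: reproduce the double-encoded bytes seen in the hexdump.
# Escapes are used so this snippet itself survives charset mangling.
original = "\u00f6\u00e4\u00fc"  # o-umlaut, a-umlaut, u-umlaut

# Step 1: correct UTF-8 encoding, two bytes per character.
utf8_bytes = original.encode("utf-8")

# Step 2 (assumed): some tool misreads those bytes as Latin-1,
# turning each byte into a separate character.
misread = utf8_bytes.decode("latin-1")

# Step 3: the misread text is encoded as UTF-8 again.
double_encoded = misread.encode("utf-8")

print(utf8_bytes.hex(" "))
print(double_encoded.hex(" "))

# The damage can be undone only if the exact wrong step is known.
recovered = double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")
```

Each umlaut ends up as four bytes instead of two, which matches the
c383 c2b6 c383 c2a4 c383 c2bc run in the dump.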
> As you can see, Mercurial or TortoiseHg does NOT set the file name the way
> TortoiseSVN would. The file name is not adapted to the current charset,
> as SVN would do.
>
> That's what I mean here. I THINK THAT'S INCONSISTENT, BECAUSE IT IS NOT
> PORTABLE.
>
> So far I haven't figured out which value for --encoding or HGENCODING
> I should use to make it work. My console seems to be set to cp850, and
> the system might be set to cp1252, if I understood correctly that the
> following regkey is the one to believe:
Neither of those will have any effect: Mercurial does not encode
filenames. What comes out is the same as what goes in.
You need to either set your Windows machine to use UTF-8 or set your
Linux machine to use something roughly cp850-compatible like Latin1.
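
To illustrate "what comes out is the same as what goes in", here is a
rough Python sketch. The function store_and_return is a hypothetical
stand-in for byte-transparent filename storage, not Mercurial code, and
the choice of UTF-8 on one side and cp1252 on the other is an assumption
taken from the charsets mentioned in this thread.

```python
def store_and_return(name_bytes: bytes) -> bytes:
    """Hypothetical stand-in: filename bytes pass through untouched."""
    return name_bytes


# The name as typed on a UTF-8 system (escapes avoid charset mangling).
name = "file-\u00f6\u00e4\u00fc.txt"

# Bytes written on the UTF-8 side and handed back unchanged.
stored = store_and_return(name.encode("utf-8"))

# A UTF-8 system interprets the bytes as the original name.
seen_utf8 = stored.decode("utf-8")

# A cp1252 system interprets the very same bytes as different characters.
seen_cp1252 = stored.decode("cp1252")

print(seen_utf8 == name)
print(seen_cp1252 == name)
```

The bytes never change; only the reading of them does, which is why both
machines have to agree on one charset for the names to look the same.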
--
Mathematics is the supreme nostalgia of our time.