German umlauts in file names

Hans Meine meine at informatik.uni-hamburg.de
Thu Jun 19 07:07:18 CDT 2008


On Thursday, 19 June 2008 13:49:18, Marko Kaening wrote:
> How do I find out which charset hg or THg is actually using?

In my experience, there is often no such thing as "the charset XYZ uses"
nowadays.  Think of it rather as the charset your file contents and file
names are encoded in; every application on every platform must agree on that.

In your case,
a) What is the encoding of the SVN repo's content?
b) Is it the same whether the SVN repo was created on Linux or Windows?
c) What is the encoding of the converted hg repo's content?
d) Is it the same whether you ran "hg convert" on Linux or Windows?

If all repos use e.g. UTF-8 (which would be very sane), the only thing to be 
checked is whether strings provided via the UI are correctly transformed 
to/from that charset.
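
To illustrate what happens when two sides do not agree, here is a small
Python 3 sketch.  The file name and the helper misread() are made up for
this example and have nothing to do with hg's own code; it only shows how the
same umlauts look when the bytes are read with the wrong charset.

```python
# Hypothetical file name with German umlauts, written with escapes so that
# the example does not depend on the encoding of this source file.
name = "Gr\u00fc\u00dfe.txt"

# The same name as bytes in two common encodings.
as_utf8 = name.encode("utf-8")      # b'Gr\xc3\xbc\xc3\x9fe.txt'
as_latin1 = name.encode("latin-1")  # b'Gr\xfc\xdfe.txt'


def misread(raw, assumed):
    """Decode raw bytes with an assumed encoding, replacing invalid bytes."""
    return raw.decode(assumed, errors="replace")


# UTF-8 bytes read as latin-1: each umlaut turns into two wrong characters.
print(ascii(misread(as_utf8, "latin-1")))
# latin-1 bytes read as UTF-8: the umlaut bytes are invalid and get replaced
# by U+FFFD, so the original characters are lost.
print(ascii(misread(as_latin1, "utf-8")))
# Both sides agree on UTF-8: the name survives the round trip.
print(misread(as_utf8, "utf-8") == name)
```

ascii() is used for printing so that the output itself does not depend on
what your terminal expects.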

I don't know about THg, but hg is a Python program.  Python uses
sys.getfilesystemencoding() for file names, while stdin has a potentially
different sys.stdin.encoding (likewise every other file), and I bet it is
possible to have an external editor for commit messages that uses a
different encoding than hg expects.  Under Linux, during the transition
from latin1-based distribution versions to the current, mostly utf8-based
releases, it was not uncommon that e.g. your shell expected a different
encoding than your terminal delivered.  That leads to all kinds of problems,
which could of course also affect hg (substitute "hg" for "shell" above).
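
If you want to see what Python reports on a given machine, something like
the following Python 3 sketch should do.  The function name
report_encodings() is my own invention, and the values only describe the
Python process you run it in; they are not necessarily what hg or THg end up
using internally.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings Python reports for the current process."""
    return {
        # Encoding used to convert file names to and from bytes.
        "filesystem": sys.getfilesystemencoding(),
        # Stream encodings; may be None if a stream is missing or replaced.
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding suggested by the user's locale settings.
        "locale preferred": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for source, encoding in report_encodings().items():
        print(f"{source}: {encoding}")
```

Running this on both the Linux and the Windows machine, in the same shell
you use for hg, shows whether the two environments disagree.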

In other words, I think nobody except you would be able to debug your specific 
problem.

-- 
Ciao, /  /
     /--/
    /  / ANS

