[issue2956] hg convert and non-ascii characters in file names

jtn bugs at mercurial.selenic.com
Sun Aug 14 00:23:01 CDT 2011


New submission from jtn <jtn at Safe-mail.net>:

"hg convert -s svn -d hg" incorrectly converts non-ascii characters in file
names to utf-8 even on systems that don't support utf-8 encoding.

Example (bash):

$ unset LANG LC_ALL
$ export LANG=pl_PL.ISO-8859-2 LC_MESSAGES=C
$ mkdir sample && cd sample
$ filename=`printf '\xb1\xe6\xea\xb3\xf1\xf3\xb6\xbc\xbf'`
$ echo test > "$filename"
$ ls | od -tx1
0000000 b1 e6 ea b3 f1 f3 b6 bc bf 0a
0000012

This file can be added to an svn or hg repository and will be properly
handled by subversion, "hg clone" and even "hg convert -s hg -d hg".
However, "hg convert -s svn -d hg" creates a repository in which the file
name consists of different bytes:

$ ls | od -tx1
0000000 c4 85 c4 87 c4 99 c5 82 c5 84 c3 b3 c5 9b c5 ba
0000020 c5 bc 0a
0000023

The result is utf-8; the file name can be "decoded" with iconv:

$ ls | iconv -f UTF8 -t ISO_8859-2 | od -tx1
0000000 b1 e6 ea b3 f1 f3 b6 bc bf 0a
0000012
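
The byte dumps above can also be cross-checked outside the shell. The following is only a minimal sketch, not code from Mercurial or the convert extension: it assumes Python 3, and the variable names are made up for illustration. It re-encodes the original ISO-8859-2 name as utf-8 and compares the result with the bytes found in the converted repository:

```python
# Bytes of the file name as created in the ISO-8859-2 locale (first od dump,
# without the trailing newline added by ls).
original = bytes.fromhex("b1 e6 ea b3 f1 f3 b6 bc bf")

# Bytes of the file name found after "hg convert -s svn -d hg" (second od dump).
converted = bytes.fromhex(
    "c4 85 c4 87 c4 99 c5 82 c5 84 c3 b3 c5 9b c5 ba c5 bc"
)

# Interpret the original bytes as ISO-8859-2 text, then encode that text as utf-8.
recoded = original.decode("iso-8859-2").encode("utf-8")

# True if the converted name is exactly the utf-8 form of the original name.
print(recoded == converted)

# The reverse direction corresponds to the iconv call shown above.
print(converted.decode("utf-8").encode("iso-8859-2") == original)
```

If both comparisons hold, the converted name is a plain ISO-8859-2 to utf-8 transcoding of the original, which is consistent with the iconv round trip shown above.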

I believe this behavior is incorrect.

(Tested with Mercurial 1.9.1 and 1.9.1+9-2ef2d3a5cd2d.)

----------
messages: 17133
nosy: jtn
priority: bug
status: unread
title: hg convert and non-ascii characters in file names

____________________________________________________
Mercurial issue tracker <bugs at mercurial.selenic.com>
<http://mercurial.selenic.com/bts/issue2956>
____________________________________________________

