The command "hg convert" fails on a darcs repository whose log messages contain ISO8859-15 encoded strings. This is probably also the case for other encodings that are not UTF-8. I have tested this with mercurial 1.6.3 on a fedora12 system.

Here is a recipe to reproduce the problem (needs darcs):

    mkdir test
    cd test
    darcs init
    echo "first version" > file
    darcs add file
    echo "Überraschung" | iconv -f utf-8 -t iso8859-15 > log
    darcs record --logfile log -A "johndoe@somecompany" -a
    cd ..
    hg convert test

A possible solution is attached here as a patch file. The patch has to be applied to /usr/lib/python2.6/site-packages/hgext/convert/darcs.py.

In my solution, the encoding of the source repository's log messages can be given as an environment variable, so

    HG_CONVERTER_ENCODING=iso8859-15 hg convert test

works with the darcs repo created by the recipe above. Maybe this or a similar solution can be considered for the next release of mercurial?
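To illustrate the environment-variable idea, here is a minimal sketch of how a converter could pick up the source encoding. It is not the attached patch: only the variable name HG_CONVERTER_ENCODING comes from the report, while the helper name `source_log_encoding`, the UTF-8 fallback, and the early codec check are assumptions made for illustration.

```python
import codecs
import os


# Hypothetical helper: only the variable name HG_CONVERTER_ENCODING is taken
# from the report; the function name and the defaults are assumptions.
def source_log_encoding(environ=None, default="utf-8"):
    """Return the codec name to use for the source repository's log messages."""
    if environ is None:
        environ = os.environ
    name = environ.get("HG_CONVERTER_ENCODING", default)
    # Fail early on a misspelled codec name instead of deep inside the parser.
    codecs.lookup(name)  # raises LookupError if the codec is unknown
    return name
```

Called with no variable set, the helper falls back to UTF-8, which matches the XML default mentioned later in this thread.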
Ok, crew-stable failed with the traceback below, so the current fix (http://hg.intevation.org/mercurial/crew/rev/4481f8a93c7a) is apparently not good enough. I haven't looked at the problem yet.

    $ hg convert test
    assuming destination test-hg
    initializing destination test-hg repository
    ** unknown exception encountered, details follow
    ** report bug details to http://mercurial.selenic.com/bts/
    ** or mercurial@selenic.com
    ** Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) [GCC 4.2.1 (Apple Inc. build 5646)]
    ** Mercurial Distributed SCM (version 1.6.3+61-1c9bb7e00f71)
    ** Extensions loaded: hgsubversion, rebase, purge, record, parentrevspec, convert, patchbomb, transplant, mq, mbox, bookmarks, highlight, extdiff, graphlog, progress, histedit
    Traceback (most recent call last):
      File "/Users/pmezard/bin/hg", line 27, in <module>
        mercurial.dispatch.run()
      File "/Users/pmezard/lib/python/mercurial/dispatch.py", line 16, in run
        sys.exit(dispatch(sys.argv[1:]))
      File "/Users/pmezard/lib/python/mercurial/dispatch.py", line 34, in dispatch
        return _runcatch(u, args)
      File "/Users/pmezard/lib/python/mercurial/dispatch.py", line 54, in _runcatch
        return _dispatch(ui, args)
      File "/Users/pmezard/lib/python/mercurial/dispatch.py", line 494, in _dispatch
        cmdpats, cmdoptions)
      File "/Users/pmezard/lib/python/mercurial/dispatch.py", line 355, in runcommand
        ret = _runcommand(ui, options, cmd, d)
      File "/Users/pmezard/lib/python/mercurial/dispatch.py", line 545, in _runcommand
        return checkargs()
      File "/Users/pmezard/lib/python/mercurial/dispatch.py", line 499, in checkargs
        return cmdfunc()
      File "/Users/pmezard/lib/python/mercurial/dispatch.py", line 492, in <lambda>
        d = lambda: util.checksignature(func)(ui, *args, **cmdoptions)
      File "/Users/pmezard/lib/python/mercurial/util.py", line 420, in check
        return func(*args, **kwargs)
      File "/Users/pmezard/dev/mercurial/hg/hg-pmezard/hgext/convert/__init__.py", line 254, in convert
        return convcmd.convert(ui, src, dest, revmapfile, **opts)
      File "/Users/pmezard/dev/mercurial/hg/hg-pmezard/hgext/convert/convcmd.py", line 429, in convert
        c.convert(sortmode)
      File "/Users/pmezard/dev/mercurial/hg/hg-pmezard/hgext/convert/convcmd.py", line 335, in convert
        self.source.before()
      File "/Users/pmezard/dev/mercurial/hg/hg-pmezard/hgext/convert/darcs.py", line 71, in before
        repodir=self.path)
      File "/Users/pmezard/dev/mercurial/hg/hg-pmezard/hgext/convert/darcs.py", line 96, in xml
        etree.parse(fp)
      File "<string>", line 32, in parse
    SyntaxError: not well-formed (invalid token): line 3, column 7
Right; 4481f8 added a comment that darcs generates XML-ish output but doesn't specify any encoding and includes the raw encoded text in the "XML". The XML parser can't handle that unless we tell it which encoding to use. XML defaults to utf-8, so, as tests/test-convert-darcs.t shows, we no longer have a problem there.

The provided patch looks fine, but:

* see http://mercurial.selenic.com/wiki/ContributingChanges
* the patch must be updated to the head of the default branch, where the issue has been partially fixed
* space after ,
* just use XMLParser(encoding=encoding.encoding) in all cases
* use HGENCODING=iso8859-15 when converting
* make sure this case is tested by tests/test-convert-darcs.t

Encoding handling during conversion could generally be beefed up, but consistency across the various sources is also important.
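The parser behaviour described in this comment can be reproduced outside Mercurial. The sketch below is an illustration only, not the code in hgext/convert/darcs.py: the sample document is a hypothetical stand-in for darcs' output, `parse_changes` and `patch_names` are made-up helper names, and the encoding is passed as a plain argument rather than taken from Mercurial's encoding.encoding.

```python
import io
import xml.etree.ElementTree as etree

# Hypothetical stand-in for darcs' XML-ish changelog: no encoding declaration,
# with the patch name stored as raw ISO8859-15 bytes.
SAMPLE = (
    "<changelog><patch><name>Überraschung</name></patch></changelog>"
).encode("iso8859-15")


def parse_changes(data, source_encoding=None):
    """Parse bytes; source_encoding=None keeps the parser's UTF-8 default."""
    if source_encoding is None:
        parser = etree.XMLParser()
    else:
        # The encoding argument overrides whatever the document declares
        # (here: nothing), so the raw bytes are decoded with the given codec.
        parser = etree.XMLParser(encoding=source_encoding)
    return etree.parse(io.BytesIO(data), parser=parser).getroot()


def patch_names(root):
    """Collect the text of every <name> element in the parsed tree."""
    return [name.text for name in root.iter("name")]
```

With `source_encoding="iso8859-15"` the sample parses and the patch name round-trips; without it the parser assumes UTF-8 and raises a ParseError (a SyntaxError subclass), the same kind of "not well-formed (invalid token)" failure seen in the traceback.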
I've posted a more conservative patch for this to mercurial-devel: http://markmail.org/message/7f53m6hytsbyzit7 I think being able to specify the source encoding is a good idea, but probably something we should do in 1.7, not on stable.
Fixed by 1f6bd49383b3 in stable.
Sorry, but the bug is NOT fixed in 1f6bd49383b3 in stable.

I fetch the repository:

    hg clone http://www.selenic.com/repo/hg-stable
    cd hg-stable && hg update tip

then build:

    python setup.py install --home=$HOME/zzz/hg-test

then change PATH and PYTHONPATH so that I have:

    hg --version
    Mercurial Distributed SCM (version 1.6.4+3-f314723f36f5)

Now I build my test darcs repo:

    mkdir test
    cd test
    darcs init
    echo "first version" > file
    darcs add file
    echo "Überraschung" | iconv -f utf-8 -t iso8859-15 > log
    darcs record --logfile log -A "johndoe@somecompany" -a
    cd ..

Now try to convert:

    hg convert test

I get this:

    assuming destination test-hg
    initializing destination test-hg repository
    ** unknown exception encountered, details follow
    ** report bug details to http://mercurial.selenic.com/bts/
    ** or mercurial@selenic.com
    ** Python 2.6.2 (r262:71600, Jun 4 2010, 18:28:04) [GCC 4.4.3 20100127 (Red Hat 4.4.3-4)]
    ** Mercurial Distributed SCM (version 1.6.4+3-f314723f36f5)
    ** Extensions loaded: fetch, hgk, extdiff, transplant, graphlog, rebase, mq, convert, record, hgview
    Traceback (most recent call last):
      File "/home/pfeiffer/zzz/hg-test/bin/hg", line 27, in <module>
        mercurial.dispatch.run()
      File "/home/pfeiffer/zzz/hg-test/lib/python/mercurial/dispatch.py", line 16, in run
        sys.exit(dispatch(sys.argv[1:]))
      File "/home/pfeiffer/zzz/hg-test/lib/python/mercurial/dispatch.py", line 34, in dispatch
        return _runcatch(u, args)
      File "/home/pfeiffer/zzz/hg-test/lib/python/mercurial/dispatch.py", line 54, in _runcatch
        return _dispatch(ui, args)
      File "/home/pfeiffer/zzz/hg-test/lib/python/mercurial/dispatch.py", line 494, in _dispatch
        cmdpats, cmdoptions)
      File "/home/pfeiffer/zzz/hg-test/lib/python/mercurial/dispatch.py", line 355, in runcommand
        ret = _runcommand(ui, options, cmd, d)
      File "/home/pfeiffer/zzz/hg-test/lib/python/mercurial/dispatch.py", line 545, in _runcommand
        return checkargs()
      File "/home/pfeiffer/zzz/hg-test/lib/python/mercurial/dispatch.py", line 499, in checkargs
        return cmdfunc()
      File "/home/pfeiffer/zzz/hg-test/lib/python/mercurial/dispatch.py", line 492, in <lambda>
        d = lambda: util.checksignature(func)(ui, *args, **cmdoptions)
      File "/home/pfeiffer/zzz/hg-test/lib/python/mercurial/util.py", line 420, in check
        return func(*args, **kwargs)
      File "/home/pfeiffer/zzz/hg-test/lib/python/hgext/convert/__init__.py", line 254, in convert
        return convcmd.convert(ui, src, dest, revmapfile, **opts)
      File "/home/pfeiffer/zzz/hg-test/lib/python/hgext/convert/convcmd.py", line 429, in convert
        c.convert(sortmode)
      File "/home/pfeiffer/zzz/hg-test/lib/python/hgext/convert/convcmd.py", line 335, in convert
        self.source.before()
      File "/home/pfeiffer/zzz/hg-test/lib/python/hgext/convert/darcs.py", line 71, in before
        repodir=self.path)
      File "/home/pfeiffer/zzz/hg-test/lib/python/hgext/convert/darcs.py", line 96, in xml
        etree.parse(fp)
      File "<string>", line 32, in parse
    SyntaxError: not well-formed (invalid token): line 3, column 7

The attached file "HGPATCH" shows a way to fix this. The XML parsing function etree.parse needs to be given the encoding of the XML source.
--- Bug imported by bugzilla@serpentine.com 2012-05-12 09:13 EDT --- This bug was previously known as _bug_ 2411 at http://mercurial.selenic.com/bts/issue2411 Imported an attachment (id=1461)