Bug 2411 - hg convert for darcs fails with ISO8859-15 encoded log messages
Status: RESOLVED FIXED
Alias: None
Product: Mercurial
Classification: Unclassified
Component: Mercurial
Version: unspecified
Hardware: All All
: normal bug
Assignee: Bugzilla
Reported: 2010-10-01 04:05 UTC by Goetz Pfeiffer
Modified: 2012-05-13 05:03 UTC


Attachments
(34 bytes, application/octet-stream)
2010-10-01 04:05 UTC, Goetz Pfeiffer

Description Goetz Pfeiffer 2010-10-01 04:05 UTC
The command "hg convert" on a darcs repository fails when the log messages
in that repository contain ISO8859-15 encoded strings. This is probably also
the case for other encodings that are not UTF-8.

I have tested this with Mercurial 1.6.3 on a Fedora 12 system.

Here is a recipe to reproduce the problem (needs darcs):
mkdir test
cd test 
darcs init
echo "first version" > file
darcs add file
echo "Überraschung" | iconv -f utf-8 -t iso8859-15 > log
darcs record --logfile log -A "johndoe@somecompany" -a
cd ..
hg convert test
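
To illustrate why this log message is a problem, here is a minimal Python sketch (not part of the original report; the helper names log_bytes and is_valid_utf8 are made up for illustration). It mimics the echo | iconv step above and checks whether the resulting bytes can be read as UTF-8:

```python
def log_bytes(message, encoding):
    """Encode a log message roughly the way `echo ... | iconv -t <encoding>` would."""
    return (message + "\n").encode(encoding)


def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# "\u00dc" is the capital U with diaeresis from the log message above.
latin9 = log_bytes("\u00dcberraschung", "iso8859-15")
utf8 = log_bytes("\u00dcberraschung", "utf-8")

print(latin9[:1])             # the single byte 0xDC
print(is_valid_utf8(latin9))  # False: 0xDC is not followed by a continuation byte
print(is_valid_utf8(utf8))    # True
```

Any consumer that assumes UTF-8, such as an XML parser reading input without an encoding declaration, will reject the ISO8859-15 variant.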

A possible solution is attached as a patch file. The patch has to be
applied to
/usr/lib/python2.6/site-packages/hgext/convert/darcs.py

In my solution, the encoding of log messages of the source repository 
can be given as an environment variable. So 

HG_CONVERTER_ENCODING=iso8859-15 hg convert test

works with the darcs repo mentioned in the paragraph above.

Maybe this or a similar solution can be considered for the next
release of Mercurial?
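
For readers without access to the attachment, the general idea can be sketched as follows. This is not the attached patch: the function names source_encoding and parse_darcs_xml, the UTF-8 fallback, and the simplified changelog structure are assumptions made for illustration. Only the variable name HG_CONVERTER_ENCODING comes from the description above.

```python
import os
import xml.etree.ElementTree as ET


def source_encoding(environ=None, default="utf-8"):
    """Pick the source repository encoding from the environment.

    HG_CONVERTER_ENCODING is the variable named in the report; the
    UTF-8 fallback is an assumption of this sketch.
    """
    environ = os.environ if environ is None else environ
    return environ.get("HG_CONVERTER_ENCODING", default)


def parse_darcs_xml(data, environ=None):
    """Parse XML bytes, telling the parser which encoding to assume."""
    parser = ET.XMLParser(encoding=source_encoding(environ))
    parser.feed(data)
    return parser.close()


# Simplified, assumed stand-in for the XML that darcs emits.
sample = "<changelog><patch><name>\u00dcberraschung</name></patch></changelog>"
root = parse_darcs_xml(
    sample.encode("iso8859-15"),
    environ={"HG_CONVERTER_ENCODING": "iso8859-15"},
)
print(root.findtext("patch/name"))
```

Passing the environment as a parameter keeps the sketch testable without modifying the real process environment.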
Comment 1 Patrick Mézard 2010-10-01 05:12 UTC
OK, crew-stable failed with the traceback below, so the current fix
(http://hg.intevation.org/mercurial/crew/rev/4481f8a93c7a) is apparently not
good enough. I haven't looked at the problem yet.


$ hg convert test
assuming destination test-hg
initializing destination test-hg repository
** unknown exception encountered, details follow
** report bug details to http://mercurial.selenic.com/bts/
** or mercurial@selenic.com
** Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) [GCC 4.2.1 (Apple Inc. build 5646)]
** Mercurial Distributed SCM (version 1.6.3+61-1c9bb7e00f71)
** Extensions loaded: hgsubversion, rebase, purge, record, parentrevspec, convert, patchbomb, transplant, mq, mbox, bookmarks, highlight, extdiff, graphlog, progress, histedit
Traceback (most recent call last):
  File "/Users/pmezard/bin/hg", line 27, in <module>
    mercurial.dispatch.run()
  File "/Users/pmezard/lib/python/mercurial/dispatch.py", line 16, in run
    sys.exit(dispatch(sys.argv[1:]))
  File "/Users/pmezard/lib/python/mercurial/dispatch.py", line 34, in dispatch
    return _runcatch(u, args)
  File "/Users/pmezard/lib/python/mercurial/dispatch.py", line 54, in _runcatch
    return _dispatch(ui, args)
  File "/Users/pmezard/lib/python/mercurial/dispatch.py", line 494, in _dispatch
    cmdpats, cmdoptions)
  File "/Users/pmezard/lib/python/mercurial/dispatch.py", line 355, in runcommand
    ret = _runcommand(ui, options, cmd, d)
  File "/Users/pmezard/lib/python/mercurial/dispatch.py", line 545, in _runcommand
    return checkargs()
  File "/Users/pmezard/lib/python/mercurial/dispatch.py", line 499, in checkargs
    return cmdfunc()
  File "/Users/pmezard/lib/python/mercurial/dispatch.py", line 492, in <lambda>
    d = lambda: util.checksignature(func)(ui, *args, **cmdoptions)
  File "/Users/pmezard/lib/python/mercurial/util.py", line 420, in check
    return func(*args, **kwargs)
  File "/Users/pmezard/dev/mercurial/hg/hg-pmezard/hgext/convert/__init__.py", line 254, in convert
    return convcmd.convert(ui, src, dest, revmapfile, **opts)
  File "/Users/pmezard/dev/mercurial/hg/hg-pmezard/hgext/convert/convcmd.py", line 429, in convert
    c.convert(sortmode)
  File "/Users/pmezard/dev/mercurial/hg/hg-pmezard/hgext/convert/convcmd.py", line 335, in convert
    self.source.before()
  File "/Users/pmezard/dev/mercurial/hg/hg-pmezard/hgext/convert/darcs.py", line 71, in before
    repodir=self.path)
  File "/Users/pmezard/dev/mercurial/hg/hg-pmezard/hgext/convert/darcs.py", line 96, in xml
    etree.parse(fp)
  File "<string>", line 32, in parse
SyntaxError: not well-formed (invalid token): line 3, column 7
Comment 2 kiilerix 2010-10-01 07:26 UTC
Right; 4481f8 added a comment that darcs generates XML-ish output but
doesn't specify any encoding and includes the raw encoded text in the "XML".
The XML parser can't handle that unless we tell it which encoding to use.

XML defaults to UTF-8, so, as tests/test-convert-darcs.t shows, we no
longer have a problem there.

The provided patch looks fine, but
* see http://mercurial.selenic.com/wiki/ContributingChanges
* the patch must be updated to head of default branch where the issue has
been partially fixed
* space after ,
* just use XMLParser(encoding=encoding.encoding) in all cases
* use HGENCODING=iso8859-15 when converting
* make sure this case is tested by tests/test-convert-darcs.t

Encoding handling during conversion could generally be beefed up, but
consistency across the various sources is also important.
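
The parser behaviour described above can be reproduced outside Mercurial with a minimal sketch. It uses the standard library's xml.etree.ElementTree instead of Mercurial's own imports, the changelog structure is a simplified assumption about what darcs emits, and patch_name is a hypothetical helper. In Mercurial itself the encoding name would come from encoding.encoding, which is not imported here.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified stand-in for darcs' XML-ish output: no encoding
# declaration, and the patch name is stored as raw ISO8859-15 bytes.
RAW = "<changelog><patch><name>\u00dcberraschung</name></patch></changelog>".encode(
    "iso8859-15"
)


def patch_name(data, source_encoding=None):
    """Return the first patch name, optionally overriding the parser encoding."""
    parser = ET.XMLParser(encoding=source_encoding)
    parser.feed(data)
    return parser.close().findtext("patch/name")


# Without an override the parser assumes UTF-8 and rejects the byte 0xDC.
try:
    patch_name(RAW)
except ET.ParseError as exc:
    print("parse failed:", exc)

# With the encoding supplied, the same bytes parse cleanly.
print(patch_name(RAW, "iso8859-15"))
```

In current Python versions ET.ParseError is a subclass of SyntaxError, which matches the exception type in the traceback in comment 1.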
Comment 3 brodie 2010-10-01 09:25 UTC
I've posted a more conservative patch for this to mercurial-devel: 
http://markmail.org/message/7f53m6hytsbyzit7

I think being able to specify the source encoding is a good idea, but probably 
something we should do in 1.7, not on stable.
Comment 4 Matt Mackall 2010-10-11 14:32 UTC
Fixed by 1f6bd49383b3 in stable.
Comment 5 Goetz Pfeiffer 2010-10-12 03:23 UTC
Sorry, but the bug is NOT fixed by 1f6bd49383b3 in stable.

When I fetch
hg clone http://www.selenic.com/repo/hg-stable
then
cd hg-stable && hg update tip

then build:
python setup.py install --home=$HOME/zzz/hg-test

then change PATH and PYTHONPATH so I have:
hg --version
Mercurial Distributed SCM (version 1.6.4+3-f314723f36f5)

Now I build my test darcs repo:
mkdir test
cd test 
darcs init
echo "first version" > file
darcs add file
echo "Überraschung" | iconv -f utf-8 -t iso8859-15 > log
darcs record --logfile log -A "johndoe@somecompany" -a
cd ..

Now try to convert:
hg convert test

I get this:
assuming destination test-hg
initializing destination test-hg repository
** unknown exception encountered, details follow
** report bug details to http://mercurial.selenic.com/bts/
** or mercurial@selenic.com
** Python 2.6.2 (r262:71600, Jun  4 2010, 18:28:04) [GCC 4.4.3 20100127 (Red Hat 4.4.3-4)]
** Mercurial Distributed SCM (version 1.6.4+3-f314723f36f5)
** Extensions loaded: fetch, hgk, extdiff, transplant, graphlog, rebase, mq, convert, record, hgview
Traceback (most recent call last):
  File "/home/pfeiffer/zzz/hg-test/bin/hg", line 27, in <module>
    mercurial.dispatch.run()
  File "/home/pfeiffer/zzz/hg-test/lib/python/mercurial/dispatch.py", line 16, in run
    sys.exit(dispatch(sys.argv[1:]))
  File "/home/pfeiffer/zzz/hg-test/lib/python/mercurial/dispatch.py", line 34, in dispatch
    return _runcatch(u, args)
  File "/home/pfeiffer/zzz/hg-test/lib/python/mercurial/dispatch.py", line 54, in _runcatch
    return _dispatch(ui, args)
  File "/home/pfeiffer/zzz/hg-test/lib/python/mercurial/dispatch.py", line 494, in _dispatch
    cmdpats, cmdoptions)
  File "/home/pfeiffer/zzz/hg-test/lib/python/mercurial/dispatch.py", line 355, in runcommand
    ret = _runcommand(ui, options, cmd, d)
  File "/home/pfeiffer/zzz/hg-test/lib/python/mercurial/dispatch.py", line 545, in _runcommand
    return checkargs()
  File "/home/pfeiffer/zzz/hg-test/lib/python/mercurial/dispatch.py", line 499, in checkargs
    return cmdfunc()
  File "/home/pfeiffer/zzz/hg-test/lib/python/mercurial/dispatch.py", line 492, in <lambda>
    d = lambda: util.checksignature(func)(ui, *args, **cmdoptions)
  File "/home/pfeiffer/zzz/hg-test/lib/python/mercurial/util.py", line 420, in check
    return func(*args, **kwargs)
  File "/home/pfeiffer/zzz/hg-test/lib/python/hgext/convert/__init__.py", line 254, in convert
    return convcmd.convert(ui, src, dest, revmapfile, **opts)
  File "/home/pfeiffer/zzz/hg-test/lib/python/hgext/convert/convcmd.py", line 429, in convert
    c.convert(sortmode)
  File "/home/pfeiffer/zzz/hg-test/lib/python/hgext/convert/convcmd.py", line 335, in convert
    self.source.before()
  File "/home/pfeiffer/zzz/hg-test/lib/python/hgext/convert/darcs.py", line 71, in before
    repodir=self.path)
  File "/home/pfeiffer/zzz/hg-test/lib/python/hgext/convert/darcs.py", line 96, in xml
    etree.parse(fp)
  File "<string>", line 32, in parse
SyntaxError: not well-formed (invalid token): line 3, column 7

The attached file "HGPATCH" shows a way to fix this. The xml parsing
function etree.parse needs to be given the encoding of the XML source.
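
As an illustration of that last point (this is not the content of the HGPATCH attachment), a parser with an explicit encoding can be handed to etree.parse through its parser argument. The sketch below uses the standard library's xml.etree.ElementTree; the helper parse_with_encoding and the simplified changelog are assumptions.

```python
import io
import xml.etree.ElementTree as ET


def parse_with_encoding(fp, source_encoding):
    """Parse a binary file object, telling the XML parser which encoding to assume."""
    parser = ET.XMLParser(encoding=source_encoding)
    return ET.parse(fp, parser=parser)


# Assumed, simplified stand-in for the darcs output read from the pipe.
fp = io.BytesIO(
    "<changelog><patch><name>\u00dcberraschung</name></patch></changelog>".encode(
        "iso8859-15"
    )
)
tree = parse_with_encoding(fp, "iso8859-15")
name = tree.getroot().findtext("patch/name")
print(name)
```

How the encoding name reaches this call (environment variable, HGENCODING, or a convert option) is a separate design question discussed in the comments above.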
Comment 6 Bugzilla 2012-05-12 09:13 UTC

--- Bug imported by bugzilla@serpentine.com 2012-05-12 09:13 EDT  ---

This bug was previously known as _bug_ 2411 at http://mercurial.selenic.com/bts/issue2411
Imported an attachment (id=1461)