I'm trying to convert a Darcs repo to a Mercurial one. When I do, I get the following error: http://paste.lisp.org/display/114070

My system uses UTF-8:

$ locale
LANG=es_AR.UTF-8
LC_CTYPE="es_AR.UTF-8"
LC_NUMERIC="es_AR.UTF-8"
LC_TIME="es_AR.UTF-8"
LC_COLLATE="es_AR.UTF-8"
LC_MONETARY="es_AR.UTF-8"
LC_MESSAGES="es_AR.UTF-8"
LC_PAPER="es_AR.UTF-8"
LC_NAME="es_AR.UTF-8"
LC_ADDRESS="es_AR.UTF-8"
LC_TELEPHONE="es_AR.UTF-8"
LC_MEASUREMENT="es_AR.UTF-8"
LC_IDENTIFICATION="es_AR.UTF-8"
LC_ALL=

The offending patch seems to contain a character that is not valid UTF-8. If I use darcs changes to see the patch description, I see...

"Saque el tama\c3\b1o en [..]"

where the \c3\b1 is the problematic character. This patch was written by someone using Windows, so it is most likely in latin1. I tried adding LANG=es_AR.ISO-8859-1 and HGENCODING=latin-1 to the convert line, with the same output.

Thanks!
$ hg convert usbtinyisp2/ programador/
scanning source...
sorting...
converting...
9 Saque el tamaño en la funcion SPI,ahora viene en la trama de configure. NO ANDA
source: 20080716021925-6d91c-632461a7d8c5c5462f7bc19fa66dd86c88b37b1a.gz
spi/main.c
transaction abort!
rollback completed
** unknown exception encountered, details follow
** report bug details to http://mercurial.selenic.com/bts/
** or mercurial@selenic.com
** Python 2.6.6rc1+ (r266rc1:83691, Aug 5 2010, 17:07:04) [GCC 4.4.5 20100728 (prerelease)]
** Mercurial Distributed SCM (version 1.6.2)
** Extensions loaded: convert
Traceback (most recent call last):
  File "/usr/bin/hg", line 27, in <module>
    mercurial.dispatch.run()
  File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 16, in run
    sys.exit(dispatch(sys.argv[1:]))
  File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 34, in dispatch
    return _runcatch(u, args)
  File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 54, in _runcatch
    return _dispatch(ui, args)
  File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 490, in _dispatch
    cmdpats, cmdoptions)
  File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 351, in runcommand
    ret = _runcommand(ui, options, cmd, d)
  File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 541, in _runcommand
    return checkargs()
  File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 495, in checkargs
    return cmdfunc()
  File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 488, in <lambda>
    d = lambda: util.checksignature(func)(ui, *args, **cmdoptions)
  File "/usr/lib/pymodules/python2.6/mercurial/util.py", line 420, in check
    return func(*args, **kwargs)
  File "/usr/lib/pymodules/python2.6/hgext/convert/__init__.py", line 243, in convert
    return convcmd.convert(ui, src, dest, revmapfile, **opts)
  File "/usr/lib/pymodules/python2.6/hgext/convert/convcmd.py", line 429, in convert
    c.convert(sortmode)
  File "/usr/lib/pymodules/python2.6/hgext/convert/convcmd.py", line 359, in convert
    self.copy(c)
  File "/usr/lib/pymodules/python2.6/hgext/convert/convcmd.py", line 328, in copy
    source, self.map)
  File "/usr/lib/pymodules/python2.6/hgext/convert/hg.py", line 171, in putcommit
    self.repo.commitctx(ctx)
  File "/usr/lib/pymodules/python2.6/mercurial/localrepo.py", line 966, in commitctx
    user, ctx.date(), ctx.extra().copy())
  File "/usr/lib/pymodules/python2.6/mercurial/changelog.py", line 215, in add
    user, desc = encoding.fromlocal(user), encoding.fromlocal(desc)
  File "/usr/lib/pymodules/python2.6/mercurial/encoding.py", line 63, in fromlocal
    return s.decode(encoding, encodingmode).encode("utf-8")
  File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 13: ordinal not in range(128)

"\xc3\xb1" is the UTF-8 encoding of u'\xf1' (ñ), so I don't think there is any latin-1 issue here. The failure happens while converting u'\xf1' to UTF-8, but it is strange that the ASCII encoder gives an error message even though we are using the UTF-8 encoder. Does this work for you?

python -c 'print repr(u"\xf1".encode("utf-8"))'
Hi! The Python command works:

$ python -c 'print repr(u"\xf1".encode("utf-8"))'
'\xc3\xb1'

Thanks
I'm attaching a test repo that fails. It's a fresh repo with just one dummy changeset that has "Saque el tamaño" as the commit message. I get the same error with ee601a6264e0 on Python 2.6.5/Ubuntu 10.04.

Steps I took to create the repo:

$ mkdir foo
$ cd foo
$ darcs init
$ echo a > a
$ darcs add a
$ darcs record -m 'Saque el tamaño'

Then to convert:

$ cd ..
$ hg convert foo foo-hg

My LANG was set to en_US.UTF-8 when I ran those commands.
It looks like xml.etree.ElementTree.XMLParser assumes by default that all input is UTF-8 (or something similar), and wherever it returns text from the document it tries to encode that text into ASCII. If that fails, it returns unicode objects. So the commit message containing "ñ" gets passed into changelog.add() as a unicode object, and it blows up in encoding.fromlocal().

Another thing to keep in mind is that the XML changelog from darcs is what's in each patch verbatim; there's no consistent encoding, despite it being XML. From what I can tell, etree raises SyntaxError for data that isn't valid UTF-8.
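The behaviour described in this comment can be illustrated with a small sketch. It targets Python 3, where ElementTree always returns str, so the Python 2 step that matters here (calling .decode() on a unicode object first encodes it with the default ASCII codec) is emulated by hand. The helper names py2_style_decode and parse_name are hypothetical, and the XML is a simplified stand-in, not darcs' real changelog output.

```python
import xml.etree.ElementTree as ET


def py2_style_decode(value, encoding="utf-8"):
    """Emulate Python 2's value.decode(encoding) on Python 3 (hypothetical helper)."""
    if isinstance(value, str):
        # Python 2 implicitly encoded unicode objects with the default
        # 'ascii' codec before decoding; the traceback's error comes from here.
        value = value.encode("ascii")
    return value.decode(encoding)


def parse_name(xml_bytes):
    """Return the <name> text of a simplified, made-up patch element."""
    return ET.fromstring(xml_bytes).findtext("name")


# Valid UTF-8 input parses and yields text containing U+00F1.
name = parse_name(b"<patch><name>Saque el tama\xc3\xb1o</name></patch>")

# Latin-1 bytes are not valid UTF-8, so the parser rejects the document.
try:
    parse_name(b"<patch><name>Saque el tama\xf1o</name></patch>")
    latin1_error = None
except SyntaxError as exc:  # ET.ParseError is a subclass of SyntaxError
    latin1_error = exc

# Decoding text that is already decoded fails in the implicit ASCII step,
# which is why an 'ascii' encode error shows up inside a UTF-8 decode call.
try:
    py2_style_decode(name)
    decode_error = None
except UnicodeEncodeError as exc:
    decode_error = exc

print(repr(name), type(latin1_error).__name__, decode_error)
```

This would explain why the traceback reports an ASCII encode error from inside utf_8.py: the failure is in the implicit conversion before the UTF-8 decoder ever runs.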
This seems to fix it:

--- a/hgext/convert/darcs.py
+++ b/hgext/convert/darcs.py
@@ -108,7 +108,7 @@
         date = util.strdate(elt.get('local_date'), '%a %b %d %H:%M:%S %Z %Y')
         desc = elt.findtext('name') + '\n' + elt.findtext('comment', '')
         return commit(author=elt.get('author'), date=util.datestr(date),
-                      desc=desc.strip(), parents=self.parents[rev])
+                      desc=self.recode(desc.strip()), parents=self.parents[rev])
 
     def pull(self, rev):
         output, status = self.run('pull', self.path, all=True,

(I hadn't seen Brodie's last mail, so there might be some duplicate work here ...)
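As a rough illustration of why wrapping the description in a recode-style call helps, here is a minimal Python 3 sketch. It assumes the helper's job is to normalise either decoded text or raw bytes to UTF-8 bytes; recode_sketch is a hypothetical name, the Latin-1 fallback is an assumption, and the real recode method in the convert extension may behave differently.

```python
def recode_sketch(value, encoding="utf-8"):
    """Return UTF-8 bytes for text or byte-string input (hypothetical helper)."""
    if isinstance(value, str):
        # Already-decoded text: encode directly and never call .decode()
        # on it, which is the step that failed in the traceback.
        return value.encode("utf-8")
    try:
        return value.decode(encoding).encode("utf-8")
    except UnicodeDecodeError:
        # Assumed fallback: treat undecodable bytes as Latin-1.
        return value.decode("latin-1").encode("utf-8")


from_text = recode_sketch("Saque el tama\xf1o")         # decoded text
from_utf8 = recode_sketch(b"Saque el tama\xc3\xb1o")    # UTF-8 bytes
from_latin1 = recode_sketch(b"Saque el tama\xf1o")      # Latin-1 bytes

print(from_text, from_utf8, from_latin1)
```

Under these assumptions all three inputs end up as the same UTF-8 byte string, so later code that expects bytes no longer sees a decoded text object.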
Fixed by http://hg.intevation.org/mercurial/crew/rev/4481f8a93c7a

Brodie Rao <brodie@bitheap.org>
convert/darcs: handle non-ASCII metadata in darcs changelog (issue2354)
--- Bug imported by bugzilla@serpentine.com 2012-05-12 09:12 EDT --- This bug was previously known as _bug_ 2354 at http://mercurial.selenic.com/bts/issue2354 Imported an attachment (id=1451)