Bug 3233 - convert (from bzr) crashes with non-ascii characters in commit log
Status: RESOLVED FIXED
Alias: None
Product: Mercurial
Classification: Unclassified
Component: Mercurial
Version: unspecified
Hardware: All All
Importance: normal bug
Assignee: Bugzilla
Reported: 2012-01-31 09:24 UTC by A. Budden
Modified: 2012-05-13 04:54 UTC

Python Version: ---


Attachments
(34 bytes, text/x-diff)
2012-01-31 11:44 UTC, Patrick Mézard

Description A. Budden 2012-01-31 09:24 UTC
As title...

To reproduce (in case this doesn't paste properly, the bit in the brackets is
character 176, which I produced by opening Windows charmap and
copying/pasting).

$ mkdir ascii_issue_bzr
$ cd ascii_issue_bzr
$ bzr init
$ echo "ABC" > abc
$ bzr add
$ bzr ci -m "This is a commit message with a degree sign in it (°)."
$ cd ..
$ hg convert ascii_issue_bzr ascii_issue_hg
initializing destination ascii_issue_hg repository
scanning source...
sorting...
converting...
0 This is a commit message with a degree sign in it (ï°).
transaction abort!
rollback completed
** unknown exception encountered, please report by visiting
**  http://mercurial.selenic.com/wiki/BugTracker
** Python 2.6.5 (r265:79063, Jun 12 2010, 17:07:01) [GCC 4.3.4 20090804 (release) 1]
** Mercurial Distributed SCM (version 1.9.2)
** Extensions loaded: color, graphlog, progress, convert, extdiff, purge, record, fetch, schemes, hgk, rebase
Traceback (most recent call last):
  File "/usr/bin/hg", line 38, in <module>
    mercurial.dispatch.run()
  File "/usr/lib/python2.6/site-packages/mercurial/dispatch.py", line 27, in run
    sys.exit(dispatch(request(sys.argv[1:])))
  File "/usr/lib/python2.6/site-packages/mercurial/dispatch.py", line 64, in dispatch
    return _runcatch(req)
  File "/usr/lib/python2.6/site-packages/mercurial/dispatch.py", line 87, in _runcatch
    return _dispatch(req)
  File "/usr/lib/python2.6/site-packages/mercurial/dispatch.py", line 688, in _dispatch
    cmdpats, cmdoptions)
  File "/usr/lib/python2.6/site-packages/mercurial/dispatch.py", line 463, in runcommand
    ret = _runcommand(ui, options, cmd, d)
  File "/usr/lib/python2.6/site-packages/mercurial/extensions.py", line 182, in wrap
    return wrapper(origfn, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/hgext/color.py", line 368, in colorcmd
    return orig(ui_, opts, cmd, cmdfunc)
  File "/usr/lib/python2.6/site-packages/mercurial/dispatch.py", line 742, in _runcommand
    return checkargs()
  File "/usr/lib/python2.6/site-packages/mercurial/dispatch.py", line 696, in checkargs
    return cmdfunc()
  File "/usr/lib/python2.6/site-packages/mercurial/dispatch.py", line 685, in <lambda>
    d = lambda: util.checksignature(func)(ui, *args, **cmdoptions)
  File "/usr/lib/python2.6/site-packages/mercurial/util.py", line 385, in check
    return func(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/hgext/convert/__init__.py", line 269, in convert
    return convcmd.convert(ui, src, dest, revmapfile, **opts)
  File "/usr/lib/python2.6/site-packages/hgext/convert/convcmd.py", line 445, in convert
    c.convert(sortmode)
  File "/usr/lib/python2.6/site-packages/hgext/convert/convcmd.py", line 361, in convert
    self.copy(c)
  File "/usr/lib/python2.6/site-packages/hgext/convert/convcmd.py", line 330, in copy
    source, self.map)
  File "/usr/lib/python2.6/site-packages/hgext/convert/hg.py", line 171, in putcommit
    self.repo.commitctx(ctx)
  File "/usr/lib/python2.6/site-packages/mercurial/localrepo.py", line 1112, in commitctx
    user, ctx.date(), ctx.extra().copy())
  File "/usr/lib/python2.6/site-packages/mercurial/changelog.py", line 243, in add
    text = "\n".join(l)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 51: ordinal not in range(128)
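
For readers unfamiliar with this error: in Python 2, joining a list that mixes unicode objects and byte strings makes the interpreter decode the byte strings with the ASCII codec, which fails on any byte above 0x7f. The traceback above is consistent with that, although the exact contents of the list at changelog.py line 243 are not shown here. The sketch below is only an illustration in Python 3, where the decode has to be written out explicitly. It assumes the commit message was stored as plain UTF-8, so the offending byte it reports is 0xc2 rather than the 0xef seen in the report; the name `implicit_ascii_decode` is made up for this example.

```python
# Sketch only: reproduces the failure mode in Python 3 by doing explicitly
# what Python 2 did implicitly when joining unicode and byte strings.

# Assumption: the commit message is held as UTF-8 encoded bytes.
message = "This is a commit message with a degree sign in it (°).".encode("utf-8")


def implicit_ascii_decode(data):
    """Mimic Python 2's implicit coercion of a byte string to unicode."""
    return data.decode("ascii")


try:
    implicit_ascii_decode(message)
    error = None
except UnicodeDecodeError as exc:
    error = exc

# The first non-ASCII byte sits right after the opening parenthesis,
# which is why the reported position is 51.
print(error)
```

The position matches the report because the ASCII prefix of the message, up to and including the opening parenthesis, is 51 bytes long.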
Comment 1 Patrick Mézard 2012-01-31 11:13 UTC
Your example does work for me on OSX with bzr 2.4.2 and a development
version of hg. Is your shell encoding configured for UTF-8? What is your
environment (I am a little puzzled about the Windows charmap reference with
/usr/lib mentions in the traceback).

That said, I think the encoding code in bzr.py is wrong.
Comment 2 Patrick Mézard 2012-01-31 11:44 UTC
Can you try the attached patch? It changes the encoding behaviour of bzr
source, making it expect unicode objects everywhere and trying to encode
them to UTF-8 before passing them to hg. I believe this is more correct.
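
The approach described in this comment (treat all source metadata as unicode, then encode it once at the boundary) can be sketched roughly as follows. This is not the attached patch, whose contents are not reproduced here; `encode_metadata` is a hypothetical helper, and the sketch is written for Python 3, where `str` is the unicode type.

```python
def encode_metadata(value, encoding="utf-8"):
    """Hypothetical helper: accept only text and return encoded bytes.

    Rejecting byte strings up front keeps the encoding decision in one
    place instead of relying on implicit conversions later on.
    """
    if not isinstance(value, str):
        raise TypeError("expected text metadata, got %s" % type(value).__name__)
    return value.encode(encoding)


# Every field is encoded before the pieces are combined, so the join
# only ever sees byte strings and never triggers a decode.
author = encode_metadata("Patrick Mézard")
description = encode_metadata("A commit message with a degree sign in it (°).")
entry = b"\n".join([author, description])
```

Encoding at a single, explicit point means a stray byte string is caught early with a clear error rather than surfacing deep inside the commit code.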
Comment 3 A. Budden 2012-01-31 11:54 UTC
The environment for the test cases and in which I'm running Mercurial is
cygwin (hence the /lib paths) with a (modified) putty terminal running on
Windows XP.  UTF-8 usually works fine with this set-up.  The original branch
that caused me problems was almost certainly (I'm not sure as I wasn't the
committer) committed using the QBzr "qcommit" tool, which I believe also
uses UTF-8.

I've just tried the same script on my home PC (running Ubuntu) and it seems
to work okay there, so I guess it's something to do with cygwin/Windows
(isn't everything?!).  Unfortunately, I don't have Windows on any of my home
PCs, so I won't be able to test this again until tomorrow (~8am GMT).
Comment 4 A. Budden 2012-01-31 11:59 UTC
No problem, I'll try the patch first thing tomorrow morning.
Comment 5 A. Budden 2012-02-01 02:25 UTC
The patch appears to have fixed the problem.  It didn't apply cleanly 
(presumably my Cygwin version of Mercurial is a little old: there is no 
'seen.add(path or topath)' in my bzr.py), but I applied the middle change 
manually and it seems to work fine.
Comment 6 Patrick Mézard 2012-02-01 03:52 UTC
Well, you are probably missing this fix with 1.9.2:

  http://hg.intevation.org/mercurial/crew/rev/6ba2fc0a87ab
Comment 7 HG Bot 2012-02-03 16:00 UTC
Fixed by http://selenic.com/repo/hg/rev/f5b6046f6ce8
Patrick Mezard <pmezard@gmail.com>
convert/bzr: expect unicode metadata, encode in UTF-8 (issue3232)

(please test the fix)
Comment 8 Bugzilla 2012-05-12 09:27 UTC

--- Bug imported by bugzilla@serpentine.com 2012-05-12 09:27 EDT  ---

This bug was previously known as bug 3232 at http://mercurial.selenic.com/bts/issue3232
Imported an attachment (id=1624)