Bug 3467 - Unicode error when finding diff tool in TortoiseHg
Summary: Unicode error when finding diff tool in TortoiseHg
Status: RESOLVED FIXED
Alias: None
Product: Mercurial
Classification: Unclassified
Component: Mercurial
Version: earlier
Hardware: PC Windows
Importance: normal bug
Assignee: Bugzilla
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-05-21 06:54 UTC by Aleksey
Modified: 2012-06-19 00:01 UTC

See Also:
Python Version: ---


Description Aleksey 2012-05-21 06:54 UTC
I want to view a visual diff by double-clicking a file in the TortoiseHg Workbench.
Instead, this operation brings up a Bug Report dialog.
I reported this to the TortoiseHg bug tracker, and the response was "the exception is being thrown from Mercurial, this needs to be caught better there. please report to Mercurial's BTS"

Help, please.

TortoiseHG Bug Report link:
http://bitbucket.org/tortoisehg/thg/issue/1710/unicodedecodeerror-ascii-codec-cant-decode
Comment 1 Matt Mackall 2012-05-21 09:25 UTC
Here's the traceback:

** Mercurial version (2.1).  TortoiseHg version (2.3)
** Command: 
** CWD: F:\Program Files\TortoiseHg
** Encoding: cp1251
** Extensions loaded: hgsubversion, extdiff
** Python version: 2.6.6 (r266:84297, Aug 24 2010, 18:46:32) [MSC v.1500 32 bit (Intel)]
** Windows version: (5, 1, 2600, 2, 'Service Pack 3')
** Processor architecture: x86
** Qt-4.7.4 PyQt-4.8.6
Traceback (most recent call last):
  File "tortoisehg\hgqt\revdetails.pyo", line 420, in onDoubleClick
  File "tortoisehg\hgqt\revdetails.pyo", line 314, in vdiff
  File "tortoisehg\hgqt\visdiff.pyo", line 223, in visualdiff
  File "tortoisehg\util\hglib.pyo", line 492, in difftools
  File "tortoisehg\util\hglib.pyo", line 442, in mergetools
  File "mercurial\filemerge.pyo", line 32, in _findtool
  File "mercurial\win32.pyo", line 294, in lookupreg
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcc in position 3: ordinal not in range(128)
Comment 2 Matt Mackall 2012-05-21 09:40 UTC
The relevant code is:

            res = _advapi32.RegQueryValueExA(kh.value, valname, None,
                                       byref(type), buf, byref(size))
...
                return encoding.tolocal(buf.value.encode('UTF-8'))

..which is wrong, because it takes an ANSI byte string (aka local encoding), treats it as ASCII, attempts to decode it to UTF-16, re-encode it as UTF-8, then convert it back to the current ANSI code page.

This broke here, when we quietly switched from QueryValueEx to RegQueryValueExA:

http://www.selenic.com/hg/diff/f1fa8f481c7c/mercurial/win32.py#l1.250

..which means our registry lookup code has been unable to handle non-ASCII pathnames since 1.8. Fix queued for 2.2.2.
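
For illustration, the failure can be sketched outside Mercurial. This is a minimal sketch, not the actual win32.py code: it is written for Python 3, where the ASCII decode that Python 2 performed implicitly inside str.encode() has to be spelled out. The function names and the registry value are hypothetical; the value is simply a cp1251 path with a Cyrillic letter at byte offset 3.

```python
# Sketch only (Python 3). In Python 2, calling .encode('UTF-8') on a byte
# string first decoded it implicitly with the ASCII codec; that hidden step
# is written out explicitly here.

def old_lookup_result(raw):
    """Mimic the old code path: buf.value.encode('UTF-8') under Python 2."""
    return raw.decode('ascii').encode('utf-8')


def fixed_lookup_result(raw):
    """Mimic the fix: return the ANSI (local encoding) bytes untouched."""
    return raw


# Hypothetical registry value: a cp1251-encoded path whose byte at
# offset 3 is 0xcc (Cyrillic capital "Em").
raw = b'F:\\\xcc\xee\xe8\\tool.exe'

error = None
try:
    old_lookup_result(raw)
except UnicodeDecodeError as exc:
    error = str(exc)

print(error)                     # the decode error for byte 0xcc
print(fixed_lookup_result(raw))  # bytes pass through unchanged
```

With a pure-ASCII value the old path round-trips without error, which would explain why the problem only showed up for tools installed under non-ASCII paths.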
Comment 3 Adrian Buehlmann 2012-05-21 10:24 UTC
Another silliness is that I needlessly replaced _winreg, which is a Python standard module, with direct calls to the Windows API functions.

My mistake was assuming that _winreg was part of the pywin32 package.

Replacing _winreg does not seem justified.
Comment 4 Adrian Buehlmann 2012-05-21 16:28 UTC
diff --git a/mercurial/win32.py b/mercurial/win32.py
--- a/mercurial/win32.py
+++ b/mercurial/win32.py
@@ -290,8 +290,7 @@
             if res != _ERROR_SUCCESS:
                 continue
             if type.value == _REG_SZ:
-                # never let a Unicode string escape into the wild
-                return encoding.tolocal(buf.value.encode('UTF-8'))
+                return buf.value
             elif type.value == _REG_DWORD:
                 fmt = '<L'
                 s = ctypes.string_at(byref(buf), struct.calcsize(fmt))

Seems good enough to me (as briefly discussed with mpm on IRC). Going back to the _winreg-based implementation of lookupreg is too much churn for stable.
Comment 5 Adrian Buehlmann 2012-05-26 06:29 UTC
Fixed with http://selenic.com/repo/hg/rev/133a7922a900

changeset:   16759:133a7922a900
branch:      stable
parent:      16748:0a730d3c5aae
user:        Matt Mackall <mpm@selenic.com>
date:        Mon May 21 16:32:49 2012 -0500
summary:     win32: fix encoding handling for registry strings (issue3467)
Comment 6 Aleksey 2012-06-19 00:01 UTC
Everything works fine.

Thanks a lot!