[PATCH] encoding: only call locale.setlocale() when truly necessary

Dan Villiom Podlaski Christiansen danchr at gmail.com
Fri Aug 13 12:47:36 CDT 2010


# HG changeset patch
# User Dan Villiom Podlaski Christiansen <danchr at gmail.com>
# Date 1281721544 -7200
# Branch stable
# Node ID 3eb8e1ace2880329b217ef953c2b0f67e25275eb
# Parent  8276e08b9621bcac1bc0d041d6572045d6749f9d
encoding: only call locale.setlocale() when truly necessary.

Calling locale.setlocale() has some unfortunate side-effects when
using Python 2.6 and earlier on Mac OS X. Specifically, the following
code will fail:

>>> import locale
>>> locale.setlocale(locale.LC_CTYPE, "en_US.UTF-8")
>>> s = "\xc3\x86\xc3\x98\xc3\x85"
>>> assert s == s.rstrip()

The cause is that the last byte, 0x85 (the second byte of the UTF-8
encoding of A-ring), is incorrectly treated as whitespace and stripped.
Actually fixing this is hard, so instead we avoid calling setlocale()
unless it's strictly necessary. The call can then be skipped for the
vast majority of users, including those running the Python binaries
shipped with recent versions of Mac OS X.

So, to sum up: This change doesn't fix the bug, as such, but it makes
it affect fewer users than it would previously.
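
To make the byte-level problem described above concrete, here is a
minimal sketch. It is not part of the patch, and it is written for
Python 3, so it only simulates the faulty strip rather than reproducing
the Python 2 behaviour. It assumes the trailing 0x85 byte is being
classified as whitespace, which is how it looks when read as a Latin-1
character.

```python
# The byte string from the session above: UTF-8 for the three letters
# AE-ligature, O-slash and A-ring.
s = b"\xc3\x86\xc3\x98\xc3\x85"
decoded = s.decode("utf-8")

# The final byte is the second half of the two-byte sequence for A-ring.
last_byte = s[-1:]

# Read on its own as Latin-1, 0x85 is U+0085 (NEXT LINE), which Python
# classifies as whitespace.
looks_like_space = last_byte.decode("latin-1").isspace()

# Python 3's bytes.rstrip() only strips ASCII whitespace, so the string
# survives intact here.
intact = s.rstrip() == s

# Simulate the faulty behaviour by explicitly adding 0x85 to the strip set.
truncated = s.rstrip(b" \t\n\r\x0b\x0c\x85")

# The truncated string ends in a dangling lead byte and no longer decodes.
try:
    truncated.decode("utf-8")
    still_valid = True
except UnicodeDecodeError:
    still_valid = False
```

Stripping that one byte leaves a dangling lead byte, so the result is no
longer valid UTF-8.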

diff --git a/mercurial/encoding.py b/mercurial/encoding.py
--- a/mercurial/encoding.py
+++ b/mercurial/encoding.py
@@ -12,10 +12,15 @@ _encodingfixup = {'646': 'ascii', 'ANSI_
 
 try:
     encoding = os.environ.get("HGENCODING")
-    if sys.platform == 'darwin' and not encoding:
+    if (sys.platform == 'darwin' and not encoding
+        and locale.getpreferredencoding() == 'mac-roman'):
         # On darwin, getpreferredencoding ignores the locale environment and
         # always returns mac-roman. We override this if the environment is
         # not C (has been customized by the user).
+        # Some Python distributions have been patched to disable this behavior,
+        # so we first check if getpreferredencoding really does return
+        # mac-roman. Notably, the default Python binaries in recent versions of
+        # Mac OS X include such a patch.
         lc = locale.setlocale(locale.LC_CTYPE, '')
         if lc == 'UTF-8':
             locale.setlocale(locale.LC_CTYPE, 'en_US.UTF-8')
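
The condition added by the hunk above can also be read in isolation. The
following is a minimal sketch, not code from mercurial/encoding.py: the
helper name needs_setlocale and its parameters are hypothetical, added
only so the three checks can be exercised without a Mac or a particular
environment.

```python
import locale
import os
import sys


def needs_setlocale(platform=None, hgencoding=None, preferred=None):
    """Return True only when the darwin mac-roman workaround applies.

    Hypothetical helper: each argument defaults to the live value the
    patch consults, and can be overridden for experimentation.
    """
    if platform is None:
        platform = sys.platform
    if hgencoding is None:
        hgencoding = os.environ.get("HGENCODING")
    if preferred is None:
        preferred = locale.getpreferredencoding()
    # Same three conditions as the patched `if` statement: on darwin,
    # no explicit HGENCODING, and an unpatched Python that still
    # reports mac-roman as the preferred encoding.
    return (platform == 'darwin' and not hgencoding
            and preferred == 'mac-roman')


# An unpatched Python on darwin with no HGENCODING: setlocale() is needed.
unpatched = needs_setlocale('darwin', '', 'mac-roman')

# A patched Python that already reports UTF-8: setlocale() is skipped.
patched = needs_setlocale('darwin', '', 'UTF-8')

# An explicit HGENCODING always bypasses the workaround.
explicit = needs_setlocale('darwin', 'utf-8', 'mac-roman')
```

Only the first case would go on to call locale.setlocale(); the other
two leave the process locale untouched, which is what keeps the rstrip()
problem away from most users.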

