[PATCH] encoding: improve handling of buggy getpreferredencoding() on Mac OS X
Brodie Rao
brodie at bitheap.org
Sun Aug 15 00:30:30 CDT 2010
On Aug 13, 2010, at 7:31 PM, Dan Villiom Podlaski Christiansen wrote:
> # HG changeset patch
> # User Dan Villiom Podlaski Christiansen <danchr at gmail.com>
> # Date 1281742254 -7200
> # Node ID ec4a47b5db217863cd175e1e0aea51e4d7c17444
> # Parent 22cef2ef536169892e9a9c4380b5f0f9c17dbf39
> encoding: improve handling of buggy getpreferredencoding() on Mac OS X
>
> Prior to Python 2.7, calling locale.getpreferredencoding() would
> always return 'mac-roman' on Mac OS X. Previously, this was handled by
> a call to locale.setlocale(). Unfortunately, Python 2.6.5 and older
> have a bug where isspace() would incorrectly report True for 0x85 and
> 0xa0 after such a call.
>
> In order to fix this, we replace the previous _encodingfixup mapping
> with an _encodingfixers mapping. Rather than mapping encodings to their
> replacement, it maps them to a function returning the
> replacement. This allows us to provide a simplified implementation of
> getpreferredencoding() which extracts the expected encoding and
> restores the locale.
>
> This fix is based on a patch originally submitted by Martijn Pieters
> as well as feedback from Brodie Rao.
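
For anyone following along, the fixer-mapping idea described in the commit message can be sketched roughly as below. This is only an illustration, not the patch itself: `_lookup_real_encoding` and `fix_encoding` are hypothetical names, and the stand-in simply returns 'utf-8' where the real patch queries the locale.

```python
# Hypothetical stand-in for the patch's _getpreferredencoding(); the real
# function asks the C library for the codeset instead of hard-coding it.
def _lookup_real_encoding():
    return 'utf-8'

# Map a reported encoding name to a zero-argument callable that returns
# the encoding to use instead.
_fixers = {
    '646': lambda: 'ascii',
    'ANSI_X3.4-1968': lambda: 'ascii',
    'mac-roman': _lookup_real_encoding,
}

def fix_encoding(name):
    # Names without a fixer fall through unchanged via a default callable.
    return _fixers.get(name, lambda: name)()
```

Using callables rather than plain strings means the potentially expensive or side-effecting lookup only runs when 'mac-roman' is actually reported.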
Looks good. I think this is the best approach to take. All tests pass
with Python 2.4.6, 2.5.5, 2.6.5, and 2.7 on OS X 10.6.4.
Martijn, can you confirm that Dan's patch works for you?
> diff --git a/mercurial/encoding.py b/mercurial/encoding.py
> --- a/mercurial/encoding.py
> +++ b/mercurial/encoding.py
> @@ -8,21 +8,41 @@
>  import error
>  import sys, unicodedata, locale, os
>
> -_encodingfixup = {'646': 'ascii', 'ANSI_X3.4-1968': 'ascii'}
> +def _getpreferredencoding():
> +    '''
> +    On darwin, getpreferredencoding ignores the locale environment and
> +    always returns mac-roman. http://bugs.python.org/issue6202 fixes this
> +    for Python 2.7 and up. This is the same corrected code for earlier
> +    Python versions.
> +
> +    However, we can't use a version check for this method, as some distributions
> +    patch Python to fix this. Instead, we use it as a 'fixer' for the mac-roman
> +    encoding, as it is unlikely that this encoding is the actually expected.
> +    '''
> +    try:
> +        locale.CODESET
> +    except AttributeError:
> +        # Fall back to parsing environment variables :-(
> +        return locale.getdefaultlocale()[1]
> +
> +    oldloc = locale.setlocale(locale.LC_CTYPE)
> +    locale.setlocale(locale.LC_CTYPE, "")
> +    result = locale.nl_langinfo(locale.CODESET)
> +    locale.setlocale(locale.LC_CTYPE, oldloc)
> +
> +    return result
> +
> +_encodingfixers = {
> +    '646': lambda: 'ascii',
> +    'ANSI_X3.4-1968': lambda: 'ascii',
> +    'mac-roman': _getpreferredencoding
> +}
>
>  try:
>      encoding = os.environ.get("HGENCODING")
> -    if sys.platform == 'darwin' and not encoding:
> -        # On darwin, getpreferredencoding ignores the locale environment and
> -        # always returns mac-roman. We override this if the environment is
> -        # not C (has been customized by the user).
> -        lc = locale.setlocale(locale.LC_CTYPE, '')
> -        if lc == 'UTF-8':
> -            locale.setlocale(locale.LC_CTYPE, 'en_US.UTF-8')
> -        encoding = locale.getlocale()[1]
>      if not encoding:
>          encoding = locale.getpreferredencoding() or 'ascii'
> -        encoding = _encodingfixup.get(encoding, encoding)
> +        encoding = _encodingfixers.get(encoding, lambda: encoding)()
>  except locale.Error:
>      encoding = 'ascii'
>  encodingmode = os.environ.get("HGENCODINGMODE", "strict")
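
The save/query/restore pattern used by _getpreferredencoding() in the diff above can also be written with try/finally, so the previous LC_CTYPE setting is put back even if the query fails. A minimal sketch under some assumptions: `codeset_from_environment` is a hypothetical name, it targets current Python rather than 2.4-2.7, and the fallbacks (`getpreferredencoding(False)` and 'ascii') are choices made for this illustration, not what the patch does.

```python
import locale

def codeset_from_environment():
    """Return the codeset named by the environment, leaving LC_CTYPE unchanged."""
    if not hasattr(locale, 'nl_langinfo') or not hasattr(locale, 'CODESET'):
        # Assumption for this sketch: no nl_langinfo support, so defer to
        # Python's own guess without touching the locale.
        return locale.getpreferredencoding(False)

    # Calling setlocale() with only a category queries the current setting.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        # Adopt the user's environment just long enough to read the codeset.
        locale.setlocale(locale.LC_CTYPE, '')
        return locale.nl_langinfo(locale.CODESET)
    except locale.Error:
        # The environment names a locale the system does not support.
        return 'ascii'
    finally:
        # Restore whatever was active before, even on error.
        locale.setlocale(locale.LC_CTYPE, saved)
```

Restoring the locale is the point of the exercise here: leaving LC_CTYPE switched is what exposed the isspace() problem described in the commit message.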