[PATCH] encoding: improve handling of buggy getpreferredencoding() on Mac OS X

Brodie Rao brodie at bitheap.org
Sun Aug 15 00:30:30 CDT 2010


On Aug 13, 2010, at 7:31 PM, Dan Villiom Podlaski Christiansen wrote:

> # HG changeset patch
> # User Dan Villiom Podlaski Christiansen <danchr at gmail.com>
> # Date 1281742254 -7200
> # Node ID ec4a47b5db217863cd175e1e0aea51e4d7c17444
> # Parent  22cef2ef536169892e9a9c4380b5f0f9c17dbf39
> encoding: improve handling of buggy getpreferredencoding() on Mac OS X
>
> Prior to Python 2.7, calling locale.getpreferredencoding() would
> always return 'mac-roman' on Mac OS X. Previously, this was handled by
> a call to locale.setlocale(). Unfortunately, Python 2.6.5 and older
> have a bug where isspace() would incorrectly report True for 0x85 and
> 0xa0 after such a call.
>
> In order to fix this, we replace the previous _encodingfixup mapping
> with an _encodingfixers mapping. Rather than mapping encodings to their
> replacement, it maps them to a function returning the
> replacement. This allows us to provide a simplified implementation of
> getpreferredencoding() which extracts the expected encoding and
> restores the locale.
>
> This fix is based on a patch originally submitted by Martijn Pieters
> as well as feedback from Brodie Rao.

Looks good. I think this is the best approach to take. All tests pass  
with Python 2.4.6, 2.5.5, 2.6.5, and 2.7 on OS X 10.6.4.

Martijn, can you confirm that Dan's patch works for you?
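
For anyone skimming the thread, here is a minimal standalone sketch of the
lookup the patch introduces: each known-bad encoding name maps to a callable,
and an unknown name falls through to itself. The names fix_encoding and
probe_locale are hypothetical stand-ins of mine (probe_locale plays the role
of _getpreferredencoding() and just returns a fixed value), so treat this as
an illustration of the pattern rather than the patch itself.

```python
calls = []  # records each time the (hypothetical) locale probe actually runs


def probe_locale():
    # Stand-in for _getpreferredencoding(); the real function queries the locale.
    calls.append('probe')
    return 'UTF-8'


fixers = {
    '646': lambda: 'ascii',
    'ANSI_X3.4-1968': lambda: 'ascii',
    'mac-roman': probe_locale,
}


def fix_encoding(reported, fixers):
    # Look up a fixer for the reported name; if none is registered, the
    # default lambda simply hands the reported name back unchanged.
    return fixers.get(reported, lambda: reported)()


print(fix_encoding('646', fixers))
print(fix_encoding('latin-1', fixers))
print(fix_encoding('mac-roman', fixers))
```

Because the values are callables, the locale is only touched when
getpreferredencoding() actually reports 'mac-roman'; the other entries never
trigger the probe.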

> diff --git a/mercurial/encoding.py b/mercurial/encoding.py
> --- a/mercurial/encoding.py
> +++ b/mercurial/encoding.py
> @@ -8,21 +8,41 @@
> import error
> import sys, unicodedata, locale, os
>
> -_encodingfixup = {'646': 'ascii', 'ANSI_X3.4-1968': 'ascii'}
> +def _getpreferredencoding():
> +    '''
> +    On darwin, getpreferredencoding ignores the locale environment and
> +    always returns mac-roman. http://bugs.python.org/issue6202 fixes this
> +    for Python 2.7 and up. This is the same corrected code for earlier
> +    Python versions.
> +
> +    However, we can't use a version check for this method, as some distributions
> +    patch Python to fix this. Instead, we use it as a 'fixer' for the mac-roman
> +    encoding, as it is unlikely that this encoding is the actually expected.
> +    '''
> +    try:
> +        locale.CODESET
> +    except AttributeError:
> +        # Fall back to parsing environment variables :-(
> +        return locale.getdefaultlocale()[1]
> +
> +    oldloc = locale.setlocale(locale.LC_CTYPE)
> +    locale.setlocale(locale.LC_CTYPE, "")
> +    result = locale.nl_langinfo(locale.CODESET)
> +    locale.setlocale(locale.LC_CTYPE, oldloc)
> +
> +    return result
> +
> +_encodingfixers = {
> +    '646': lambda: 'ascii',
> +    'ANSI_X3.4-1968': lambda: 'ascii',
> +    'mac-roman': _getpreferredencoding
> +}
>
> try:
>     encoding = os.environ.get("HGENCODING")
> -    if sys.platform == 'darwin' and not encoding:
> -        # On darwin, getpreferredencoding ignores the locale environment and
> -        # always returns mac-roman. We override this if the environment is
> -        # not C (has been customized by the user).
> -        lc = locale.setlocale(locale.LC_CTYPE, '')
> -        if lc == 'UTF-8':
> -            locale.setlocale(locale.LC_CTYPE, 'en_US.UTF-8')
> -        encoding = locale.getlocale()[1]
>     if not encoding:
>         encoding = locale.getpreferredencoding() or 'ascii'
> -        encoding = _encodingfixup.get(encoding, encoding)
> +        encoding = _encodingfixers.get(encoding, lambda: encoding)()
> except locale.Error:
>     encoding = 'ascii'
> encodingmode = os.environ.get("HGENCODINGMODE", "strict")
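
One optional thought on the save-and-restore step in _getpreferredencoding():
a try/finally would put LC_CTYPE back even if the second setlocale() or
nl_langinfo() raises. The sketch below is my own hypothetical variant, not
part of Dan's patch; the function name is made up, and where CODESET is
missing it simply returns None rather than reproducing the patch's
getdefaultlocale() fallback.

```python
import locale


def preferred_encoding_restoring_locale():
    # Hypothetical variant of the patch's helper, for illustration only.
    if not hasattr(locale, 'CODESET'):
        # The patch falls back to locale.getdefaultlocale()[1] here;
        # that fallback is omitted in this sketch.
        return None

    # Passing no second argument queries the current LC_CTYPE setting.
    oldloc = locale.setlocale(locale.LC_CTYPE)
    try:
        # Switch to the user's environment locale and read its codeset.
        locale.setlocale(locale.LC_CTYPE, '')
        return locale.nl_langinfo(locale.CODESET)
    finally:
        # Runs on success and on error, so the caller's locale is unchanged.
        locale.setlocale(locale.LC_CTYPE, oldloc)
```

A locale.Error raised inside the function still propagates, so the existing
"except locale.Error" in encoding.py would keep falling back to 'ascii'.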