[PATCH] mercurial ignores setlocale and uses ascii instead of utf8

Yuya Nishihara yuya at tcha.org
Sat Oct 29 11:05:21 EDT 2016


On Sat, 29 Oct 2016 12:00:06 +0300, Eugene Maslovich wrote:
> # HG changeset patch
> # User ehpc <ehpc at ehpc.io>
> # Date 1477731491 -10800
> #      Sat Oct 29 11:58:11 2016 +0300
> # Branch stable
> # Node ID 095b226e8b67b7dd866ecb867cb641fa478e78ba
> # Parent  b9f7b0c10027764cee77f9c6d61877fcffea837f
> encoding: mercurial ignores setlocale and uses ascii instead of utf8
> 
> locale.getpreferredencoding() internally uses
> locale.setlocale(locale.LC_CTYPE, '')
> so even if a user sets locale explicitly via
> 
> locale.setlocale(locale.LC_ALL, 'ru_RU.utf8')
> locale.setlocale(locale.LC_CTYPE, 'ru_RU.utf8')
> 
> mercurial still detects ascii.

Sounds like you're using Mercurial as a library. Can you elaborate on what
you're trying to do and the problem you ran into?

Also, we're in code freeze and this patch doesn't look like a bug fix. Please
send new patches after November 1st.

https://www.mercurial-scm.org/wiki/TimeBasedReleasePlan

> diff -r b9f7b0c10027 -r 095b226e8b67 mercurial/encoding.py
> --- a/mercurial/encoding.py Wed Oct 19 18:06:14 2016 +0200
> +++ b/mercurial/encoding.py Sat Oct 29 11:58:11 2016 +0300
> @@ -93,7 +93,7 @@
>  try:
>      encoding = environ.get("HGENCODING")
>      if not encoding:
> -        encoding = locale.getpreferredencoding() or 'ascii'
> +        encoding = locale.getpreferredencoding(False) or 'ascii'

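This appears wrong. The doc says "on some systems, it is necessary to invoke
setlocale() to obtain the user preferences." With False passed, LC_CTYPE from
the environment is not picked up on my machine, as the session after the link
shows.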

https://docs.python.org/2.7/library/locale.html#locale.getpreferredencoding

% LC_CTYPE=en_US.UTF-8 python -c 'import locale; print locale.getpreferredencoding()'
UTF-8
% LC_CTYPE=en_US.UTF-8 python -c 'import locale; print locale.getpreferredencoding(False)'
ANSI_X3.4-1968
% uname -a
Linux mimosa 4.7.0-1-amd64 #1 SMP Debian 4.7.8-1 (2016-10-19) x86_64 GNU/Linux
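
The lookup order in the hunk above (HGENCODING, then the locale, then 'ascii')
can be sketched in isolation. This is only a minimal illustration, not the code
in mercurial/encoding.py: detect_encoding and its do_setlocale parameter are
hypothetical names, and the environment is passed in as a plain dict so the
behaviour is easy to try out.

```python
import locale


def detect_encoding(environ, do_setlocale=True):
    """Pick an encoding name: HGENCODING first, then the locale, then 'ascii'.

    Hypothetical helper for illustration only; not Mercurial's API.
    """
    # An explicit HGENCODING always wins over locale detection.
    encoding = environ.get("HGENCODING")
    if not encoding:
        # With do_setlocale=True, the locale module may temporarily call
        # setlocale(LC_CTYPE, "") so that the user's settings are consulted.
        # With False, only the locale already active in the process is used.
        encoding = locale.getpreferredencoding(do_setlocale) or "ascii"
    return encoding


if __name__ == "__main__":
    import os

    print(detect_encoding(os.environ))
    print(detect_encoding(os.environ, do_setlocale=False))
```

Calling it with do_setlocale=False corresponds to the change proposed in the
patch. Whether the two results differ depends on the platform and the Python
version, so no output is shown here.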

> @@ -146,11 +146,14 @@
> 
>      try:
>          try:
> +            if encoding == 'UTF-8':
> +                # fast path
> +                if isinstance(s, unicode):
> +                    return s
> +                else:
> +                    return s.decode('UTF-8')

What? tolocal() is the function that converts UTF-8-encoded bytes to bytes in
the local encoding. Neither the input nor the output should ever be a unicode
object.
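
To make the bytes-in, bytes-out contract concrete, here is a rough sketch. It
does not attempt to reproduce Mercurial's actual tolocal(): the name
tolocal_sketch and the explicit local_encoding argument are assumptions made
for illustration, and input that is not valid UTF-8 simply raises.

```python
def tolocal_sketch(s, local_encoding):
    """Convert UTF-8-encoded bytes to bytes in local_encoding.

    Hypothetical illustration only; not Mercurial's implementation.
    """
    if not isinstance(s, bytes):
        # Unicode objects are rejected: the contract is bytes in, bytes out.
        raise TypeError("expected UTF-8-encoded bytes")
    # Decoding first checks that the input really is valid UTF-8.
    u = s.decode("utf-8")
    if local_encoding.upper() == "UTF-8":
        # Fast path: the bytes are already in the local encoding.
        return s
    # Characters the local encoding cannot represent become '?'.
    return u.encode(local_encoding, "replace")
```

For example, tolocal_sketch(b'caf\xc3\xa9', 'latin-1') gives b'caf\xe9', while
with 'UTF-8' as the local encoding the same bytes come back unchanged. In both
cases the result is still bytes, which is what the fast path in the patch
breaks.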

