[PATCH] mercurial ignores setlocale and uses ascii instead of utf8
Yuya Nishihara
yuya at tcha.org
Sat Oct 29 11:05:21 EDT 2016
On Sat, 29 Oct 2016 12:00:06 +0300, Eugene Maslovich wrote:
> # HG changeset patch
> # User ehpc <ehpc at ehpc.io>
> # Date 1477731491 -10800
> # Sat Oct 29 11:58:11 2016 +0300
> # Branch stable
> # Node ID 095b226e8b67b7dd866ecb867cb641fa478e78ba
> # Parent b9f7b0c10027764cee77f9c6d61877fcffea837f
> encoding: mercurial ignores setlocale and uses ascii instead of utf8
>
> locale.getpreferredencoding() internally uses
> locale.setlocale(locale.LC_CTYPE, '')
> so even if a user sets locale explicitly via
>
> locale.setlocale(locale.LC_ALL, 'ru_RU.utf8')
> locale.setlocale(locale.LC_CTYPE, 'ru_RU.utf8')
>
> mercurial still detects ascii.
Sounds like you're using Mercurial as a library. Can you elaborate on what
you're trying to do and the problem you ran into?
Also, we're in code freeze and this patch doesn't look like a bug fix. Please
send new patches after November 1st.
https://www.mercurial-scm.org/wiki/TimeBasedReleasePlan
> diff -r b9f7b0c10027 -r 095b226e8b67 mercurial/encoding.py
> --- a/mercurial/encoding.py Wed Oct 19 18:06:14 2016 +0200
> +++ b/mercurial/encoding.py Sat Oct 29 11:58:11 2016 +0300
> @@ -93,7 +93,7 @@
> try:
> encoding = environ.get("HGENCODING")
> if not encoding:
> - encoding = locale.getpreferredencoding() or 'ascii'
> + encoding = locale.getpreferredencoding(False) or 'ascii'
This appears wrong. The doc says "on some systems, it is necessary to invoke
setlocale() to obtain the user preferences."
https://docs.python.org/2.7/library/locale.html#locale.getpreferredencoding
% LC_CTYPE=en_US.UTF-8 python -c 'import locale; print locale.getpreferredencoding()'
UTF-8
% LC_CTYPE=en_US.UTF-8 python -c 'import locale; print locale.getpreferredencoding(False)'
ANSI_X3.4-1968
% uname -a
Linux mimosa 4.7.0-1-amd64 #1 SMP Debian 4.7.8-1 (2016-10-19) x86_64 GNU/Linux
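For illustration, the comparison above can also be scripted. This is only a minimal sketch: the helper name compare_preferred_encoding is hypothetical, and the values it reports depend on the platform, the Python version, and the environment, so the two results may or may not differ on a given system.

```python
import locale


def compare_preferred_encoding():
    """Return (without_setlocale, with_setlocale) as reported by locale.

    Hypothetical helper for illustration; not part of Mercurial.
    """
    # do_setlocale=False: documented to skip the setlocale() call, so the
    # answer reflects whatever locale the process already has.
    without_setlocale = locale.getpreferredencoding(False)
    # do_setlocale=True (the default): may call setlocale(LC_CTYPE, "")
    # to pick up the user's preferences from the environment.
    with_setlocale = locale.getpreferredencoding(True)
    return without_setlocale, with_setlocale


if __name__ == "__main__":
    without_setlocale, with_setlocale = compare_preferred_encoding()
    print("getpreferredencoding(False):", without_setlocale)
    print("getpreferredencoding(True): ", with_setlocale)
```

Running it with and without LC_CTYPE set in the environment shows whether dropping the setlocale() call changes the detected encoding on that system.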
> @@ -146,11 +146,14 @@
>
> try:
> try:
> + if encoding == 'UTF-8':
> + # fast path
> + if isinstance(s, unicode):
> + return s
> + else:
> + return s.decode('UTF-8')
What? tolocal() is the function that converts UTF-8-encoded bytes to bytes in
the local encoding. Neither the input nor the output should ever be a unicode
object.
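The bytes-in, bytes-out contract described here can be sketched as follows. This is a deliberately simplified, hypothetical stand-in (tolocal_sketch and its local_encoding parameter are not Mercurial's API, and the real tolocal() does more than this); it only shows that the unicode value is an intermediate step and never escapes the function.

```python
def tolocal_sketch(s, local_encoding):
    """Hypothetical simplification: UTF-8 bytes in, local-encoding bytes out."""
    if not isinstance(s, bytes):
        # Reject unicode input instead of silently passing it through.
        raise TypeError("expected UTF-8 encoded bytes, got %s" % type(s).__name__)
    # Decode only as an intermediate step.
    text = s.decode("utf-8")
    # Assumption for this sketch: characters the local encoding cannot
    # represent are replaced with "?".
    return text.encode(local_encoding, "replace")


if __name__ == "__main__":
    print(repr(tolocal_sketch(b"caf\xc3\xa9", "latin-1")))
    print(repr(tolocal_sketch(b"caf\xc3\xa9", "utf-8")))
```

Even when the local encoding is UTF-8, the result is still a byte string, which is why a fast path that returns a unicode object breaks callers.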