[PATCH] mercurial ignores setlocale and uses ascii instead of utf8

Yuya Nishihara yuya at tcha.org
Sun Oct 30 06:33:15 EDT 2016


(please CC the list)

On Sat, 29 Oct 2016 21:04:14 +0300, Eugene Maslovich wrote:
> There is some repo with several commits. Each commit has a description
> containing unicode letters.
> I'm trying to output a detailed log using mercurial as a library via Flask.
> 
> If I do this:
> 
> ---------------------------------------------------------
> # -*- coding: utf-8 -*-
> import locale
> from flask import Flask
> from mercurial import ui, hg, commands
> app = Flask(__name__)
> @app.route('/')
> def hello():
>     hgUi = ui.ui()
>     output = ''
>     hgRepo = hg.repository(hgUi, '/var/www/hg/repos/web/test1/')
>     hgUi.pushbuffer()
>     commands.log(hgUi, hgRepo, verbose=True, stat=True)
>     output += hgUi.popbuffer()
>     return output
> ---------------------------------------------------------
> 
> Then I will receive something like this:
> 
> ---------------------------------------------------------
> changeset:   100:7b8e1b1d9714
> branch:      outgoing
> tag:         tip
> user:        ehpc
> date:        Sat Oct 29 10:33:50 2016 +0300
> summary:     ??????? ?? ???????
> ---------------------------------------------------------

For this problem, you can simply set encoding.encoding = 'UTF-8' (in the
mercurial.encoding module) before calling commands.log().
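
A minimal sketch of that override, reusing the calls from the quoted handler.
It assumes the Python 2 era Mercurial API shown in this thread, where
encoding.encoding is a plain str; the helper name log_as_utf8 is only
illustrative.

```python
def log_as_utf8(repo_path):
    """Return the verbose log with stats for repo_path, encoded as UTF-8."""
    # Imported inside the function so this module still loads where
    # Mercurial is not installed.
    from mercurial import commands, encoding, hg, ui

    # Override whatever Mercurial picked from HGENCODING or the environment.
    # Assumption: a plain str, as in the Python 2 API quoted in this thread.
    encoding.encoding = 'UTF-8'

    hg_ui = ui.ui()
    repo = hg.repository(hg_ui, repo_path)
    hg_ui.pushbuffer()
    commands.log(hg_ui, repo, verbose=True, stat=True)
    return hg_ui.popbuffer()
```

Note that encoding.encoding is a module-level global, so the override applies
to every request handled by the same process.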

Perhaps the recommended way would be to use hglib to isolate the Mercurial
process, but that might require more work for your Flask app to manage the
hg process.

https://www.mercurial-scm.org/wiki/PythonHglib
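
A rough sketch of the hglib route, assuming python-hglib is installed and used
from Python 2 as elsewhere in this thread. The helper name log_via_hglib and
the limit default are made up for illustration.

```python
def log_via_hglib(repo_path, limit=10):
    """Return (rev, description) pairs read through a separate hg process."""
    import hglib  # python-hglib, installed separately from Mercurial

    # The encoding argument is handed to the child hg process, so the
    # web application's own locale settings do not matter here.
    client = hglib.open(repo_path, encoding='UTF-8')
    try:
        return [(cs.rev, cs.desc) for cs in client.log(limit=limit)]
    finally:
        # Shut down the command server started by hglib.open().
        client.close()
```

The trade-off is the one mentioned above: the app now owns a child process and
has to open and close it around its requests.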

> All unicode characters are replaced with "?".
> Well that may be ok. But then I do this:
> 
> ---------------------------------------------------------
> # -*- coding: utf-8 -*-
> import locale
> locale.setlocale(locale.LC_ALL, 'ru_RU.utf8')
> locale.setlocale(locale.LC_CTYPE, 'ru_RU.utf8')
> from flask import Flask
> from mercurial import ui, hg, commands
> app = Flask(__name__)
> @app.route('/')
> def hello():
>     hgUi = ui.ui()
>     output = ''
>     hgRepo = hg.repository(hgUi, '/var/www/hg/repos/web/test1/')
>     hgUi.pushbuffer()
>     commands.log(hgUi, hgRepo, verbose=True, stat=True)
>     output += hgUi.popbuffer()
>     return output
> ---------------------------------------------------------
> 
> I still receive "?".
> 
> That was kind of unclear because I looked at encoding.py:
> 
> ---------------------------------------------------------
> try:
>     encoding = environ.get("HGENCODING")
>     if not encoding:
>         encoding = locale.getpreferredencoding() or 'ascii'
>         encoding = _encodingfixers.get(encoding, lambda: encoding)()
> except locale.Error:
>     encoding = 'ascii'
> encodingmode = environ.get("HGENCODINGMODE", "strict")
> fallbackencoding = 'ISO-8859-1'
> ---------------------------------------------------------
> 
> "encoding" was still "ascii". It was unexpected.
> 
> There is a solution to the problem: setting environment variable
> HGENCODING to "UTF-8".
> But it is kind of unclear why setting the actual locale doesn't
> result in proper symbols instead of "?".
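
One likely explanation, offered as a sketch rather than a confirmed diagnosis:
locale.getpreferredencoding(), called with its default do_setlocale=True,
temporarily switches LC_CTYPE to the locale named by the environment before
asking for the codeset, so an earlier setlocale() call in the program does not
change its answer. Under a web server started with LANG=C, that answer is an
ASCII alias. The standalone function below mirrors the selection logic quoted
from encoding.py, leaving out the _encodingfixers step. pick_encoding and its
preferred parameter are illustrative names, not Mercurial APIs.

```python
import locale


def pick_encoding(environ, preferred=locale.getpreferredencoding):
    """Mirror the quoted logic: HGENCODING first, then the environment locale."""
    try:
        encoding = environ.get("HGENCODING")
        if not encoding:
            # getpreferredencoding() consults the process environment
            # (LANG, LC_ALL, LC_CTYPE), not a locale set via setlocale().
            encoding = preferred() or 'ascii'
    except locale.Error:
        encoding = 'ascii'
    return encoding


# Stand-in for a server environment where the C locale is in effect.
c_locale = lambda: 'ANSI_X3.4-1968'
print(pick_encoding({}, c_locale))
print(pick_encoding({'HGENCODING': 'UTF-8'}, c_locale))
```

This is why exporting HGENCODING (or assigning encoding.encoding directly)
works where setlocale() does not: both bypass the environment lookup.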

