[PATCH 07 of 10] py3: handle unicode docstrings in registrar.py

Wed Aug 3 10:16:05 EDT 2016

On Wed, 03 Aug 2016 01:57:29 +0530, Pulkit Goyal wrote:
> # HG changeset patch
> # User Pulkit Goyal <7895pulkit at gmail.com>
> # Date 1470168548 -19800
> #      Wed Aug 03 01:39:08 2016 +0530
> # Node ID a77a7f6e8bfc90901d829257a782ff11e2bae0f7
> # Parent  da4a0ba184d3eff2819d73884770d342edce88c1
> py3: handle unicode docstrings in registrar.py
> 
> The module importer on Python 3 doesn't rewrite docstrings to bytes
> literals. So we need to teach document formatting to convert the
> documentation to bytes before formatting to ensure consistent
> types are used and the result is bytes.
> 
> diff -r da4a0ba184d3 -r a77a7f6e8bfc mercurial/registrar.py
> --- a/mercurial/registrar.py	Wed Aug 03 01:33:29 2016 +0530
> +++ b/mercurial/registrar.py	Wed Aug 03 01:39:08 2016 +0530
> @@ -83,6 +83,10 @@
>  
>          'doc' is '__doc__.strip()' of the registered function.
>          """
> +        # docstrings are using the source file encoding, which should be
> +        # utf-8.
> +        if not isinstance(doc, bytes):
> +            doc = doc.encode(u'utf-8')
>          return self._docformat % (decl, doc)

I think the conversion should be done where __doc__ is accessed. Converting
at scattered places downstream would be a source of bugs.
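
To illustrate the suggestion, here is a minimal sketch, not the actual
registrar.py code. The helper name getdoc(), the Registrar class, and the
_formatteddoc attribute are all hypothetical, and the format string is made
up. The point is only that the str-to-bytes conversion happens once, where
__doc__ is read, so that the formatting code can assume bytes:

```python
def getdoc(func):
    """Return func's stripped docstring as bytes, or None if it has none.

    Hypothetical helper: the name and placement are assumptions and are
    not part of the patch under review.
    """
    doc = func.__doc__
    if doc is None:
        return None
    if not isinstance(doc, bytes):
        # On Python 3 docstrings are str; encode once, at the access point.
        # Assumes the source file encoding is utf-8.
        doc = doc.encode('utf-8')
    return doc.strip()


class Registrar(object):
    """Simplified stand-in for a registrar-style decorator class."""

    # Made-up format string; bytes %-formatting needs Python 3.5+.
    _docformat = b"``%s``\n    %s"

    def __init__(self):
        self.table = {}

    def __call__(self, decl):
        def decorator(func):
            doc = getdoc(func)  # the only place __doc__ is touched
            if doc is not None:
                func._formatteddoc = self._formatdoc(decl, doc)
            self.table[decl] = func
            return func
        return decorator

    def _formatdoc(self, decl, doc):
        # No isinstance() check needed here: doc is already bytes.
        return self._docformat % (decl, doc)


reg = Registrar()


@reg(b'example()')
def example():
    """Return nothing useful."""
```

With this shape, _formatdoc() and any other consumer of the docstring never
need their own type check.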