[PATCH] py3: have an utility function to return string

Yuya Nishihara yuya at tcha.org
Fri Sep 16 09:46:47 EDT 2016


On Thu, 15 Sep 2016 23:59:59 +0530, Pulkit Goyal wrote:
> On Thu, Sep 15, 2016 at 7:06 PM, Yuya Nishihara <yuya at tcha.org> wrote:
> > On Wed, 14 Sep 2016 22:45:27 +0530, Pulkit Goyal wrote:
> >> # HG changeset patch
> >> # User Pulkit Goyal <7895pulkit at gmail.com>
> >> # Date 1473787789 -19800
> >> #      Tue Sep 13 22:59:49 2016 +0530
> >> # Node ID ec133d50af780e84a6a24825b52d433c10f9cd55
> >> # Parent  85bd31515225e7fdf9bd88edde054db2c74a33f8
> >> py3: have an utility function to return string
> >>
> >> There are cases when we need strings and can't use bytes in python 3.
> >> We need a utility function for these cases. I agree that this may not
> >> be the best possible way out. I will be happy if anybody else can suggest
> >> a better approach. We need this function for os.path.join(),
> >
> > We should stick to bytes for filesystem API, and translate bytes to unicode
> > at VFS layer as necessary.
> >
> > https://www.mercurial-scm.org/wiki/WindowsUTF8Plan
> >
> > (Also, we'll have to disable PEP 528 and 529 on Python 3.6, which will break
> > existing repositories.)
> >
> > https://docs.python.org/3.6/whatsnew/3.6.html
> >
> >> __slots__
> >
> > __slots__ can be considered private data, so just use u''.
> >
> >> and few more things.
> >
> > for instance?
> This function was motivated by Gregory's reply to
> https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-August/086704.html
> Unfortunately, I see that he replied to me only, so I pasted it here:
> https://bpaste.net/show/ab0d3ea39749
> 
> I am going through the Python documentation and there are things like
> __slots__ and is_frozen() which accept str in both py2 and py3. Since
> str is not the same type in the two, I made this function to help in such
> cases. If we can use unicode in __slots__ in py2, then that's good.

Python 2.6-2.7 accepts both str and unicode in general, but mixing them is a
disaster, so we've avoided unicode wherever possible. Unfortunately, Python 3
solved that problem by forcing us to use unicode (named str) everywhere, which
doesn't work in Mercurial because we need to process binary data (including
unix paths) transparently. All inputs and outputs (except for the future Windows
file API) should be bytes.
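
To illustrate the bytes-only rule for paths, here is a minimal sketch, assuming
Python 3 (the path components are made up for the example and are not taken
from the patch). os.path.join() works on bytes as long as every component is
bytes, and refuses to mix the two types:

```python
import os

# Bytes in, bytes out: no decoding is needed to join path components.
joined = os.path.join(b'repo', b'.hg', b'store')

# Mixing bytes and str components is rejected on Python 3.
try:
    os.path.join(b'repo', u'.hg')
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

So os.path.join() by itself does not seem to require a conversion helper, as
long as we never hand it a unicode component.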

So, if is_frozen() of Py3 doesn't take bytes and Py2 doesn't take unicode,
we'll need a compatibility function like you proposed.
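
A rough sketch of what such a compatibility function could look like. The
name tonativestr and the choice of the ascii codec are assumptions made for
this example, not what the patch does; it is only meant for identifiers that
come from our own source code, never for user data:

```python
import sys

def tonativestr(word):
    """Return 'word' as the native str type of the running interpreter.

    Hypothetical helper: 'word' is assumed to be an ASCII-only bytes
    literal from our own source (an attribute or module name).
    """
    if sys.version_info[0] < 3:
        # On Python 2, bytes and str are the same type.
        return word
    # On Python 3, native str is unicode, so decode. ASCII is an
    # assumption here; it fails loudly on anything unexpected.
    return word.decode('ascii')

class example(object):
    # __slots__ entries must be native str on Python 3.
    __slots__ = (tonativestr(b'name'),)
```

Decoding as ASCII avoids guessing an encoding for data that should only ever
be plain identifiers.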

> >> +# This function converts its arguments to strings
> >> +# on the basis of python version. Strings in python 3
> >> +# are unicodes and our transformer converts everything to bytes
> >> +# in python 3. So we need to decode it to unicodes in
> >> +# py3.
> >> +
> >> +def coverttostr(word):
> >> +    if sys.version_info[0] < 3:
> >> +        assert isinstance(word, str), "Not a string in Python 2"
> >> +        return word
> >> +    # Checking word is bytes because we have the transformer, else
> >> +    # raising error
> >> +    assert isinstance(word, bytes), "Should be bytes because of transformer"
> >> +    return word.decode(sys.getfilesystemencoding())
> >
> > Can we assume 'word' was encoded in file-system codec?
> 
> Yeah, because of the transformer, we added b'' everywhere.

As Martijn said, that depends on how 'word' was encoded. Python sources would
be latin1 or utf-8 in most cases, but a string read from the outside world is
different. We assume it is encoded in encoding.encoding.
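
A small example of why the codec matters, assuming Python 3 (the byte strings
are made up for illustration). The same bytes give different text under
different codecs, and bytes written in one encoding may not decode at all in
another:

```python
# "cafe" with an e-acute, encoded as UTF-8 (two bytes for the last letter).
raw = b'caf\xc3\xa9'

as_utf8 = raw.decode('utf-8')      # 4 characters, the intended text
as_latin1 = raw.decode('latin-1')  # 5 characters, mojibake

# The Latin-1 encoding of the same word is not valid UTF-8 at all.
try:
    b'caf\xe9'.decode('utf-8')
    latin1_bytes_are_valid_utf8 = True
except UnicodeDecodeError:
    latin1_bytes_are_valid_utf8 = False
```

So decoding with sys.getfilesystemencoding() is only correct if 'word' really
was encoded that way.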

