[PATCH] py3: have an utility function to return string

Pierre-Yves David pierre-yves.david at ens-lyon.org
Fri Sep 16 06:09:46 EDT 2016



On 09/15/2016 03:36 PM, Yuya Nishihara wrote:
> On Wed, 14 Sep 2016 22:45:27 +0530, Pulkit Goyal wrote:
>> # HG changeset patch
>> # User Pulkit Goyal <7895pulkit at gmail.com>
>> # Date 1473787789 -19800
>> #      Tue Sep 13 22:59:49 2016 +0530
>> # Node ID ec133d50af780e84a6a24825b52d433c10f9cd55
>> # Parent  85bd31515225e7fdf9bd88edde054db2c74a33f8
>> py3: have an utility function to return string
>>
>> There are cases when we need strings and can't use bytes in python 3.
>> We need an utility function for these cases. I agree that this may not
>> be the best possible way out. I will be happy if anybody else can suggest
>> a better approach. We need this functions for os.path.join(),
>
> We should stick to bytes for filesystem API, and translate bytes to unicode
> at VFS layer as necessary.
>
> https://www.mercurial-scm.org/wiki/WindowsUTF8Plan
>
> (Also, we'll have to disable PEP 528 and 529 on Python 3.6, which will break
> existing repositories.)
>
> https://docs.python.org/3.6/whatsnew/3.6.html
>
>> __slots__
>
> __slots__ can be considered private data, so just use u''.
>
>> and few more things.
>
> for instance?
>
>> +# This function converts its arguments to strings
>> +# on the basis of python version. Strings in python 3
>> +# are unicodes and our transformer converts everything to bytes
>> +# in python 3. So we need to decode it to unicodes in
>> +# py3.
>> +
>> +def coverttostr(word):

Any reason this comment is not the Python docstring?

>> +    if sys.version_info[0] < 3:
>> +        assert isinstance(word, str), "Not a string in Python 2"
>> +        return word
>> +    # Checking word is bytes because we have the transformer, else
>> +    # raising error
>> +    assert isinstance(word, bytes), "Should be bytes because of transformer"
>> +    return word.decode(sys.getfilesystemencoding())
>
> Can we assume 'word' was encoded in file-system codec?

What kind of strings is this going to be used on? If we intend to use
this on Mercurial internal identifiers only, we can probably assume (and
actually, enforce) ASCII to keep things simple.
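
To make the ASCII suggestion concrete, here is a rough sketch of what I have
in mind. It is only an illustration, not a proposed patch: the name
`asciistr` is hypothetical, and raising TypeError instead of asserting is my
own assumption about how strict we want to be. The explanatory comment from
the patch is moved into the docstring.

```python
import sys


def asciistr(word):
    """Return ``word`` as a native str (hypothetical helper, sketch only).

    On Python 2 the native str is already bytes, so ``word`` is returned
    unchanged. On Python 3 the source transformer turns literals into
    bytes, so ``word`` is decoded back to a unicode str. Only ASCII is
    accepted, on the assumption that this is used for internal
    identifiers and never for user data or filesystem paths.
    """
    if sys.version_info[0] < 3:
        if not isinstance(word, str):
            raise TypeError("expected str on Python 2")
        return word
    if not isinstance(word, bytes):
        raise TypeError("expected bytes on Python 3")
    # Strict decoding: non-ASCII input raises UnicodeDecodeError rather
    # than being silently interpreted in some locale-dependent codec.
    return word.decode('ascii')
```

With this, something like asciistr(b'__slots__') gives a native string on
both versions, and anything non-ASCII fails loudly instead of depending on
sys.getfilesystemencoding().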

Cheers,

-- 
Pierre-Yves David

