Proposed strategy to port Mercurial to Python 3

Matt Mackall mpm at selenic.com
Wed Nov 2 17:08:43 CDT 2011


On Wed, 2011-11-02 at 22:25 +0100, Victor Stinner wrote:
> Hi,
> 
> I'm trying to port Mercurial to Python 3. I started a port using 2to3 
> and Python 3.3 to see whether it is possible.
> 
> Because Mercurial still supports Python 2.4, the b'...' (byte string 
> literal) syntax cannot be used. I don't want to maintain a Mercurial fork, 
> so I propose to mark byte strings using a function like six.b() to have 
> something like:
> 
>     text = b('bytes')   # instead of b'bytes'
> 
> six may be used as an external library (with a local copy if 
> needed), or the library can easily be copied, or a new compatibility 
> library can be written.
> 
> http://packages.python.org/six/
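
For reference, a minimal sketch of what such a b() helper could look like,
modeled on the idea behind six.b(). The PY3 flag and the choice of latin-1
are assumptions of this sketch, not code taken from Mercurial:

```python
import sys

# Assumption: a simple major-version check is enough to pick the branch.
PY3 = sys.version_info[0] >= 3

if PY3:
    def b(s):
        # On Python 3 a plain '...' literal is text, so encode it.
        # latin-1 maps code points 0-255 one-to-one onto byte values.
        return s.encode("latin-1")
else:
    def b(s):
        # On Python 2 a plain '...' literal is already a byte string.
        return s

text = b('bytes')   # instead of b'bytes'
```

The call site stays valid syntax on Python 2.4 because it never uses the
b'...' literal form.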

Interesting, but probably not sufficient. Python 3.x bytes objects are
crippled relative to Python 2.x str objects because they've taken away
some of the string-oriented operations like '%'. Also, b"a"[0] is "a" on
Py2 but 97 on Py3. So just pretending Py2 str/bytes == Py3 bytes is not so
useful.
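
To make the indexing difference concrete, here is a small sketch. It uses
the b"..." literal, so it needs Python 2.6 or later; byte_at() is a
hypothetical helper named here for illustration only:

```python
data = b"abc"

# Indexing differs between the two major versions:
#   Py2: data[0] is the length-1 str 'a'
#   Py3: data[0] is the int 97
first = data[0]

# Slicing returns a length-1 byte string on both versions.
first_slice = data[0:1]

def byte_at(buf, i):
    """Return the byte at non-negative index i as a length-1 byte string."""
    return buf[i:i + 1]
```

Code that indexes byte strings directly would need this kind of rewrite
everywhere, which a b() wrapper alone does not address.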

> Because I don't know Mercurial, it's difficult to understand if bytes or 
> Unicode type should be used.

The short answer is that Python Unicode objects (whether they're called
str or unicode by Python) are completely unwelcome in the bulk of the
Mercurial codebase. Please see:

http://mercurial.selenic.com/wiki/EncodingStrategy

There have been several projects in this area, including a GSoC project
last year; I suggest you read up on that.

-- 
Mathematics is the supreme nostalgia of our time.



