Proposed strategy to port Mercurial to Python 3

Victor Stinner victor.stinner at haypocalc.com
Wed Nov 2 16:25:43 CDT 2011


Hi,

I'm trying to port Mercurial to Python 3. I started a port using 2to3 
and Python 3.3 to see whether it is possible.

Because Mercurial still supports Python 2.4, the b'...' (bytes string 
literal) syntax cannot be used: it requires Python 2.6 or later. I don't 
want to maintain a Mercurial fork, so I propose to mark byte strings 
using a function like six.b() to have something like:

    text = b('bytes')   # instead of b'bytes'

The six library may be used as an external dependency, or it can easily 
be copied into Mercurial as a local copy, or a new compatibility 
library can be written.

http://packages.python.org/six/
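
A minimal sketch of what such a helper could look like if a small 
compatibility module were written instead of depending on six. It mirrors 
the approach six.b() takes (encoding the literal as Latin-1 on Python 3); 
where it would live inside Mercurial is left open here.

```python
import sys

if sys.version_info[0] >= 3:
    def b(s):
        """Turn a native str literal into bytes (Python 3)."""
        # Latin-1 maps code points 0-255 to the same byte values,
        # so '\xff' in the literal becomes the single byte 0xff.
        return s.encode('latin-1')
else:
    def b(s):
        """On Python 2 a native str literal is already a byte string."""
        return s

text = b('bytes')   # instead of b'bytes'
```

The literal passed to b() must only contain code points below 256, 
otherwise the Latin-1 encoding fails on Python 3.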

Because I don't know Mercurial, it's difficult to understand if bytes or 
Unicode type should be used. If I understood correctly, the following 
types should be used:

  * (changeset) description: Unicode
  * changeset data (revlog.revision): bytes
  * filename: Unicode
  * revision (node identifier): bytes
  * username: Unicode

"Unicode" means that the "unicode" Python 2 type should be used instead 
of "str" (bytes). It would simplify the code by avoiding transcoding 
(tolocal, fromlocal).

Thanks to PEP 383, Python 3 has a special error handler to avoid 
decoding errors: the "surrogateescape" error handler stores undecodable 
bytes as lone surrogates. Python 2 doesn't have this error handler, so 
I'm not sure that we can use Unicode for filenames with Python 2. But 
Unicode can be used for the description and username.
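
To illustrate the PEP 383 behaviour on Python 3, here is a small sketch 
with a made-up filename encoded in Latin-1, which is not valid UTF-8. 
The filename and variable names are only examples.

```python
# Python 3 only: a Latin-1 encoded filename that is not valid UTF-8.
raw = b'caf\xe9.txt'

# Strict decoding fails on the byte 0xE9.
try:
    raw.decode('utf-8')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape stores the undecodable byte 0xE9 as the lone
# surrogate U+DCE9 instead of raising an error.
name = raw.decode('utf-8', 'surrogateescape')

# Encoding with the same error handler restores the original bytes.
roundtrip = name.encode('utf-8', 'surrogateescape')
```

The round trip is lossless, which is what makes it possible to keep 
filenames as Unicode on Python 3 even when they cannot be decoded.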

I don't understand if mdiff and bdiff should process bytes or Unicode. I 
would prefer bytes because I don't know how to guess the encoding of an 
arbitrary file...
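
As an illustration of diffing bytes without guessing an encoding, here 
is a sketch using difflib.diff_bytes() from the standard library (Python 
3.5 or later). This is only a stand-in to show the idea: it is not 
Mercurial's mdiff or bdiff, and the file names are placeholders.

```python
import difflib

# File contents as raw bytes; the byte 0xE9 is not valid UTF-8,
# but the diff never needs to decode it.
old = b'caf\xe9\nline two\n'.splitlines(True)
new = b'caf\xe9\nline 2\n'.splitlines(True)

# diff_bytes() wraps a str-based diff function so that it accepts
# and yields bytes.
diff = list(difflib.diff_bytes(difflib.unified_diff, old, new,
                               b'a/file', b'b/file'))
```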

For serialization, Unicode (username, description, filenames, etc.) 
should be encoded to UTF-8. It looks like filenames are currently stored 
in the locale encoding. For backward compatibility, a new feature should 
be added to .hg/requires (e.g. "utf8") and an (atomic) process to 
convert the whole repository should be written.
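
A rough sketch of how the encoding could be chosen from the repository 
requirements. The "utf8" requirement is the hypothetical feature name 
suggested above, and metadata_encoding() and encode_field() are 
illustrative helpers, not existing Mercurial functions.

```python
import locale

def metadata_encoding(requirements):
    """Pick the on-disk encoding from a set of requirement names."""
    # 'utf8' is the hypothetical new requirement; repositories
    # without it keep using the locale encoding.
    if 'utf8' in requirements:
        return 'utf-8'
    return locale.getpreferredencoding()

def encode_field(value, requirements):
    """Encode a Unicode field (username, description, ...) to bytes."""
    return value.encode(metadata_encoding(requirements))

converted = set(['revlogv1', 'store', 'utf8'])
data = encode_field(u'\xc9lodie', converted)
```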

Some files should be read as text:

  * "requires"
  * "cache/branchheads"
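
For example, reading "requires" as text could look like the sketch 
below. read_requirements() is a hypothetical helper, the UTF-8 encoding 
of the file is an assumption, and the temporary directory only stands 
in for a real repository.

```python
import io
import os
import tempfile

def read_requirements(repo_root):
    """Read .hg/requires as text, one requirement name per line."""
    path = os.path.join(repo_root, '.hg', 'requires')
    # Assumption: the file is ASCII-compatible, so UTF-8 is safe.
    with io.open(path, 'r', encoding='utf-8') as fp:
        return set(line.strip() for line in fp if line.strip())

# Build a throwaway directory that mimics a repository layout.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, '.hg'))
with io.open(os.path.join(root, '.hg', 'requires'), 'w',
             encoding='utf-8') as fp:
    fp.write(u'revlogv1\nstore\nfncache\n')

reqs = read_requirements(root)
```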

--

See also: http://mercurial.selenic.com/wiki/Py3kPort

--

Porting Mercurial to Python 3 is a huge project. It should be done step 
by step.

Victor

