Proposed strategy to port Mercurial to Python 3
Victor Stinner
victor.stinner at haypocalc.com
Wed Nov 2 16:25:43 CDT 2011
Hi,
I'm trying to port Mercurial to Python 3. I started a port using 2to3
and Python 3.3 to see whether it is possible.
Because Mercurial still supports Python 2.4, the b'...' (bytes string
literal) syntax cannot be used: it was only added in Python 2.6. I don't
want to maintain a Mercurial fork, so I propose to mark byte strings
using a function like six.b() to get something like:
text = b('bytes') # instead of b'bytes'
six can be used as an external library (with a local copy if needed),
the library can simply be copied into Mercurial, or a new compatibility
library can be written.
http://packages.python.org/six/
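
To make the idea concrete, here is a minimal sketch of what such a
helper could look like if Mercurial shipped its own compatibility
module instead of six. It follows the same approach as six.b(); the
placement in a module of its own is an assumption, not existing
Mercurial code.

```python
import sys

if sys.version_info[0] >= 3:
    def b(s):
        # On Python 3 a plain string literal is Unicode, so encode it.
        # latin-1 maps code points 0-255 to the identical byte values,
        # so escapes such as '\xff' end up as the expected single byte.
        return s.encode("latin-1")
else:
    def b(s):
        # On Python 2 a plain string literal is already a byte string.
        return s

text = b('bytes')  # instead of b'bytes'
```

The helper only makes sense for literals: passing it text that contains
characters above U+00FF raises UnicodeEncodeError on Python 3.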
Because I don't know Mercurial well, it's difficult for me to tell
where the bytes type or the Unicode type should be used. If I
understood correctly, the following types should be used:
* (changeset) description: Unicode
* changeset data (revlog.revision): bytes
* filename: Unicode
* revision (node identifier): bytes
* username: Unicode
"Unicode" means that the "unicode" Python 2 type should be used instead
of "str" (bytes). It would simplify the code by avoiding transcoding
(tolocal, fromlocal).
Thanks to PEP 383, Python 3 has a special error handler to avoid
decoding errors: the "surrogateescape" error handler stores undecodable
bytes as lone surrogates. Python 2 doesn't have this error handler, so
I'm not sure that we can use Unicode for filenames with Python 2. But
Unicode can be used for the description and username.
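
As an illustration of the PEP 383 behaviour, the following sketch
(Python 3 only; the filename is made up) shows that a name which is not
valid UTF-8 can still be decoded and then encoded back to exactly the
same bytes.

```python
# A filename encoded in latin-1: the byte 0xE9 is not valid UTF-8.
raw = b'caf\xe9.txt'

# surrogateescape maps each undecodable byte 0xNN to U+DCNN
# instead of raising UnicodeDecodeError.
name = raw.decode('utf-8', 'surrogateescape')

# Encoding with the same handler restores the original bytes.
roundtrip = name.encode('utf-8', 'surrogateescape')
```

Here name is 'caf\udce9.txt' and roundtrip is equal to raw, so no
information is lost even though the original encoding is unknown.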
I don't know whether mdiff and bdiff should process bytes or Unicode. I
would prefer bytes because I don't know how to guess the encoding of an
arbitrary file...
For serialization, Unicode strings (username, description, filenames,
etc.) should be encoded to UTF-8. It looks like filenames are currently
stored in the locale encoding. For backward compatibility, a new
feature should be added to .hg/requires (e.g. "utf8") and an (atomic)
process to convert the whole repository should be written.
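
A rough sketch of how the serialization code could pick the encoding
from the repository requirements. The function names and the
legacy_encoding parameter are hypothetical, and "utf8" is only the
example requirement name suggested above, not an existing Mercurial
feature.

```python
def storage_encoding(requirements, legacy_encoding):
    # Converted repositories declare the hypothetical 'utf8' requirement
    # and always store metadata as UTF-8.
    if 'utf8' in requirements:
        return 'utf-8'
    # Old repositories keep whatever encoding they were written with.
    return legacy_encoding

def serialize(text, requirements, legacy_encoding='latin-1'):
    # Encode a Unicode string (username, description, filename, ...)
    # to the bytes that are written to disk.
    return text.encode(storage_encoding(requirements, legacy_encoding))
```

An old client that doesn't know the new requirement would refuse to
open a converted repository, which is the point of recording it in
.hg/requires.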
Some files should be read as text:
* "requires"
* "cache/branchheads"
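
For example, reading "requires" as text could look like the sketch
below. The read_requires name and the ASCII encoding are assumptions,
and io.open() only exists since Python 2.6, so Python 2.4 would need
another code path.

```python
import io

def read_requires(path):
    # Open the file in text mode so that lines come back as Unicode
    # strings on both Python 2.6+ and Python 3.
    with io.open(path, 'r', encoding='ascii') as fp:
        # One requirement per line; skip blank lines.
        return set(line.strip() for line in fp if line.strip())
```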
--
See also: http://mercurial.selenic.com/wiki/Py3kPort
--
Porting Mercurial to Python 3 is a huge project. It should be done step
by step.
Victor