Unicode Windows API, Was: Concerns about using Python's ctypes library on Windows

Antoine Pitrou solipsis at pitrou.net
Sun Jul 31 08:30:00 CDT 2011


On Sun, 31 Jul 2011 11:19:15 +0200
Adrian Buehlmann <adrian at cadifra.com> wrote:
> 
> What I'm asking myself a bit is how efficient, in terms of speed, it is
> to convert back and forth from UTF-16 at such a low layer as the fixutf8
> extension naturally has to do it.

First, you don't have to encode to UTF-16 explicitly. You can use
unicode objects, since they are represented using 2-byte (UTF-16) code
units under Windows: the internal representation can be passed directly
to the Windows "wide" APIs (that's what Python itself does if, e.g., you
pass a unicode string to open()).
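
As a rough illustration of that first point, here is a minimal sketch
that hands a unicode object to a "wide" API through ctypes.
GetFileAttributesW is picked only as a simple example (it is not
something fixutf8 is known to call), and get_file_attributes is a
hypothetical helper name. ctypes passes the string as a wchar_t pointer,
so no explicit UTF-16 encoding step appears in the code. On non-Windows
platforms the helper simply returns None.

```python
import ctypes
import sys

# Value returned by GetFileAttributesW when the call fails.
INVALID_FILE_ATTRIBUTES = 0xFFFFFFFF


def get_file_attributes(path):
    """Return the Win32 attribute bits for path, or None if unavailable.

    Hypothetical helper for illustration only.
    """
    if sys.platform != "win32":
        return None  # the "wide" APIs exist only on Windows
    func = ctypes.windll.kernel32.GetFileAttributesW
    func.argtypes = [ctypes.c_wchar_p]  # LPCWSTR: accepts a unicode object
    func.restype = ctypes.c_uint32      # DWORD
    attrs = func(path)
    return None if attrs == INVALID_FILE_ATTRIBUTES else attrs


# A unicode path with a non-ASCII character, passed without encoding it.
print(get_file_attributes(u"no-such-file-\u00e9"))
```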

Second, with Python 2.7:

$ python -m timeit -s "s='abcé'*16" "s.decode('utf8').encode('utf8')"
100000 loops, best of 3: 2.41 usec per loop

So a 64-character file name or path takes about 2.4 microseconds to
round-trip from UTF-8 bytes to unicode and back. I don't think you need
to worry about that overhead.
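
The same measurement can be reproduced from a script with the timeit
module. This is only a sketch, written so that it runs on both Python 2.7
and Python 3; round_trip and name_bytes are names made up for the
example, and the figure you get will depend on your machine and Python
version.

```python
import timeit

# 64 characters; the UTF-8 form is 80 bytes because each 'é' takes two bytes.
name_bytes = (u"abc\u00e9" * 16).encode("utf-8")


def round_trip(data):
    # Decode UTF-8 bytes to a unicode string, then encode back to UTF-8.
    return data.decode("utf-8").encode("utf-8")


loops = 100000
# Best of 3 runs, mirroring what "python -m timeit" reports.
best = min(timeit.repeat(lambda: round_trip(name_bytes),
                         number=loops, repeat=3))
print("%d loops, best of 3: %.2f usec per loop"
      % (loops, best / loops * 1e6))
```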

Regards

Antoine.



