Unicode support in log messages and file names

Thomas Arendsen Hein thomas at intevation.de
Sat Nov 11 13:38:09 CST 2006


* Andrey <grooz-work at gorodok.net> [20061111 18:33]:
> > As you have seen in http://www.selenic.com/mercurial/bts/issue156
> > I have been distracted by other topics for too many months now.
> >
> > Feel free to provide something for encoding log messages, but
> > encoding file names is a different topic and needs discussion.
> 
> Yes, encoding messages seems much easier to implement. I'm planning to do 
> it like this:
> - decode user-provided log messages from locale-encoded byte strings to 
> unicode strings;
> - use terminal encoding to display those messages (provided as unicode 
> strings);
> - encode log messages (provided as unicode string) in UTF-8 when storing;
> - decode log messages from UTF-8 to unicode string when retrieving;
> - (the most complex part) make sure switching from byte strings to unicode 
> strings does not break things.
> 
> Is this list OK?
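
For illustration only, the steps quoted above could be sketched as
follows. This is not Mercurial code: the function names and the
explicit encoding parameters are hypothetical, and replacing
characters the terminal cannot show is an assumption of this sketch.

```python
# Sketch of the "unicode strings internally" approach from the list above.
# All function names here are hypothetical, not Mercurial APIs.

def read_message(raw: bytes, locale_encoding: str) -> str:
    """Decode a user-provided, locale-encoded message to a unicode string."""
    return raw.decode(locale_encoding)

def store_message(message: str) -> bytes:
    """Encode the unicode message as UTF-8 for storage."""
    return message.encode("utf-8")

def load_message(stored: bytes) -> str:
    """Decode a stored UTF-8 message back to a unicode string."""
    return stored.decode("utf-8")

def display_message(message: str, terminal_encoding: str) -> bytes:
    """Encode for the terminal; unrepresentable characters become '?'
    (an assumption of this sketch)."""
    return message.encode(terminal_encoding, errors="replace")
```

The last item in the list (nothing else breaking) is not covered here:
every caller that used to receive byte strings would now receive
unicode strings.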

I tried exactly this a year ago, when Mercurial was much smaller,
and after talking to other people we (including Matt) decided that
the preferred way is to convert from the local encoding to UTF-8
immediately, as Vicent Seguí Pascual originally proposed.
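
A minimal sketch of what "convert immediately" could look like, as
opposed to carrying unicode strings through the code. The helper
names are hypothetical, and the reverse conversion for display (with
'?' for unrepresentable characters) is an assumption of this sketch,
not something taken from the patches mentioned here.

```python
# Sketch of immediate conversion: messages stay byte strings everywhere,
# but are transcoded to UTF-8 as soon as they enter the program.
# Helper names are hypothetical, not Mercurial APIs.

def to_utf8(raw: bytes, local_encoding: str) -> bytes:
    """Transcode locale-encoded input to UTF-8 bytes at the input boundary."""
    return raw.decode(local_encoding).encode("utf-8")

def from_utf8(stored: bytes, local_encoding: str) -> bytes:
    """Transcode stored UTF-8 bytes to the local encoding for output.
    Unrepresentable characters become '?' (an assumption of this sketch)."""
    return stored.decode("utf-8").encode(local_encoding, errors="replace")
```

With this shape the rest of the code keeps handling byte strings and
only the input and output boundaries know about encodings.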

Unfortunately, at that time I wanted to do it exactly the way you
describe, and the result is that we have no unified log encoding yet.

You can see his patches in the list archive from July 2005.

Thomas

-- 
Email: thomas at intevation.de
http://intevation.de/~thomas/

