Managing multiple encodings in one repository

Matt Mackall mpm at selenic.com
Thu Apr 5 11:55:54 CDT 2007


On Thu, Apr 05, 2007 at 08:30:53PM +0400, David Rushby wrote:
> A few minutes ago, I wrote:
> >Mercurial is going right ahead and trying to read Mercurial.ini as if
> >it were encoded in the system default encoding.  If I replace the line
> >fp = open(f)
> >in ui.py:ui:readconfig with
> >import codecs; fp = codecs.open(f, encoding='utf16')
> >then Mercurial is able to read a UTF-16-encoded Mercurial.ini.
> >Obviously, a real fix would need to use the active Mercurial encoding
> >instead of hard-coded 'utf16'.
> 
> That description of the problem is correct, but the suggested solution
> would not be adequately flexible.  The "general" Mercurial encoding
> and the encoding by which the config file is read really need to be
> separately specifiable.  It doesn't make sense to have to convert
> one's config file to a different encoding every time one wants to use
> a different --encoding on the command line.

Well, the theory was that you only use --encoding or HGENCODING when
your system's setup is too busted to work properly. And then you use
it consistently, so it matches your config file.

But if you really need to switch between encodings regularly and can't
just use all UTF-*8*, I've got a hack for you to try:

In mercurial/util.py, find the fallback encoding:

_fallbackencoding = 'ISO-8859-1'

Change that to 'utf8' and make your config file use UTF-8. Then
whenever Mercurial fails to decode something because it doesn't match
--encoding, it'll try UTF-8 instead.
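
To make the fallback idea concrete, here is a rough sketch of
"try the requested encoding, then the fallback". This is not
Mercurial's actual code: decode_with_fallback and its parameters are
names made up for illustration, and the final replace-on-error step is
an assumption about what a last resort might look like.

```python
def decode_with_fallback(data, encoding, fallback="utf-8"):
    """Decode bytes with `encoding`; if that fails, try `fallback`.

    Hypothetical helper for illustration only, not part of Mercurial.
    """
    for candidate in (encoding, fallback):
        try:
            return data.decode(candidate)
        except UnicodeDecodeError:
            # This candidate cannot represent the bytes; try the next one.
            continue
    # Assumed last resort: keep going, marking undecodable bytes with U+FFFD.
    return data.decode(fallback, errors="replace")


if __name__ == "__main__":
    utf8_bytes = "caf\u00e9".encode("utf-8")
    # ASCII cannot decode the accented character, so UTF-8 is tried next.
    print(decode_with_fallback(utf8_bytes, "ascii"))
```

One thing worth noting about the choice of fallback: ISO-8859-1 maps
every possible byte to a character, so decoding with it never fails,
it just gives you garbage for UTF-8 input. UTF-8 is stricter and will
reject byte sequences that aren't valid UTF-8.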

-- 
Mathematics is the supreme nostalgia of our time.

