Managing multiple encodings in one repository

David Rushby davidrushby at gmail.com
Fri Apr 6 00:03:15 CDT 2007


On 4/5/07, Matt Mackall <mpm at selenic.com> wrote:
> On Thu, Apr 05, 2007 at 08:11:06PM +0400, David Rushby wrote:
> > On 4/5/07, Matt Mackall <mpm at selenic.com> wrote:
> > >>      If I save Mercurial.ini as (for example) UTF-8, then specify
> > >> "--encoding=utf8" or environment variable HGENCODING=utf8, the
> > >> username emerges garbled.
> > >
> > >What precisely is happening? Is Mercurial properly reading your .ini
> > >as UTF-8 and then displaying it as UTF-8, which your console tries to
> > >interpret as Windows-1251? This will manifest as all the non-ASCII
> > >characters being represented as multiple characters.
> >
> > No, that's not what's happening.  Mercurial is trying to pretend that
> > the contents of Mercurial.ini are stored in the system default
> > encoding, even when I specify another encoding.
> >
> > Here's a simple way to reproduce the problem (on Windows, at least):
> > ...
>
> UTF-16 is a whole 'nother story. If this had been UTF-8 without the
> stupid BOM marker (\xff\xfe), this would have worked just fine.

You're right.  Excuse me if my tone about this issue was presumptuous.
When I saved Mercurial.ini as UTF-8, set HGENCODING=utf8, and worked
with text files whose contents were purely UTF-8, everything was
peachy.
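
For anyone hitting the same thing, here is a rough sketch of how to check
whether a file starts with a byte order mark before blaming the tool that
reads it.  This is not Mercurial code; the name detect_bom is made up for
the illustration, and it only looks at the leading bytes, so it says
nothing about files saved without a BOM.

```python
import codecs

# Longer BOMs are tested first: the UTF-32 LE BOM (\xff\xfe\x00\x00)
# begins with the same two bytes as the UTF-16 LE BOM (\xff\xfe).
# Caveat: a UTF-16 LE file whose first character is NUL is
# indistinguishable from UTF-32 LE by this test.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data):
    """Return a codec name if data starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

Reading the first four bytes of the file in binary mode and passing them
to detect_bom is enough; a result of None means no BOM was found, not
that the file is necessarily plain UTF-8.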

> > >>  2) Be able to see encoding-normalized output from commands that
> > >> might operate on files with different encodings.
>
> See the encode and decode filters:
>
> http://www.selenic.com/mercurial/wiki/index.cgi/EncodeDecodeFilter

Thanks for the advice.  I'll delve into these, because the
_fallbackencoding hack didn't help.  Unfortunately, it really is
necessary for me to work with source files in multiple encodings.
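
To make concrete what I mean by encoding-normalized output, here is a
minimal sketch of the kind of conversion I would want applied to file
contents.  It is only an assumption about what a filter command might do,
not how Mercurial's encode/decode filters are implemented; the function
name to_utf8 and the cp1251 fallback are chosen for illustration.

```python
def to_utf8(data, fallback="cp1251"):
    """Return data as UTF-8 bytes.

    Input that already decodes as strict UTF-8 is returned unchanged;
    anything else is assumed to be in the fallback encoding.
    """
    try:
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        # Heuristic: text in the fallback encoding that happens to be
        # valid UTF-8 will be passed through untouched.
        return data.decode(fallback).encode("utf-8")
```

The obvious weakness is that this guesses: it can only tell the two
encodings apart when the legacy bytes are not also valid UTF-8, so a real
setup would still need to know each file's encoding up front.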

