Managing multiple encodings in one repository

David Rushby davidrushby at gmail.com
Mon Apr 9 00:38:26 CDT 2007


On 4/6/07, David Rushby <davidrushby at gmail.com> wrote:
> > > >>  2) Be able to see encoding-normalized output from commands that
> > > >> might operate on files with different encodings.
> >
> > See the encode and decode filters:
> >
> > http://www.selenic.com/mercurial/wiki/index.cgi/EncodeDecodeFilter
>
> Thanks for the advice.  I'll delve into these, because the
> _fallbackencoding hack didn't help.  Unfortunately, it really is
> necessary for me to work with source files in multiple encodings.

Mercurial now works perfectly for me.  The goal of normalizing all
textual input to UTF-8, so that Mercurial can show reasonable output
for all textual encodings, has been achieved (in my specific scenario,
at least).

Here's how things are currently arranged:

- The HGENCODING environment variable is set to "utf8".

- Mercurial.ini is saved in UTF-8.

- I wrote an encoding filter that converts Python files from their
external encoding (specified in the "#-*- coding: encodingname -*-"
header) to UTF-8.  Although I also work with source files written in
many languages other than Python, the same sort of scheme can be
applied to them.

- I wrote a decoding filter that does the inverse conversion.

- I wrote a custom hgmerge replacement that converts from the external
encoding to UTF-8 before passing the files to kdiff3, then back to the
external encoding after kdiff3 is finished.  kdiff3 can therefore be
set to use UTF-8 all the time, yet it can operate on text that is
stored externally in any encoding.

- "hg serve", hgk, and the PyGTK-based implementation of hgk all work
perfectly, because everything they consume is in UTF-8.

- cmd.exe does not work well at all with UTF-8, even with "chcp
65001".  Since I only need to operate on English and Russian text, I
wrote a wrapper for the 'hg' command that overrides sys.stdout with a
Windows-1251 output stream created via
codecs.lookup('cp1251')[3] (the codec's stream writer).  I can
therefore use cmd.exe in "chcp 1251" mode, which actually works.
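
For anyone who wants to try the same arrangement, the encoding and
decoding filters described in the list above might look roughly like
the sketch below.  This is not the actual filter code; it is a minimal
illustration in present-day Python 3.  The function names
(declared_encoding, to_utf8, from_utf8) are hypothetical, and two
things are assumed: the coding header is left untouched in the stored
UTF-8 copy so that the decoding filter can find the external encoding
again, and files without a header are treated as ASCII.

```python
import codecs
import re

# Coding declaration as described in PEP 263, matched against raw bytes.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def declared_encoding(data, default="ascii"):
    """Return the encoding named in the first two lines of data.

    Falls back to `default` (an assumption of this sketch) when no
    coding header is present.
    """
    for line in data.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            name = match.group(1).decode("ascii")
            # codecs.lookup raises LookupError for unknown encodings,
            # which is preferable to silently corrupting a file.
            return codecs.lookup(name).name
    return default


def to_utf8(data):
    """Encoding filter: external encoding -> UTF-8.

    The header text itself is not rewritten (assumption), so it still
    names the external encoding after conversion.
    """
    return data.decode(declared_encoding(data)).encode("utf-8")


def from_utf8(data):
    """Decoding filter: UTF-8 -> external encoding named in the header."""
    return data.decode("utf-8").encode(declared_encoding(data))
```

A filter script built on this would presumably read the file contents
as bytes from standard input, call to_utf8 or from_utf8, and write the
result as bytes to standard output.  Because the header line is pure
ASCII, it reads the same before and after conversion, which is what
lets the two functions be exact inverses of each other.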

Thanks very much for your advice, Matt.

