Soliciting Feedback: Tracking and Storing File Encoding

Yuya Nishihara yuya at tcha.org
Sat Sep 19 07:00:19 CDT 2015


On Fri, 18 Sep 2015 14:11:02 -0700, Erik van Zijst wrote:
> Following earlier online discussions and yesterday's minisprint, I went
> ahead and typed out a spec for how we could track character encoding
> metadata alongside repository contents.
> 
> I've put this on the wiki and I'd like to ensure I haven't missed or
> overlooked anything before I set out to implement this:
> 
> https://mercurial.selenic.com/wiki/Tracking%20File%20Encoding

This is somewhat important for us because we have to deal with several
different encodings in Japan. So I wrote an extension [1] that lets me view
diffs of Windows files in the encoding of my Unix terminal.

 [1]: https://mercurial.selenic.com/wiki/TextfulExtension

A couple of comments:

- $HGENCODING has no effect on the encoding of file contents

- It would be nice if Mercurial could optionally apply the encoding conversion
  before showing annotations and diffs, applying patches, etc.

  I know encoding conversion is lossy, but sometimes it is unavoidable because
  Microsoft tools tend to generate UTF-16LE files, which mess things up. They
  also add a BOM to UTF-8 files.
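
To make the second point concrete, here is a rough sketch of the kind of
conversion I mean, written for Python 3. The function name and its parameters
are hypothetical; this is not Mercurial's API and not what the extension in [1]
does. It assumes a UTF-8 or UTF-16LE BOM is authoritative when present, and
otherwise trusts a declared per-file encoding (for example, one taken from the
proposed metadata).

```python
import codecs


def to_terminal_encoding(data, declared="utf-8", terminal="utf-8"):
    """Re-encode raw file bytes for display in the terminal encoding.

    Hypothetical helper for illustration only. 'declared' stands in for
    whatever encoding is tracked for the file; 'terminal' is the encoding
    the user's terminal expects.
    """
    if data.startswith(codecs.BOM_UTF8):
        # UTF-8 with a BOM: drop the BOM, then decode.
        text = data[len(codecs.BOM_UTF8):].decode("utf-8")
    elif data.startswith(codecs.BOM_UTF16_LE):
        # UTF-16LE with a BOM: drop the BOM, then decode.
        # (UTF-32LE starts with the same two bytes; this sketch ignores it.)
        text = data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    else:
        # No BOM: fall back to the declared encoding.
        text = data.decode(declared)
    # The conversion can be lossy; "replace" substitutes "?" for characters
    # the terminal encoding cannot represent instead of raising an error.
    return text.encode(terminal, errors="replace")
```

Something like to_terminal_encoding(raw, declared="cp932", terminal="utf-8")
would then run on each file's content before the diff is computed or shown, so
the stored bytes are never changed.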

Regards,

