fastimport: handling UTF-8
Matt Mackall
mpm at selenic.com
Fri May 8 10:11:28 CDT 2009
On Fri, May 08, 2009 at 10:10:35AM -0400, Greg Ward wrote:
> Sorry to be thick, but I'm still confused about the right way to get
> UTF-8 strings from an external source (e.g. fastimport) into
> Mercurial, regardless of locale. That is, I have written a test for
> hg-fastimport that includes various non-ASCII strings in the source
> (author, committer, filename, message), and I want to ensure that they
> are correctly converted without being mangled. When I run my test
> script with my normal locale (LANG=en_CA.utf8), it works fine. But
> when run with LANG=C, it fails in various interesting ways.
>
> So, as Matt suggested, I dug into how other converters work. I can
> see that it all boils down to convert_source.recode(): e.g. for a
> Subversion source, the commit messages come from svn as UTF-8. They
> are decoded to Unicode and re-encoded back to UTF-8, and then
> hg_sink.putcommit() ultimately does the work.
You missed the specific piece I pointed you to: how convert changes
hg's current notion of what the local encoding is to UTF-8. It does
this by changing either encoding.encoding (tip) or util._encoding
(released).
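
A minimal sketch of that idea, not the actual convert code: the
`encoding` object below is a stand-in namespace for Mercurial's
module of the same name, and `tolocal` and `forced_encoding` are
simplified, hypothetical helpers written only for this illustration.
It shows why UTF-8 input gets mangled under LANG=C and survives once
the local encoding is temporarily forced to UTF-8.

```python
import contextlib
import types

# Stand-in for Mercurial's encoding module (assumption: real code would
# set the attribute on the mercurial module, not on this namespace).
# "ascii" mimics what you get when running with LANG=C.
encoding = types.SimpleNamespace(encoding="ascii")


def tolocal(utf8_bytes):
    """Simplified analogue of converting UTF-8 input to the local encoding."""
    text = utf8_bytes.decode("utf-8")
    # Characters the local encoding cannot represent are replaced with '?'.
    return text.encode(encoding.encoding, "replace")


@contextlib.contextmanager
def forced_encoding(name):
    """Temporarily override the notion of the local encoding."""
    saved = encoding.encoding
    encoding.encoding = name
    try:
        yield
    finally:
        # Always restore the original setting, even if the import fails.
        encoding.encoding = saved


author = "Grég Wärd".encode("utf-8")

# Without the override, the non-ASCII characters are lost.
mangled = tolocal(author)

# With the override, the UTF-8 bytes pass through unchanged.
with forced_encoding("utf-8"):
    preserved = tolocal(author)
```

Restoring the saved value in a `finally` block keeps the override
from leaking into whatever runs after the import.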
--
Mathematics is the supreme nostalgia of our time.