fastimport: handling UTF-8

Matt Mackall mpm at selenic.com
Fri May 8 10:11:28 CDT 2009


On Fri, May 08, 2009 at 10:10:35AM -0400, Greg Ward wrote:
> Sorry to be thick, but I'm still confused about the right way to get
> UTF-8 strings from an external source (e.g. fastimport) into
> Mercurial, regardless of locale.  That is, I have written a test for
> hg-fastimport that includes various non-ASCII strings in the source
> (author, committer, filename, message), and I want to ensure that they
> are correctly converted without being mangled.  When I run my test
> script with my normal locale (LANG=en_CA.utf8), it works fine.  But
> when run with LANG=C, it fails in various interesting ways.
> 
> So, as Matt suggested, I dug into how other converters work.  I can
> see that it all boils down to convert_source.recode(): e.g. for a
> Subversion source, the commit messages come from svn as UTF-8.  They
> are decoded to Unicode and re-encoded back to UTF-8, and then
> hg_sink.putcommit() ultimately does the work.

You missed the specific piece I pointed you to: how convert changes
hg's current notion of the local encoding to UTF-8, by setting
either encoding.encoding (tip) or util._encoding (released).
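
Roughly, the idea is to override that setting for the duration of the
import and put it back afterwards. The following is only a sketch of
that pattern, not convert's actual code: forced_encoding and
fake_encoding are hypothetical names, and fake_encoding is a stand-in
object so the example runs without Mercurial installed. In real use
you would presumably pass mercurial.encoding with "encoding" (tip) or
mercurial.util with "_encoding" (released).

```python
import contextlib
import types


@contextlib.contextmanager
def forced_encoding(module, attr, value="UTF-8"):
    """Temporarily set module.<attr> to value; restore the old value on exit."""
    saved = getattr(module, attr)
    setattr(module, attr, value)
    try:
        yield
    finally:
        # Restore even if the import raises, so later commands see the
        # user's real locale encoding again.
        setattr(module, attr, saved)


# Hypothetical stand-in for mercurial.encoding; "ascii" mimics the kind
# of value a LANG=C environment would produce.
fake_encoding = types.SimpleNamespace(encoding="ascii")

with forced_encoding(fake_encoding, "encoding"):
    # Inside the block, UTF-8 byte strings from the external source
    # would be treated as already being in the "local" encoding.
    inside = fake_encoding.encoding

after = fake_encoding.encoding
```

With the override in place the result no longer depends on LANG,
which is the property the test is after.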

-- 
Mathematics is the supreme nostalgia of our time.

