[PATCH stable v2] convert: use original local encoding when converting from Perforce

Matt Mackall mpm at selenic.com
Thu Jul 23 10:14:33 CDT 2015


On Wed, 2015-07-22 at 23:18 +0100, Eugene Baranov wrote:
> My 'active code page' is 850, but p4 indeed generates 1252-encoded text.
> 
> I've tried to 'convince' p4 to output in UTF-8, but so far I haven't
> figured out how.

This suggests the magic is to set P4CHARSET=utf8. There also appears
to be a -C switch to force the encoding:

http://www.perforce.com/perforce/doc.current/user/i18nnotes.txt
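A minimal sketch of both options, assuming only what the notes above describe: P4CHARSET read from the environment, and -C given as a global option before the subcommand. The helper name p4_command is hypothetical, and it only builds the argv and environment without running p4:

```python
import os

def p4_command(args, charset=None, use_env=False):
    """Hypothetical helper: build argv and env for p4 with a forced charset.

    With use_env=True the charset goes into P4CHARSET in the child's
    environment; otherwise it is passed as the global -C option, which
    must come before the p4 subcommand.
    """
    argv = ["p4"]
    env = dict(os.environ)  # copy, so the caller's environment is untouched
    if charset is not None:
        if use_env:
            env["P4CHARSET"] = charset
        else:
            argv += ["-C", charset]
    return argv + list(args), env
```

The result could then be handed to subprocess.run(argv, env=env, capture_output=True). Passing -C keeps the choice visible per invocation, which matters if different calls need different encodings.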

From what I gather, the encoding switch affects:

a) metadata
b) filenames
c) contents of files marked as type 'unicode'

(b) doesn't agree with the Mercurial approach, which treats filenames
themselves as data to be byte-preserved. And if we get UTF-8 filenames
out of p4, we're going to have a problem on Windows until this plan is
finished:

https://mercurial.selenic.com/wiki/WindowsUTF8Plan
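To make the Windows problem concrete, here is a small illustration, assuming the filename bytes end up interpreted in code page 1252 (as they would be by a byte-oriented Windows API under that ANSI code page). The UTF-8 bytes for café come out as mojibake:

```python
name = "café"

# What p4 would hand us with -C utf8: two bytes for the accented e.
utf8_bytes = name.encode("utf-8")

# The same bytes read back in code page 1252 (assumed local ANSI code page).
as_seen_locally = utf8_bytes.decode("cp1252")
```

Here utf8_bytes is b'caf\xc3\xa9' and as_seen_locally is 'cafÃ©', so a byte-preserved UTF-8 filename would not match the name the user sees on disk.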

So we actually might want a "split" approach here: use -C utf8 to
extract metadata and -C <some configured encoding> to extract filenames.
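A rough sketch of that split, assuming two separate p4 runs: metadata is fetched with -C utf8 and decoded to text, while filenames are fetched with a configured charset and kept as raw bytes. The names P4Change, split_commands and build_change are hypothetical, and "winansi" below is only an example charset value:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class P4Change:
    description: str  # metadata: decoded text
    user: str
    files: List[bytes] = field(default_factory=list)  # filenames: raw bytes

def split_commands(args, filename_charset):
    """Hypothetical helper: one p4 argv per encoding role."""
    metadata_argv = ["p4", "-C", "utf8"] + list(args)
    filename_argv = ["p4", "-C", filename_charset] + list(args)
    return metadata_argv, filename_argv

def build_change(desc: bytes, user: bytes, paths: List[bytes]) -> P4Change:
    # desc and user came from the "-C utf8" run, so decode them as UTF-8.
    # paths came from the "-C <configured encoding>" run and are kept as the
    # exact bytes p4 produced, matching Mercurial's byte-preserving filenames.
    return P4Change(desc.decode("utf-8"), user.decode("utf-8"), list(paths))
```

The cost is running p4 twice per change; the benefit is that filenames never go through a decode/encode round trip.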

(c) is a bit of a problem: if we have a file named café, of type
'unicode', containing the text "café", it's not clear that there's a way
to ask for it by name in Latin1/1252 and get its contents back in UTF-8.
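A toy model of why this is awkward, assuming (as the notes suggest) that a single -C setting applies to both the filename and a 'unicode' file's contents. Python codec names stand in for p4 charset names here, and single_charset_fetch is a hypothetical stand-in for one p4 invocation:

```python
def single_charset_fetch(name: str, contents: str, charset: str):
    """Model (assumption): one -C value translates name and contents alike."""
    return name.encode(charset), contents.encode(charset)

# What we want: the name in Latin1/1252, the contents in UTF-8.
wanted = ("café".encode("cp1252"), "café".encode("utf-8"))

# Try every single-charset setting and see if any yields both at once.
results = {cs: single_charset_fetch("café", "café", cs)
           for cs in ("cp1252", "utf-8")}
matches = [cs for cs, got in results.items() if got == wanted]
```

Under this model matches is empty: cp1252 gives b'caf\xe9' for both, and utf-8 gives b'caf\xc3\xa9' for both.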

-- 
Mathematics is the supreme nostalgia of our time.


