[PATCH stable v2] convert: use original local encoding when converting from Perforce
Matt Mackall
mpm at selenic.com
Thu Jul 23 10:14:33 CDT 2015
On Wed, 2015-07-22 at 23:18 +0100, Eugene Baranov wrote:
> My 'active code page' is 850, but p4 indeed generates 1252-encoded text.
>
> I've tried to 'convince' p4 to output in UTF-8, but so far I haven't
> figured out how.
This suggests the magic is to set P4CHARSET=utf8. And there also appears
to be a -C switch to force the encoding:
http://www.perforce.com/perforce/doc.current/user/i18nnotes.txt
From what I gather, the encoding switch affects:
a) metadata
b) filenames
c) contents of files marked as type 'unicode'
(b) doesn't agree with the Mercurial approach, which treats filenames
themselves as data to be byte-preserved. And if we get UTF8 filenames
out of p4, we're going to have a problem on Windows until this thing is
finished:
https://mercurial.selenic.com/wiki/WindowsUTF8Plan
So we actually might want a "split" approach here: use -C utf8 to
extract metadata and -C <some configured encoding> to extract filenames.
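To make the split concrete, here is a rough sketch of how the two kinds of
p4 invocations might be built. It only assembles argument lists and does not
run anything. The helper name p4_command, the change number, the depot path,
and the "winansi" filename charset are all assumptions for illustration, not
what the convert extension does today.

```python
# Sketch only: build p4 command lines with a per-purpose -C charset.
# The charset values and p4 arguments below are illustrative assumptions.

METADATA_CHARSET = "utf8"      # ask p4 for metadata in UTF-8
FILENAME_CHARSET = "winansi"   # hypothetical configured filename charset


def p4_command(charset, *args):
    """Return an argv list for p4 that forces the given charset via -C."""
    return ["p4", "-C", charset] + list(args)


# Metadata (user, description) would be fetched with the UTF-8 charset...
metadata_cmd = p4_command(METADATA_CHARSET, "describe", "-s", "1234")

# ...while filenames would be fetched with the configured local charset,
# so their bytes stay in the encoding the user's filesystem expects.
filenames_cmd = p4_command(FILENAME_CHARSET, "files", "//depot/...")

print(metadata_cmd)
print(filenames_cmd)
```

The cost of this approach is two round trips to p4 per changeset instead of
one, since the same output can't be requested in two charsets at once.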
(c) is a bit of a problem: if we have a file named café containing
"café" marked as Unicode, it's not clear that there's a way to ask for
it by name in Latin1/1252 and get its contents back in UTF8.
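The underlying mismatch is easy to see at the byte level. The snippet below
is just a plain Python illustration of the two encodings of "café"; it says
nothing about how p4 itself behaves.

```python
# Illustration only: the same name has different bytes in cp1252 and UTF-8.
name = "caf\u00e9"  # "café", written with an escape to avoid source-encoding issues

as_cp1252 = name.encode("cp1252")   # b'caf\xe9'
as_utf8 = name.encode("utf-8")      # b'caf\xc3\xa9'

# The byte strings differ, so a byte-preserved filename in one encoding
# will not match the same name delivered in the other.
assert as_cp1252 != as_utf8

# UTF-8 bytes read as cp1252 produce mojibake rather than an error.
mojibake = as_utf8.decode("cp1252")

# cp1252 bytes are not valid UTF-8, so decoding them as UTF-8 fails.
try:
    as_cp1252.decode("utf-8")
    cp1252_is_valid_utf8 = True
except UnicodeDecodeError:
    cp1252_is_valid_utf8 = False

print(as_cp1252, as_utf8, mojibake, cp1252_is_valid_utf8)
```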
--
Mathematics is the supreme nostalgia of our time.