[PATCH 0 of 9 RFC] manage filename normalization policy per repository

Matt Mackall mpm at selenic.com
Thu Jun 7 14:16:56 CDT 2012


On Thu, 2012-06-07 at 15:28 +0900, FUJIWARA Katsunori wrote:
> > >       - so, "old tools", which always use the system code page as
> > >         the filename encoding, are not recommended for repositories
> > >         that already have UTF-8 changesets
> > 
> > We should probably try to avoid breaking them.
> > 
> > The best way to do that is probably to extend our case-folding logic to
> > cover this when in UTF-8 mode. So, we would treat a Latin1 command-line
> > argument 'á' as the same as UTF-8 'á' when the ANSI codepage is set to
> > Latin1, just like we treat 'A' the same as 'a'.
> > 
> > (There may be instances where this is ambiguous, but I think those
> > cases will be rare-to-nonexistent in practice. For instance, in
> > Latin1, all UTF-8 continuation bytes (0x80-0xbf) are invalid or
> > symbols, so you're unlikely to get a UTF-8 filename that's meaningful
> > in Latin1 or vice-versa. Similarly with Shift-JIS.)
> 
> I agree with guessing, for invocation from the command line, which
> UTF-8 byte sequence corresponds to the original one in the system
> code page.
> 
> But for invocation from GUIs like TortoiseHg, which call the Mercurial
> internal API directly, encoding filenames into UTF-8 is their
> responsibility.

> Fortunately, TortoiseHg is always released with the latest Mercurial,
> so we can ignore this case for it. But I'm not sure about the other
> tools.

Anything using the internal API is responsible for any breakage that it
encounters when the internal API changes. It has always been our policy
that it is "not our problem".
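
For the command-line case, a minimal sketch of the matching described in
the quoted text might look like the following. The names `candidates`
and `lookup`, the `tracked` set of UTF-8 filenames, and the `codepage`
parameter are all made up for illustration; this is not Mercurial's
actual code, just one way the idea could work.

```python
def candidates(arg, codepage="latin-1"):
    """Yield the byte strings a raw command-line argument might stand for.

    Hypothetical helper: `arg` is the argument as bytes, `codepage` is an
    assumed name for the system (ANSI) code page.
    """
    # First assume the argument is already stored-form (UTF-8) bytes.
    yield arg
    try:
        # Then assume it was typed in the system code page and recode it.
        recoded = arg.decode(codepage).encode("utf-8")
    except UnicodeDecodeError:
        # Not valid in the code page, so there is no second spelling.
        return
    if recoded != arg:
        yield recoded


def lookup(arg, tracked, codepage="latin-1"):
    """Return the tracked UTF-8 filename matching `arg`, or None."""
    for cand in candidates(arg, codepage):
        if cand in tracked:
            return cand
    return None


# Filenames as they would be stored in a repository in UTF-8 mode.
tracked = {"\u00e1.txt".encode("utf-8"), b"README"}

# A Latin1 'á' (0xe1) from the command line finds the UTF-8 name.
found = lookup(b"\xe1.txt", tracked)
```

The exact-bytes attempt comes first so that arguments already in UTF-8
and plain ASCII names keep working unchanged; the recoding is only a
fallback, in the same spirit as treating 'A' like 'a'.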

-- 
Mathematics is the supreme nostalgia of our time.