[PATCH 0 of 9 RFC] manage filename normalization policy per repository
Matt Mackall
mpm at selenic.com
Mon Jun 4 09:19:09 CDT 2012
On Mon, 2012-06-04 at 19:07 +0900, FUJIWARA Katsunori wrote:
> At Sun, 03 Jun 2012 17:52:30 -0500,
> Matt Mackall wrote:
> >
> > On Sat, 2012-06-02 at 23:36 +0900, FUJIWARA Katsunori wrote:
>
> > > BTW, in transition period, repositories using different encodings for
> > > filenames may exist in same host: cp932 and utf-8, for example.
> >
> > Huh? Please go read that page again, because I don't think you
> > understood it:
> >
> > http://mercurial.selenic.com/wiki/WindowsUTF8Plan#Definitions
> > http://mercurial.selenic.com/wiki/WindowsUTF8Plan#Upgrading_to_UTF-8
> >
> > I fully expect SINGLE repos to have different encodings in different
> > changesets. This is in fact what will allow us to upgrade them. There
> > will be no notion of "repository encoding".
>
> Sorry, I used term "repository encoding" as:
>
> if there are only legacy changesets in the repository, and users
> assume that filenames are encoded only one encoding in their mind,
> such encoding can be recognized as "repository encoding"
>
> OK, I'll use just "UTF-8 changeset" and "legacy changeset".
>
>
> I assume that two UTF-8 changesets below:
>
> A. a changeset where every filename in it uses only ASCII chars
>
> B. a changeset where some filename in it uses non ASCII, but UTF-8
> valid characters
>
> To children of (B), I don't want to add any file of which name uses
> chars in encoding other than UTF-8, but may want to do so to children
> of (A): it is normal usecase of adding new files using non-ASCII chars
> in their names with current Mercurial.
>
> If the parent of the working directory is (B), tools can assume that the
> filename encoding in the user's mind is UTF-8: tools like TortoiseHg, which
> are aware of the dirstate structure and invoke the Mercurial API directly in
> their own process, can detect it and pass filename strings in UTF-8 encoding.
I actually intend for A and B to operate the same: all new files are
UTF-8/ASCII. Thus, you transparently upgrade to the new mode when adding
non-ASCII files without having to think about it.
> On the other hand, if the parent is (A), tools can't know what
> encoding the user wants to use for filenames: the user may have to use an
> encoding other than UTF-8 because of a repository management rule in the
> project, for example.
> Here, I want to confirm that:
>
> in the latter case (= children of (A)), HGENCODING env should be
> referred to decide filename encoding.
No. HGENCODING has never had, and will never have, any relation to filenames.
Filenames are either recognizable as UTF-8/ASCII (for the purposes of
making Windows happy) or bytes in an unspecified legacy encoding that we
don't know or care about, just like file contents (everywhere else).
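To make the distinction concrete, here is a minimal sketch of that two-way split, not actual Mercurial code. The function name classify_filename and its return labels are made up for illustration. It only checks whether the raw bytes decode as ASCII or UTF-8; anything else is treated as opaque legacy bytes, and HGENCODING is never consulted.

```python
def classify_filename(name: bytes) -> str:
    """Classify a raw filename as 'ascii', 'utf-8', or 'legacy'.

    Hypothetical helper for illustration only; not part of Mercurial.
    """
    try:
        name.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        # Strict decoding: any invalid byte sequence raises an error.
        name.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Bytes in some unspecified legacy encoding; we do not try to guess which.
        return "legacy"


if __name__ == "__main__":
    for raw in (b"README", "日本語.txt".encode("utf-8"), "日本語.txt".encode("cp932")):
        print(repr(raw), classify_filename(raw))
```

Note that a check like this is only a heuristic: a short byte string in a legacy encoding can occasionally happen to be valid UTF-8.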
> Next. According to "WindowsUTF8Plan" wiki page:
>
> "Merge between UTF-8 and non-UTF-8 commits" could create
> problems. We probably don't want to make merge aware of this
> issue.
>
> This is true for "UTF-8 changeset (B)" above and legacy one, but not
> for (A) and legacy one, isn't it?
Yes and no. The result of either may be a legacy changeset or a UTF-8
changeset, depending on rename history.
My advice: don't think about it yet.
--
Mathematics is the supreme nostalgia of our time.