[PATCH 0 of 9 RFC] manage filename normalization policy per repository

FUJIWARA Katsunori foozy at lares.dti.ne.jp
Sun May 27 09:36:46 CDT 2012


At Fri, 25 May 2012 16:06:28 -0500,
Matt Mackall wrote:
> 
> On Sat, 2012-05-26 at 00:00 +0900, FUJIWARA Katsunori wrote:
> > this patch series allows users to manage filename normalization policy
> > per repository
> > 
> > this is just for the base of discussion, and tested a little: clone,
> > bundle/unbundle, archive, diff, export/import.... simply.
> 
> What happens if:
> 
> a) a Mac user adds a file NFD(X)
> b) that same user mentions that file in another file Y as NFD(X)
> c) a Linux or Windows[1] -tool- tries to locate the file listed in Y but
> Mercurial has helpfully transformed it to NFC(X) on check-out
> 
> Answer: neither Linux nor Windows will treat NFC(X) and NFD(X) as the
> same file. And we won't renormalize the _contents_ of file Y, so
> renormalizing the filename _introduces_ a mismatch. So.. it breaks. And
> breaks here means "mysteriously stops compiling", "mysteriously gives
> 404s", "mysteriously crashes our mission-critical infrastructure".
> 
> Compare that with "user gets extremely annoyed by filenames he can read
> and click on but can't type".[2]
> 
> This is another manifestation of the makefile problem: filenames
> referred to inside other files MUST agree with what TOOLS see on the
> filesystem for the tools to work.
> 
> Fundamentally, we can't force a Mac user to make Y reference NFC(X)
> rather than NFD(X). Nor can we even detect it! So we can't prevent them
> from making a non-portable commit. I'm afraid the best we can do is warn
> Mac users that they're adding NFD files.
> 
> However, in the current scheme, a non-Mac user can always rename NFD(X)
> to NFC(X) and fixup Y without introducing a commit that doesn't build.
> 
> Yes, NFD is a massively stupid annoyance to users. But your
> renormalizing technique will break more than it fixes for any project
> that contains non-ASCII inter-file references. And because we're an SCM
> (and not a CMS), that's what we care about.
> 
> [1] assuming we get a UTF-8 mode working on Windows
> [2] which is actually a generic Unicode problem, because in addition to
> NFD, Unicode has tons of homoglyphs, duplicate characters, and the vast
> majority of characters aren't even typable on any given keyboard.
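To make the mismatch in (c) concrete, here is a small illustration. It is my own sketch using only Python's standard unicodedata module, and the filename is an arbitrary example. The NFC and NFD forms of the same name display identically but encode to different UTF-8 bytes, so a filesystem that compares names as byte strings (as typical Linux filesystems do) sees two different files.

```python
import unicodedata

# An arbitrary example name containing U+00E9 (LATIN SMALL LETTER E WITH ACUTE).
name = "caf\u00e9.txt"

nfc = unicodedata.normalize("NFC", name)  # precomposed: U+00E9
nfd = unicodedata.normalize("NFD", name)  # decomposed: U+0065 U+0301

# Both strings display the same, but their UTF-8 encodings differ, so a
# filesystem that compares names byte-for-byte sees two distinct files.
print(nfc.encode("utf-8"))  # b'caf\xc3\xa9.txt'
print(nfd.encode("utf-8"))  # b'cafe\xcc\x81.txt'
print(nfc == nfd)           # False

# The two forms are canonically equivalent: normalizing NFD gives back NFC.
print(unicodedata.normalize("NFC", nfd) == nfc)  # True
```

A reference to NFD(X) written inside file Y is just such a byte string, so it stops matching once the checked-out name becomes NFC(X).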

I assumed that:

  - all users are aware of the filename normalization policy for their
    shared repository

  - some kind of build tool is used to check file contents and to
    detect invalid contents early

but, as you described, Mercurial cannot (and should not) restrict the
contents of the files it tracks.

Ok, I withdraw my patch proposal.

Then, what about the alternative extension plan below?

  - configure the filename normalization policy per repository root
    with a file similar to ".hgeol"

  - warn (or abort) when adding files whose names are normalized in an
    unexpected form

    in addition, warn (or abort) when adding files whose names collide
    with each other under any normalization, similar to case-folding
    collision handling

  - provide hooks that check the normalization form of filenames in
    incoming changes, for use as pretxn* hooks

  - anything else that needs to be done
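For example, the filename checks in this plan might look roughly like the following. This is only a minimal sketch in plain Python, assuming the added (or incoming) filenames are already available as Unicode strings. The function names are hypothetical, the format of the ".hgeol"-like policy file is not modeled, and wiring it into Mercurial (e.g. as a pretxncommit or pretxnchangegroup hook) is not shown. For collisions it uses Unicode canonical caseless matching, NFD(casefold(NFD(X))), so one key covers both normalization and case-folding clashes.

```python
import unicodedata
from collections import defaultdict

# Hypothetical policy values; the real policy would come from a per-root file.
VALID_FORMS = ("NFC", "NFD")


def unexpected_form(names, form):
    """Return the names that are not already in the configured form."""
    if form not in VALID_FORMS:
        raise ValueError("unknown normalization form: %r" % form)
    return [n for n in names if unicodedata.normalize(form, n) != n]


def collision_key(name):
    """Canonical caseless key: NFD(casefold(NFD(name)))."""
    return unicodedata.normalize(
        "NFD", unicodedata.normalize("NFD", name).casefold())


def collisions(names):
    """Group names that become identical under normalization and case folding."""
    groups = defaultdict(set)
    for n in names:
        groups[collision_key(n)].add(n)
    return [sorted(g) for g in groups.values() if len(g) > 1]


def check_filenames(names, form):
    """Return human-readable problems; an empty list means the names pass.

    A hook could either warn on a non-empty result or abort; that choice
    would itself be part of the per-repository policy.
    """
    problems = ["%s: not in %s" % (n, form) for n in unexpected_form(names, form)]
    for group in collisions(names):
        problems.append("collision: " + ", ".join(group))
    return problems


if __name__ == "__main__":
    added = [
        "caf\u00e9.txt",   # NFC spelling
        "cafe\u0301.txt",  # NFD spelling of the same name
        "README",
        "readme",          # differs only in case
    ]
    for msg in check_filenames(added, "NFC"):
        print(msg)
```

With an NFC policy, this reports the NFD name as being in the wrong form and reports both the NFC/NFD pair and the README/readme pair as collisions.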

# Or have I overlooked an already published extension like this?

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp


More information about the Mercurial-devel mailing list