[PATCH 3 of 8] Add filesystem path to dirstate.statwalk return value

Paul Moore p.f.moore at gmail.com
Fri May 2 14:43:15 CDT 2008


2008/5/1 Matt Mackall <mpm at selenic.com>:
>  When do we ever need both names? In the above case, it's perfectly ok to
>  convert 'a' to 'A' and forget 'a' was ever mentioned.

I can't actually remember, but I did originally convert always, and
hit problems. That's when I decided I needed both.

On the other hand, it may simply have been that the test suite broke,
and I couldn't work out why, so I took the easy way out by returning
the converted name as an extra arg which I only used when I knew for a
fact that I needed it.

>  A more complex case is:
>
>  touch a
>  hg add a   # actually add 'a' to dirstate
>  hg ci      # store 'a' in manifest
>  rm a
>  echo 1 > A
>  hg ci a    # Now what?

Good question, to which I don't have a good answer :-)

Current behaviour is to act as if the file were called 'a' in the
filesystem. The main problem with this is that you cannot recover the
filesystem in its current state from any revision (hg up -r0 and hg up
-r1 both produce a file named 'a'). This isn't particularly sane,
though it could be argued that it's consistent with case
insensitivity (but not with case *preserving*).

My instinct says reject it as an error. Not all operations (ab)using
case insensitivity have to be acceptable.

What happens on Unix (assuming you do hg ci A at the end)? It doesn't
seem to me that this sequence produces a repository which can ever
reproduce the current directory on Unix either, so maybe that's
acceptable.

The problem here is that personally, I mostly use Windows as if it
were case-sensitive - i.e., I only ever use a single case in
practice. The most I ever do is occasionally type all lowercase
arguments to a command where the filename is mixed case (and that's
only if I don't just use command line completion). So I don't actually
have any good intuition of what to do here.

>  We've got 'a' in the dirstate and the manifest. So we ought to check in
>  'a', no? We shouldn't record case changes unless someone explicitly
>  tells us to.

While I'm not sure that this is self-evident, I'd be happy to agree to
it as a principle to be applied.

My brain briefly exploded here, as I had been missing that there are
*three* things involved: filesystem, dirstate and manifest. In reading
the code, I'd clearly missed the purpose of dirstate._map.

> In other words, we need to be able to convert filenames on
>  disk into filenames in the dirstate. Again, we don't care about the form
>  the user passed, and we also don't care about the form on disk!
>
>  This suggests a hierarchy:
>
>  - the form in the dirstate, if available
>  - the form on disk, if available
>  - the form on the command line
>
>  And we only care about the highest available form.
>
>  Which suggests something like dirstate.normalize():
>
>     if not dirstate._folding or p in dirstate._map:
>         return p
>     elif p in dirstate._folded:
>         return dirstate._folded[p]
>     elif os.path.exists(p):
>         return util.fscase(p)
>     else:
>         return p
>
>  And statwalk/walk/status/etc. should only ever give back that normalized
>  form. And we still needn't do the expensive fscase step when we're
>  walking the filesystem (ie most of the time), so we can be a lot smarter
>  than the above.

That sounds logical, and a lot less expensive than what I currently have.
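
To check that I've understood the hierarchy, here is a rough standalone
sketch of it. This is not the real dirstate code: the function signature,
the `tracked` set (standing in for dirstate._map), the `folded` dict keyed
by lower-cased name, and the `exists`/`fscase` callables are all
assumptions of mine, made so the logic can be tried without a repository.

```python
import os


def normalize(path, tracked, folded, folding=True,
              exists=os.path.exists, fscase=None):
    """Return the highest-priority spelling of `path`.

    tracked -- set of names exactly as recorded in the dirstate (assumed)
    folded  -- dict: lower-cased name -> dirstate spelling (assumed layout)
    folding -- True when the filesystem is case-insensitive
    exists  -- callable telling whether a path is on disk
    fscase  -- callable returning the on-disk spelling (hypothetical
               stand-in for the util.fscase mentioned above)
    """
    # 1. Case-sensitive filesystem, or an exact dirstate hit: keep as-is.
    if not folding or path in tracked:
        return path
    # 2. The dirstate knows the name under a different case.
    key = path.lower()
    if key in folded:
        return folded[key]
    # 3. Not tracked, but present on disk: use the on-disk spelling.
    if fscase is not None and exists(path):
        return fscase(path)
    # 4. Nothing better is known: fall back to what the user typed.
    return path
```

With tracked = {"README"} and folded = {"readme": "README"}, asking for
"readme" gives back "README", and only untracked names ever reach the
(expensive) on-disk lookup.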

>  But! Something different needs to happen when we do:
>
>   hg cat -r 345 a
>
>  Here, what's on disk is not at all relevant. If I've got 'a' and 'A' in
>  rev345, and I ask for 'a', hg had better show me 'a' and not 'A'. So
>  repo.walk needs to normalize differently.

I'd say that repo.walk should not normalise at all, as the repository
*is* case sensitive (insofar as it maintains a distinction between 'a'
and 'A' via mangling). So hg cat -r 345 a is (by virtue of the fact
that it's a repository operation) always case sensitive.

This is actually quite an easy example, as such a repo *must* have
come from a case sensitive system, and so users of that repo have to
get to grips with case issues. After all, hg up -r 345 will fail on a
case insensitive system (it has to, as it's not possible to represent
a directory with files 'a' and 'A' present).
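
As an illustration of why such an update has to fail, this small sketch
finds the names in a revision that would collide on a case-insensitive
filesystem. The function name and the use of a plain list of names as a
stand-in for a manifest are my own assumptions, not anything in hg.

```python
from collections import defaultdict


def case_collisions(names):
    """Group names that differ only by case (hypothetical helper)."""
    groups = defaultdict(list)
    for name in names:
        # Names sharing a lower-cased form cannot coexist in one
        # directory on a case-insensitive filesystem.
        groups[name.lower()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

For a revision containing 'a', 'A' and 'b', this reports the single
colliding group of 'A' and 'a'; an empty result means the revision can be
checked out safely as far as case is concerned.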

The more subtle case is when only 'A' is present in r345. Should hg
cat -r 345 a return it? I suspect that the poster of issue 646 would
think so.
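
The two possible policies can be put side by side in a sketch. Again this
is only an illustration under my own assumptions: `lookup`, its
`fold_fallback` flag and the set-of-names manifest are made up for the
example and do not describe what hg does today.

```python
def lookup(manifest, name, fold_fallback=False):
    """Resolve `name` against the names stored in one revision.

    manifest      -- iterable of names in the revision (assumed stand-in)
    fold_fallback -- if True, accept a unique match that differs only
                     by case when there is no exact match
    Returns the stored name, or None if nothing suitable is found.
    """
    names = set(manifest)
    # An exact match always wins, so 'a' never silently becomes 'A'
    # when both are present.
    if name in names:
        return name
    if not fold_fallback:
        return None
    # Fall back only when exactly one stored name matches ignoring case;
    # an ambiguous request is rejected rather than guessed at.
    matches = [n for n in names if n.lower() == name.lower()]
    return matches[0] if len(matches) == 1 else None
```

With both 'a' and 'A' in the revision, asking for 'a' returns 'a' under
either policy. With only 'A' present, the strict policy returns nothing
and the fallback policy returns 'A', which is presumably what the poster
of issue 646 expects.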

>  > I wish I'd known you were working on this refactoring when I started
>  > looking at statwalk :-( Is the code available anywhere public?
>
>  Not yet, I keep breaking it. I'll try to push out the working bits
>  shortly.

I'm glad to know it's not just me - if even you keep breaking things,
then I feel much better :-)

Thanks for your patience.
Paul.

