[PATCH 3 of 8] Add filesystem path to dirstate.statwalk return value
Paul Moore
p.f.moore at gmail.com
Fri May 2 14:43:15 CDT 2008
2008/5/1 Matt Mackall <mpm at selenic.com>:
> When do we ever need both names? In the above case, it's perfectly ok to
> convert 'a' to 'A' and forget 'a' was ever mentioned.
I can't actually remember, but I originally converted in all cases,
and hit problems. That's when I decided I needed both.
On the other hand, it may simply have been that the test suite broke,
and I couldn't work out why, so I took the easy way out by returning
the converted name as an extra arg which I only used when I knew for a
fact that I needed it.
> A more complex case is:
>
> touch a
> hg add a # actually add 'a' to dirstate
> hg ci # store 'a' in manifest
> rm a
> echo 1 > A
> hg ci a # Now what?
Good question, to which I don't have a good answer :-)
Current behaviour is to act as if the file was called a in the
filesystem. The main problem with this is that you cannot get the
filesystem in its current state from any revision (hg up -r0 and
hg up -r1 both produce a file named a). This isn't particularly sane,
though it could be argued that it's consistent with case insensitivity
(if not with case *preserving*).
My instinct says reject it as an error. Not all operations (ab)using
case insensitivity have to be acceptable.
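A minimal sketch of that "reject it as an error" policy, assuming the names tracked in the dirstate and the names actually present in the working directory are available as plain lists of strings. `find_case_mismatches` and `check_commit` are hypothetical helpers, not existing Mercurial functions, and case folding is approximated with `str.lower()`:

```python
def find_case_mismatches(tracked, on_disk):
    """Return (tracked_name, disk_name) pairs that differ only by case."""
    present = set(on_disk)
    # Approximation: real filesystems (NTFS, HFS+) use their own folding tables.
    disk_by_fold = {name.lower(): name for name in on_disk}
    mismatches = []
    for name in tracked:
        if name in present:
            continue  # exact match: the file is where the dirstate expects it
        real = disk_by_fold.get(name.lower())
        if real is not None:
            mismatches.append((name, real))  # same file, different case on disk
    return mismatches


def check_commit(tracked, on_disk):
    """Refuse to proceed if any tracked file has changed case on disk."""
    mismatches = find_case_mismatches(tracked, on_disk)
    if mismatches:
        details = ", ".join(f"{t!r} is {d!r} on disk" for t, d in mismatches)
        raise ValueError("refusing case change: " + details)
```

In the scenario above, the dirstate tracks `a` while only `A` exists on disk, so `find_case_mismatches(["a"], ["A"])` reports `("a", "A")` and `check_commit` aborts. A tracked file with no counterpart on disk at all is not reported; that is an ordinary removal.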
What happens on Unix (assuming you do hg ci A at the end)? It doesn't
seem to me that this sequence produces a repository which can ever
reproduce the current directory on Unix either, so maybe that's
acceptable.
The problem here is that I personally use Windows mostly as if it
were case-sensitive, i.e., in practice I only ever use a single case.
The most I ever do is occasionally type all lowercase
arguments to a command where the filename is mixed case (and that's
only if I don't just use command line completion). So I don't actually
have any good intuition of what to do here.
> We've got 'a' in the dirstate and the manifest. So we ought to check in
> 'a', no? We shouldn't record case changes unless someone explicitly
> tells us to.
While I'm not sure that this is self-evident, I'd be happy to agree to
it as a principle to be applied.
My brain briefly exploded here, as I hadn't realised that there are
*three* things involved: the filesystem, the dirstate and the
manifest. In reading the code, I'd clearly missed the purpose of
dirstate._map.
> In other words, we need to be able to convert filenames on
> disk into filenames in the dirstate. Again, we don't care about the form
> the user passed, and we also don't care about the form on disk!
>
> This suggests a hierarchy:
>
> - the form in the dirstate, if available
> - the form on disk, if available
> - the form on the command line
>
> And we only care about the highest available form.
>
> Which suggests something like dirstate.normalize():
>
> if not dirstate._folding or p in dirstate._map:
>     return p
> elif p in dirstate._folded:
>     return dirstate._folded[p]
> elif os.path.exists(p):
>     return util.fscase(p)
> else:
>     return p
>
> And statwalk/walk/status/etc. should only ever give back that normalized
> form. And we still needn't do the expensive fscase step when we're
> walking the filesystem (ie most of the time), so we can be a lot smarter
> than the above.
That sounds logical, and a lot less expensive than what I currently have.
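A minimal runnable sketch of the normalize() hierarchy quoted above: dirstate form first, then on-disk form, then the command-line form. `Dirstate` is a hypothetical toy class, not Mercurial's real dirstate; it keeps the `_folding`, `_map` and `_folded` names from the outline, assumes repo-relative paths use '/', approximates case folding with `str.lower()`, and folds `p` before consulting `_folded`. `fscase` here is a hypothetical stand-in for `util.fscase` that recovers on-disk spelling by listing each parent directory. When walking the filesystem, the names already come from the listing, so the expensive step 2 would not be needed there.

```python
import os


def fscase(root, path):
    """Return the '/'-separated, root-relative `path` spelled as on disk.

    Each component is matched against its parent directory's listing: an
    exact match wins, otherwise the first case-insensitive match is used;
    components that cannot be found are kept as given.
    """
    parts = []
    current = root
    for part in path.split("/"):
        try:
            entries = os.listdir(current)
        except OSError:
            entries = []  # missing or unreadable directory
        if part not in entries:
            folded = part.lower()
            part = next((e for e in entries if e.lower() == folded), part)
        parts.append(part)
        current = os.path.join(current, part)
    return "/".join(parts)


class Dirstate:
    """Toy stand-in for the dirstate, holding only what normalize() needs."""

    def __init__(self, root, tracked, folding=True):
        self._root = root
        self._folding = folding  # True on case-insensitive filesystems
        self._map = set(tracked)  # exact tracked names
        self._folded = {n.lower(): n for n in tracked}  # folded -> tracked name

    def normalize(self, p):
        # 1. The form in the dirstate, if available.
        if not self._folding or p in self._map:
            return p
        folded = p.lower()
        if folded in self._folded:
            return self._folded[folded]
        # 2. The form on disk, if available (the expensive lookup).
        if os.path.lexists(os.path.join(self._root, p)):
            return fscase(self._root, p)
        # 3. Otherwise, the form given on the command line.
        return p
```

With `a` tracked, `normalize("A")` returns `a`, so the dirstate spelling wins regardless of what the user typed or what is on disk.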
> But! Something different needs to happen when we do:
>
> hg cat -r 345 a
>
> Here, what's on disk is not at all relevant. If I've got 'a' and 'A' in
> rev345, and I ask for 'a', hg had better show me 'a' and not 'A'. So
> repo.walk needs to normalize differently.
I'd say that repo.walk should not normalise at all, as the repository
*is* case sensitive (insofar as it maintains a distinction between 'a'
and 'A' via mangling). So hg cat -r 345 a is (by virtue of the fact
that it's a repository operation) always case sensitive.
This is actually quite an easy example, as such a repo *must* have
come from a case sensitive system, and so users of that repo have to
get to grips with case issues. After all, hg up -r 345 will fail on a
case insensitive system (it has to, as it's not possible to represent
a directory with files 'a' and 'A' present).
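Whether a revision can be checked out on a case-insensitive system comes down to whether any two of its file names fold to the same name. A minimal sketch of that check, assuming the manifest is available as a list of path strings; `case_collisions` is a hypothetical helper and `str.lower()` only approximates real filesystem folding:

```python
from collections import defaultdict


def case_collisions(names):
    """Map each case-folded name to the spellings that collide on it."""
    groups = defaultdict(set)
    for name in names:
        groups[name.lower()].add(name)  # folding the whole path covers directories too
    # Keep only folded names reached by more than one distinct spelling.
    return {fold: sorted(spellings)
            for fold, spellings in groups.items() if len(spellings) > 1}
```

For a manifest like the r345 example, `case_collisions(["a", "A", "b"])` reports `{"a": ["A", "a"]}`, which is exactly the situation in which hg up cannot succeed on a case-insensitive filesystem.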
The more subtle case is: what if only 'A' is present in r345? Should
hg cat -r 345 a return it? I suspect that the poster of issue 646
would think so.
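One possible policy for repository lookups like hg cat, sketched under the assumption that a revision's manifest is available as a set of names: an exact match always wins, since the repository is case-sensitive, and a case-insensitive fallback (the behaviour issue 646 seems to want) is accepted only when it is unambiguous. `lookup` and its `fold_fallback` flag are hypothetical, not an existing Mercurial API, and this is only one option for the open question above.

```python
def lookup(manifest, name, fold_fallback=False):
    """Resolve `name` against the files in a revision's manifest.

    Exact matches always win. If fold_fallback is set and there is no exact
    match, accept a single case-insensitive match; with zero or several
    candidates, return None rather than guess.
    """
    if name in manifest:
        return name
    if not fold_fallback:
        return None
    folded = name.lower()
    candidates = [f for f in manifest if f.lower() == folded]
    return candidates[0] if len(candidates) == 1 else None
```

With both `a` and `A` in the revision, asking for `a` always yields `a`; with only `A` present, the plain lookup fails and the fallback yields `A`.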
> > I wish I'd known you were working on this refactoring when I started
> > looking at statwalk :-( Is the code available anywhere public?
>
> Not yet, I keep breaking it. I'll try to push out the working bits
> shortly.
I'm glad to know it's not just me - if even you keep breaking things,
then I feel much better :-)
Thanks for your patience.
Paul.