[PATCH 3 of 8] Add filesystem path to dirstate.statwalk return value

Matt Mackall mpm at selenic.com
Fri May 2 17:54:54 CDT 2008


On Fri, 2008-05-02 at 20:43 +0100, Paul Moore wrote:
> 2008/5/1 Matt Mackall <mpm at selenic.com>:
> >  When do we ever need both names? In the above case, it's perfectly ok to
> >  convert 'a' to 'A' and forget 'a' was ever mentioned.
> 
> I can't actually remember, but I did originally convert always, and
> hit problems. That's when I decided I needed both.

What probably happened is you hit a case where you passed in 'a' when
you had 'A' on disk and 'a' in the dirstate. The expected filename was
supposed to be 'a', which just happened to be what was on the command
line.

In other words, (cmd:a, disk:A, dirstate:a) -> a, but also (cmd:A,
disk:A, dirstate:a) -> a, because dirstate should have precedence.
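
A minimal Python sketch of that precedence rule, for illustration only.
The helper name expected_name and its arguments are hypothetical and not
part of Mercurial; a missing form is represented here as None.

```python
def expected_name(cmd, disk=None, dirstate=None):
    """Return the highest-precedence form of a filename.

    Precedence: dirstate form, then on-disk form, then the form
    given on the command line. None means "not available".
    """
    for candidate in (dirstate, disk):
        if candidate is not None:
            return candidate
    return cmd


# The two cases from the text: both resolve to the dirstate's 'a'.
assert expected_name("a", disk="A", dirstate="a") == "a"
assert expected_name("A", disk="A", dirstate="a") == "a"
```

With no dirstate entry the on-disk form wins, and with neither the
command-line form is returned unchanged.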

> On the other hand, it may simply have been that the test suite broke,
> and I couldn't work out why, so I took the easy way out by returning
> the converted name as an extra arg which I only used when I knew for a
> fact that I needed it.
> 
> >  A more complex case is:
> >
> >  touch a
> >  hg add a   # actually add 'a' to dirstate
> >  hg ci      # store 'a' in manifest
> >  rm a
> >  echo 1 > A
> >  hg ci a    # Now what?
> 
> Good question, to which I don't have a good answer :-)
> 
> Current behaviour is to act as if the file was called a in the
> filesystem, and the main problem with this is that you cannot get the
> filesystem in its current state from any revision (hg up -r0 and hg up
> -r1 both produce a file named a). This isn't particularly sane,
> although it could be argued that it's consistent with case
> insensitivity (although not with case *preserving*).

Ahh, but it -is- case preserving: it's preserving the case that was
originally there at hg add time. Again, Mercurial should only store case
changes you explicitly tell it about as various tools (or users) will
change case behind your back and checking in such changes will make your
UNIX friends very annoyed with you. 

> What happens on Unix (assuming you do hg ci A at the end)?

It will tell you it's never heard of 'A'. Because 'a' and 'A' are
different files.

> My brain briefly exploded here, as I had been missing that there are *three*
> things involved here, filesystem, dirstate and manifest. In reading
> the code, I'd clearly missed the purpose of dirstate._map.
> 
> > In other words, we need to be able to convert filenames on
> >  disk into filenames in the dirstate. Again, we don't care about the form
> >  the user passed, and we also don't care about the form on disk!
> >
> >  This suggests a hierarchy:
> >
> >  - the form in the dirstate, if available
> >  - the form on disk, if available
> >  - the form on the command line
> >
> >  And we only care about the highest available form.
> >
> >  Which suggests something like dirstate.normalize():
> >
> >     if not dirstate._folding or p in dirstate._map:
> >         return p
> >     elif p in dirstate._folded:
> >         return dirstate._folded[p]
> >     elif os.path.exists(p):
> >         return util.fscase(p)
> >     else:
> >         return p
> >
> >  And statwalk/walk/status/etc. should only ever give back that normalized
> >  form. And we still needn't do the expensive fscase step when we're
> >  walking the filesystem (ie most of the time), so we can be a lot smarter
> >  than the above.
> 
> That sounds logical, and a lot less expensive than what I currently have.
> 
> >  But! Something different needs to happen when we do:
> >
> >   hg cat -r 345 a
> >
> >  Here, what's on disk is not at all relevant. If I've got 'a' and 'A' in
> >  rev345, and I ask for 'a', hg had better show me 'a' and not 'A'. So
> >  repo.walk needs to normalize differently.
> 
> I'd say that repo.walk should not normalise at all, as the repository
> *is* case sensitive (insofar as it maintains a distinction between 'a'
> and 'A' via mangling). So hg cat -r 345 a is (by virtue of the fact
> that it's a repository operation) always case sensitive.

Fair enough. So we can hopefully localize all the case smarts down in
the dirstate.
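
As a rough, self-contained sketch of what keeping the case smarts in the
dirstate might look like, here is the normalize() idea quoted above made
runnable. ToyDirstate is a hypothetical stand-in, not Mercurial's real
dirstate. Keying _folded by lowercased name is an assumption, and the
exists/fscase callables are injected placeholders for os.path.exists and
util.fscase.

```python
import os


class ToyDirstate:
    """Hypothetical stand-in for dirstate; not Mercurial's real class."""

    def __init__(self, tracked, folding=True, exists=os.path.lexists,
                 fscase=None):
        self._folding = folding          # True on case-folding filesystems
        self._map = set(tracked)         # names exactly as recorded
        # Assumption: folded (lowercased) name -> name as recorded.
        self._folded = {name.lower(): name for name in tracked}
        self._exists = exists
        # Placeholder for util.fscase: returns the on-disk spelling.
        self._fscase = fscase or (lambda p: p)

    def normalize(self, p):
        # 1. No folding, or an exact dirstate match: keep the name.
        if not self._folding or p in self._map:
            return p
        # 2. The dirstate knows the name under a different case.
        folded = p.lower()
        if folded in self._folded:
            return self._folded[folded]
        # 3. Otherwise use the on-disk form if the file exists.
        if self._exists(p):
            return self._fscase(p)
        # 4. Fall back to the form the caller passed in.
        return p


ds = ToyDirstate(["a"], exists=lambda p: False)
assert ds.normalize("A") == "a"   # dirstate form takes precedence
```

A repository-level lookup such as hg cat -r would bypass normalize()
entirely and use the name exactly as given.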

> This is actually quite an easy example, as such a repo *must* have
> come from a case sensitive system, and so users of that repo have to
> get to grips with case issues. After all, hg up -r 345 will fail on a
> case insensitive system (it has to, as it's not possible to represent
> a directory with files 'a' and 'A' present).
> 
> The more subtle case is what if only 'A' is present in r345? Should hg
> cat -r 345 a return it? I suspect that the poster of issue 646 would
> think so.

Well I'm perfectly happy to tell them they're wrong.

> >  > I wish I'd known you were working on this refactoring when I started
> >  > looking at statwalk :-( Is the code available anywhere public?
> >
> >  Not yet, I keep breaking it. I'll try to push out the working bits
> >  shortly.
> 
> I'm glad to know it's not just me - if even you keep breaking things,
> then I feel much better :-)

The whole status/walk code is indeed incredibly complex. My goal is to
simplify it enough so that mere mortals are at least comfortable calling
it if not hacking on it.

-- 
Mathematics is the supreme nostalgia of our time.


