[PATCH 00 of 13] Cleanup of the purge extension

Emanuele Aina faina.mail at tiscali.it
Wed Mar 7 03:22:59 CST 2007


Alexis S. L. Carvalho pointed out:

>>> Maybe it'd be enough to refuse to run if statwalk returns some "m"issing
>>> file, but I'm not completely sure.
>>
>> Even this is not going to be enough: in your example, doing 'hg purge'
>> will erroneously delete the unknown "Foo" file, even if no missing
>> file is present. :(
> 
> Aborting if statwalk returns a file with src == 'm' (a.k.a. file in
> dirstate, but missing in the filesystem) would catch my example - as far
> as statwalk is concerned, the file "foo" is missing.

Oh, I didn't see that dirstate.status() is stat'ing files after
dirstate.statwalk().


> But again: I'm not sure this would be safe enough.
> 
>> This problem is not strictly related to 'purge' but is more general, as
>> it also affects 'status', as your example shows.
> 
> status calls something like
> 
> repo.status() -> dirstate.status() -> dirstate.statwalk()
> 
> If statwalk claims a file is missing, dirstate.status explicitly
> os.lstat's it and, in my example above, finds it.
> 
> So, hg status manages to find "foo", even though it also shows "Foo" as
> an unknown file (which is mostly harmless, until somebody tries to e.g.
> clean the tree ;) .

:)
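
To make the two views concrete, here is a small sketch of what I understand from the description above. It does not use Mercurial at all: classify() and exists() are hypothetical helpers, and the case-insensitive filesystem is simulated with a plain list, so take it only as an illustration of why "foo" looks missing to the walk while "Foo" looks unknown.

```python
def classify(tracked, listed, exists):
    """Split names the way the thread describes.

    tracked: names recorded in the dirstate (hypothetical input)
    listed:  names returned by listing the working directory
    exists:  callable answering "would lstat succeed on this name?"
    """
    listed = set(listed)
    # On disk, but not known to the dirstate under that exact spelling.
    unknown = sorted(n for n in listed if n not in tracked)
    # Walk view: tracked, but absent from the directory listing.
    walk_missing = sorted(n for n in tracked if n not in listed)
    # Status view: re-check every "missing" name against the filesystem.
    really_missing = sorted(n for n in walk_missing if not exists(n))
    return unknown, walk_missing, really_missing


# Simulated case-insensitive filesystem holding a single file "Foo".
on_disk = ["Foo"]


def exists(name):
    return name.lower() in {n.lower() for n in on_disk}


unknown, walk_missing, really_missing = classify({"foo"}, on_disk, exists)
print(unknown, walk_missing, really_missing)
```

With "foo" tracked and "Foo" on disk, the walk reports "foo" as missing and "Foo" as unknown, while the extra lstat finds "foo" after all. A purge that trusts only the unknown list would delete the tracked file.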

> IOW, there are 2 different problems: hg is usually interested only in
> tracked files - it can just lstat every file in this list to see if it's
> on the filesystem.  OTOH, purge has a list of the files on the
> filesystem and it wants to know which ones are not tracked by hg -
> which, as we're seeing, can be quite a chore when there are aliases
> around.
> 
>> The problem could be divided into two:
>>
>> - detect case-insensitive or name mangling filesystems
>>   we could maybe put a special file in .hg and, at repo object creation,
>>   try to access it with a different name: for example '.hg/Foo-è',
>>   accessed with '.hg/foo-è' and '.hg/Foo-e`' (in unicode)
> 
> This is a bit like util.checkfolding (which only checks case
> collisions).  Right now it's used only by hg update/merge/revert.

Thank you for pointing out util.checkfolding().
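
For reference, this is roughly the kind of probe I had in mind. It is only a sketch of the general technique (stat a path, then stat the same path with the case of its last component swapped), not the actual code of util.checkfolding(); looks_case_sensitive() is a made-up name, and like checkfolding it only looks at case, not at other kinds of name mangling.

```python
import os
import tempfile


def looks_case_sensitive(path):
    """Guess whether `path` lives on a case-sensitive filesystem.

    `path` must exist and its last component must contain a letter.
    """
    st = os.lstat(path)
    head, tail = os.path.split(path)
    swapped = tail.swapcase()
    if swapped == tail:
        raise ValueError("last path component has no letters to fold")
    try:
        other = os.lstat(os.path.join(head, swapped))
    except OSError:
        # The case-swapped spelling does not exist: names are not folded.
        return True
    # Same device and inode means both spellings name the same file.
    return (st.st_dev, st.st_ino) != (other.st_dev, other.st_ino)


with tempfile.TemporaryDirectory() as tmp:
    probe = os.path.join(tmp, "Probe")
    open(probe, "w").close()
    print(looks_case_sensitive(probe))
```

The result depends on the filesystem holding the temporary directory, so the printed value will differ between, say, a typical Linux setup and a default Mac OS X or Windows one.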


>> - normalize the file names
>>   it can be done in the dirstate.__contains__() method, once a
>>   name-mangling fs has been detected
> 
> This is harder - right now hg doesn't require e.g. UTF-8 paths, so
> normalizing things could get interesting...
> 
> Also, this could be somewhat too expensive in repos with many files -
> especially if the main user is hg purge.

'hg purge' should not be called too often, so I think we can aim for
correctness rather than performance here.
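
As a rough idea of what the normalized lookup could look like, here is a sketch under a big assumption: that file names are already decoded to unicode text, which, as you say, hg does not require today. fold_key() and find_tracked() are hypothetical names, and NFC plus case folding is only an approximation of what a real name-mangling filesystem does.

```python
import unicodedata


def fold_key(name):
    """Map a decoded file name to a key that ignores case and
    composed/decomposed accent differences (an approximation)."""
    return unicodedata.normalize("NFC", name).casefold()


def find_tracked(tracked, name):
    """Return the tracked spelling that aliases `name`, or None."""
    index = {fold_key(t): t for t in tracked}
    return index.get(fold_key(name))


# "Foo-e" followed by a combining grave accent aliases the tracked
# name "foo-" followed by the precomposed e-grave.
print(find_tracked(["foo-\u00e8"], "Foo-e\u0300") is not None)
```

Building the index once per run is linear in the number of tracked files, which seems acceptable for something called as rarely as purge.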

In fact purge is especially useful when you want to rebuild an
autotoolized project from scratch, as 'make distclean' will usually
leave cruft around.


>>> I'd really like to put at least some safety net before moving purge.py
>>> to hgext.
>> What kind of safety net?
> 
> For example, refusing to run if there are missing files (I think it
> should be enough, if we assume that the filesystem doesn't return 2
> aliases to the same file[1] on a single os.listdir - which I hope is not
> much to ask...  But I wouldn't mind somebody thinking a bit more about
> this).
> 
> Maybe this could be done only for name-mangling filesystems.  And we
> probably could use a --force flag to allow users to shoot their feet.

Do you think that this code will be enough?

if missing and not (util.checkfolding(repo.wjoin('.hg/Foo')) or force):
    raise util.Abort("purging on name mangling fs is not yet fully supported")
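
To spell out when that check would fire, here is the same condition as a standalone sketch that runs without Mercurial. Abort is a local stand-in for util.Abort, check_purge_allowed() is a made-up name, and I am assuming the folding check answers True on a case-sensitive filesystem.

```python
class Abort(Exception):
    """Stand-in for util.Abort so the sketch runs without Mercurial."""


def check_purge_allowed(missing, case_sensitive, force=False):
    """Refuse to purge when files are missing on a folding filesystem.

    missing:        list of tracked files the walk could not find
    case_sensitive: assumed result of the folding check (True = safe)
    force:          hypothetical --force flag overriding the refusal
    """
    if missing and not (case_sensitive or force):
        raise Abort("purging on name mangling fs is not yet fully supported")


# Missing files on a folding filesystem: refused unless forced.
try:
    check_purge_allowed(["foo"], case_sensitive=False)
except Abort as err:
    print("abort:", err)
check_purge_allowed(["foo"], case_sensitive=False, force=True)
```

So the purge only aborts when all three conditions hold: something is missing, the filesystem folds names, and the user did not force it.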


> BTW, it'd probably be nice for hg purge to get some options to remove
> only unknown files, ignored files or empty directories.

Added to my TODO. :)

-- 
Good morning.
Congratulations on the excellent choice.



