Corrupted repositories on NFS

Adrian Buehlmann adrian at cadifra.com
Mon Nov 29 07:11:44 CST 2010


On 2010-11-29 11:50, Nicolas Dumazet wrote:
> 2010/11/26 Jesper Noehr <jesper at noehr.org>:
>> I'm chiming in here as I'm kind of in the dark whether this is an
>> actual bug in Mercurial, and whether my fix is actually "good."
>>
>> Any comments appreciated.
> 
> It's not the first time I've heard about NFS bugs.
> 
> In reality the second most common troubleshooting question on IRC for
> strange bugs, after "do you use inotify", is usually: do you access
> your repo over NFS, or use any strange filesystem? ;)
> 
> 
> 1) First patch looks OK. Returning a dummy value (-1? 'race'?) will
> force testlock into a ValueError, effectively raising LockHeld and
> forcing us to retry creating a lock.
> 
> 2) I'm not an NFS expert, but googling seems to show that the atomic
> way to delete a file should indeed be a rename+unlink, so the second
> (a cleaner version ;) patch makes sense to me. As a note, lock.release
> swallows OSErrors after an unlink, we probably want to include this in
> util.unlock?
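
For reference, a minimal sketch of the rename+unlink idea from point 2. The
helper name and the temporary-name scheme are made up for illustration; this
is not Mercurial's actual util code, and whether it is enough on a given NFS
setup is an assumption that would need testing.

```python
import os
import uuid


def rename_then_unlink(path):
    """Hypothetical helper: remove `path` via rename followed by unlink.

    The rename moves the file to a unique name in the same directory, so
    the original name disappears in a single step. If `path` does not
    exist, os.rename raises OSError and the caller sees it.
    """
    # Unique sibling name; same directory keeps the rename on one filesystem.
    tmp = "%s.%s.deleting" % (path, uuid.uuid4().hex)
    os.rename(path, tmp)
    try:
        os.unlink(tmp)
    except OSError:
        # Assumption: mirror the "swallow OSError after unlink" behaviour
        # mentioned for lock.release; the original name is already gone.
        pass
```

Only the failure of the final unlink is swallowed here; a failing rename
still propagates, since in that case the original file is still in place.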

FWIW, last night I wrote

http://mercurial.selenic.com/wiki/UnlinkingFilesOnWindows

for how unlink behaves on Windows if another process has the file open
(we'll probably have to discuss how to deal with that in a separate thread).

> The trace you're seeing, however, does not look too good. Supposedly,
> "wctx = self[None]; merge = len(wctx.parents()) > 1" sequence happens
> while we hold the wlock: we should be the only ones operating on the
> repo...
> Can you try to log more, to see what's happening?

