Bug 3506 - race condition in trylock/testlock
Summary: race condition in trylock/testlock
Status: RESOLVED FIXED
Alias: None
Product: Mercurial
Classification: Unclassified
Component: Mercurial (show other bugs)
Version: unspecified
Hardware: All Linux
: normal bug
Assignee: Bugzilla
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-06-20 08:22 UTC by Andrew Suffield
Modified: 2012-10-19 14:25 UTC (History)
4 users (show)

See Also:
Python Version: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Andrew Suffield 2012-06-20 08:22 UTC
mercurial.lock.trylock can follow this sequence of events:

call util.makelock
get EEXIST because somebody else has it locked
the other process that has the lock releases it at this point, deleting the symlink
call self.testlock
-> call util.readlock
   -> call os.readlink
   -> get ENOENT
   -> raise, because it's not EINVAL or ENOSYS
we're not inside any exception handler here, so OSError is propagated up the call stack, eventually aborting the whole operation

The correct behavior is of course to swallow ENOENT in trylock and go back around the loop - the lock was just released so this process should try to take it.
Comment 1 Bryan O'Sullivan 2012-09-10 20:29 UTC
Confirmed locally.
Comment 2 HG Bot 2012-09-29 16:11 UTC
Fixed by http://selenic.com/repo/hg/rev/829919ef894a
Tomasz Kleczek <tomasz.kleczek@fb.com>
lock: fixed race condition in trylock/testlock (issue3506)

Suppose the following scenario:

1. Process A takes the lock (e.g. on commit).
2. Process B wants to grab the lock. Since lock file exists
   the exception is raised. In the catch block the testlock
   function is called.
3. Process A releases the lock.
4. Process B tries to read the lock file as a part of testlock
   function. This results in OSError (ENOENT) and since we're
   not inside the exception handler function this is propagated
   and aborts the whole operation.

To fix this we now check in testlock function whether lock file
actually exists and if not (i.e. if readlock fails) we just return.

(please test the fix)