[PATCH 4 of 4 STABLE] lock: show about possibility of lock corruption for empty locker

Sat Apr 29 04:38:10 EDT 2017

# HG changeset patch
# User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
# Date 1493454424 -32400
#      Sat Apr 29 17:27:04 2017 +0900
# Branch stable
# Node ID 650d9f6bf1e7c1f6dee9d82616af2a56aaf56a2e
# Parent  b64713f57a22301b6f481cd8e401542f46356c94
lock: show about possibility of lock corruption for empty locker

Before this patch, if corrupted lock file makes vfs.readlink() return
empty string, lock acquisition waits for a while, even though lock is
never acquired in such case (this issue was reported by a user using
Mercurial on Windows).

This issue occurs in steps below:

  1. vfs.readlink() returns empty string as "locker" for corrupted
     lock file

  2. lock._testlock() returns empty string, because split(":", 1) on
     "locker" fails by ValueError

  3. lock._trylock() raises LockHeld(EAGAIN), because empty string
     "locker" isn't None

  4. lock.lock() sleeps for next trial

  5. steps above are repeated while timeout, because corrupted lock
     file isn't removed by lock._testlock() at (2)

  6. lock acquisition (and "hg" command, maybe) is aborted by timeout

Then, lock acquisition fails until removal of corrupted lock file by
hand ("hg debuglocks" or so).

On the other hand, it isn't safe to assume that lock was corrupted at
unlocking. There is no mechanical way to determine whether "locker"
process of corrupted lock is still alive or not exactly.

In addition to it, empty "locker" might mean "locking in progress" on
no-symlink platform, because vfs.makelock() isn't atomic action on
such platform (consisting of "open", "write', and "close").

Therefore, it should be responsibility of end users to determine
whether such corrupted lock can be safely removed or not.

This patch shows about possibility of lock corruption for empty
"locker", and suggests use of "hg debuglocks" in "hint" message to
resolve an issue, even though debug commands should be hidden from
ordinary users.

BTW, there is no client code path, which causes "inst.errno !=
errno.ETIMEDOUT" route in callcatch() of scmutil.py, in Mercurial
source tree, AFAIK. lock.py implementation always re-raise LockHeld
with errno.ETIMEDOUT for internal LockHeld. This is reason why this
patch doesn't have test for this route.

diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -1347,9 +1347,13 @@ class localrepository(object):
                 self.ui.warn(
                     _("waiting for lock on %s held by process %r "
                       "on host %r\n") % (desc, pid, host))
-            else:
+            elif inst.locker:
                 self.ui.warn(_("waiting for lock on %s held by %r\n") %
                              (desc, inst.locker))
+            else:
+                self.ui.warn(_("waiting for lock on %s,"
+                               " but might be corrupted\n") %
+                             (desc))
             # default to 600 seconds timeout
             l = lockmod.lock(vfs, lockname,
                              int(self.ui.config("ui", "timeout", "600")),
diff --git a/mercurial/lock.py b/mercurial/lock.py
--- a/mercurial/lock.py
+++ b/mercurial/lock.py
@@ -161,7 +161,8 @@ class lock(object):
         for new-style locks.
         """
         try:
-            return self.vfs.readlock(self.f)
+            # normalize white space by strip() to detect corruption easily
+            return self.vfs.readlock(self.f).strip()
         except (OSError, IOError) as why:
             if why.errno == errno.ENOENT:
                 return None
diff --git a/mercurial/scmutil.py b/mercurial/scmutil.py
--- a/mercurial/scmutil.py
+++ b/mercurial/scmutil.py
@@ -151,10 +151,20 @@ def callcatch(ui, func):
     # Mercurial-specific first, followed by built-in and library exceptions
     except error.LockHeld as inst:
         if inst.errno == errno.ETIMEDOUT:
-            reason = _('timed out waiting for lock held by %s') % inst.locker
+            if inst.locker:
+                reason = _('timed out waiting for lock held by %s'
+                           ) % inst.locker
+            else:
+                reason = _('timed out waiting for lock, but might be corrupted')
         else:
-            reason = _('lock held by %s') % inst.locker
+            if inst.locker:
+                reason = _('lock held by %s') % inst.locker
+            else:
+                reason = _('lock held, but might be corrupted')
         ui.warn(_("abort: %s: %s\n") % (inst.desc or inst.filename, reason))
+        if not inst.locker:
+            ui.warn('(%s)\n' %
+                    _("use 'hg debuglocks' to check/unlock corrupted lock"))
     except error.LockUnavailable as inst:
         ui.warn(_("abort: could not lock %s: %s\n") %
                (inst.desc or inst.filename, inst.strerror))
diff --git a/tests/test-no-symlinks.t b/tests/test-no-symlinks.t
--- a/tests/test-no-symlinks.t
+++ b/tests/test-no-symlinks.t
@@ -64,6 +64,12 @@ Test handling lock corruption
   lock:  free
   wlock: free
 
+  $ cat >> .hg/hgrc <<EOF
+  > [ui]
+  > # for immediate timeout
+  > timeout = 0
+  > EOF
+
 (emulate lock corruption by empty file)
 
   $ printf '' > .hg/wlock
@@ -72,3 +78,33 @@ Test handling lock corruption
   lock:  free
   wlock: user *, but corrupted (*s) (glob)
   [1]
+
+  $ echo foo > foo
+  $ hg add foo
+  waiting for lock on working directory of $TESTTMP/t2, but might be corrupted (glob)
+  abort: working directory of $TESTTMP/t2: timed out waiting for lock, but might be corrupted (glob)
+  (use 'hg debuglocks' to check/unlock corrupted lock)
+  [255]
+
+(check still corrupted)
+
+  $ hg debuglocks
+  lock:  free
+  wlock: user *, but corrupted (*s) (glob)
+  [1]
+
+(check normal waiting/timedout messages for safety, too)
+
+  $ printf '1234' > .hg/wlock
+
+  $ hg add foo
+  waiting for lock on working directory of $TESTTMP/t2 held by '1234' (glob)
+  abort: working directory of $TESTTMP/t2: timed out waiting for lock held by 1234 (glob)
+  [255]
+
+  $ printf 'foobar:1234' > .hg/wlock
+
+  $ hg add foo
+  waiting for lock on working directory of $TESTTMP/t2 held by process '1234' on host 'foobar' (glob)
+  abort: working directory of $TESTTMP/t2: timed out waiting for lock held by foobar:1234 (glob)
+  [255]