[PATCH STABLE] parsers: raise an unambiguous type during index lookup (issue4451)

Gregory Szorc gregory.szorc at gmail.com
Wed Jun 10 11:50:31 CDT 2015


# HG changeset patch
# User Gregory Szorc <gregory.szorc at gmail.com>
# Date 1433955009 25200
#      Wed Jun 10 09:50:09 2015 -0700
# Branch stable
# Node ID 1b4b923551c0a37c129b1d1c530448685e5ca2a2
# Parent  7298da81f5a9f64ebbdef2b2195585a65da0f99e
parsers: raise an unambiguous type during index lookup (issue4451)

Previously, index lookups would attempt to raise RevlogError when the
lookup failed. This worked in most circumstances. However, in
environments where there could be multiple Python interpreters, this
could fail.

The way we were raising the error was to cache a reference to
RevlogError in a static variable. This assumed that the
"mercurial.error" module was only loaded once and there was only a
single copy of it floating around in memory. Unfortunately, in
some situations - including certain mod_wsgi configurations - this
was not the case: the "mercurial.error" module could be reloaded.
It was possible for a "RevlogError" reference from the first interpreter
to be used by a second interpreter. While the underlying thing
was a "mercurial.error.RevlogError," the object IDs were different, so
the Python code in revlog.py was failing to catch the exception! This
error has existed since the C index lookup code was implemented in
changeset e8d37b78acfb, which was first released in Mercurial 2.2 in
2012.
http://emptysqua.re/blog/python-c-extensions-and-mod-wsgi/#static-variables-are-shared
contains more details.

This patch changes the index lookup code to raise IndexError - a
built-in exception - instead. This side-steps the problem of finding
a reference to a Python-defined type.

The actual exception used shouldn't really matter, as it is only
relevant in a single call site and that site re-raises a completely
different exception. In other words, the exception merely needs to be a
signal.

Because I've seen C extensions get out of sync with the Python code
that interfaces with them, I retained support for catching RevlogError
in the Python code. This is arguably not necessary.

This is my first Python C extension patch. Furthermore, I'm not too
familiar with the revlog code and am not sure if there is more than 1
consumer of the changed C API. Please give extra scrutiny to this patch.

diff --git a/mercurial/parsers.c b/mercurial/parsers.c
--- a/mercurial/parsers.c
+++ b/mercurial/parsers.c
@@ -1482,45 +1482,8 @@ static int index_find_node(indexObject *
 		return rev;
 	return -2;
 }
 
-static PyObject *raise_revlog_error(void)
-{
-	static PyObject *errclass;
-	PyObject *mod = NULL, *errobj;
-
-	if (errclass == NULL) {
-		PyObject *dict;
-
-		mod = PyImport_ImportModule("mercurial.error");
-		if (mod == NULL)
-			goto classfail;
-
-		dict = PyModule_GetDict(mod);
-		if (dict == NULL)
-			goto classfail;
-
-		errclass = PyDict_GetItemString(dict, "RevlogError");
-		if (errclass == NULL) {
-			PyErr_SetString(PyExc_SystemError,
-					"could not find RevlogError");
-			goto classfail;
-		}
-		Py_INCREF(errclass);
-		Py_DECREF(mod);
-	}
-
-	errobj = PyObject_CallFunction(errclass, NULL);
-	if (errobj == NULL)
-		return NULL;
-	PyErr_SetObject(errclass, errobj);
-	return errobj;
-
-classfail:
-	Py_XDECREF(mod);
-	return NULL;
-}
-
 static PyObject *index_getitem(indexObject *self, PyObject *value)
 {
 	char *node;
 	Py_ssize_t nodelen;
@@ -1533,10 +1496,12 @@ static PyObject *index_getitem(indexObje
 		return NULL;
 	rev = index_find_node(self, node, nodelen);
 	if (rev >= -1)
 		return PyInt_FromLong(rev);
-	if (rev == -2)
-		raise_revlog_error();
+	if (rev == -2) {
+		PyErr_SetNone(PyExc_IndexError);
+		return NULL;
+	}
 	return NULL;
 }
 
 static int nt_partialmatch(indexObject *self, const char *node,
@@ -1593,9 +1558,10 @@ static PyObject *index_partialmatch(inde
 	rev = nt_partialmatch(self, node, nodelen);
 
 	switch (rev) {
 	case -4:
-		raise_revlog_error();
+		PyErr_SetNone(PyExc_IndexError);
+		return NULL;
 	case -3:
 		return NULL;
 	case -2:
 		Py_RETURN_NONE;
diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -317,9 +317,11 @@ class revlog(object):
         try:
             return self._nodecache[node]
         except TypeError:
             raise
-        except RevlogError:
+        # Old versions of the C extension raise RevlogError. New versions
+        # raise IndexError.
+        except (IndexError, RevlogError):
             # parsers.c radix tree lookup failed
             raise LookupError(node, self.indexfile, _('no node'))
         except KeyError:
             # pure python cache lookup failed


More information about the Mercurial-devel mailing list