[patch] syntax:plain for .hgignore

Guido Ostkamp hg at ostkamp.fastmail.fm
Wed Sep 12 13:32:13 CDT 2007

On Wed, 12 Sep 2007, Matt Mackall wrote:

>> I'm willing to try it with a tweaked Python 2.5.1 build, however I 
>> don't know what to change. The 'configure --help' of Python does not 
>> give any hint.
>> Do you have any hints for me what I need to change to have the regex 
>> module handle larger regular expressions?
> No idea, nor have I had any luck googling for it.

I've checked out the Python 2.5.1 sources and found the following:

The error is raised by the following code in .../Python-2.5.1/Modules/_sre.c:

     for (i = 0; i < n; i++) {
         PyObject *o = PyList_GET_ITEM(code, i);
         unsigned long value = PyInt_Check(o) ? (unsigned long)PyInt_AsLong(o)
                                               : PyLong_AsUnsignedLong(o);
         self->code[i] = (SRE_CODE) value;
         if ((unsigned long) self->code[i] != value) {
             /* <-- this is where the error is raised */
             PyErr_SetString(PyExc_OverflowError,
                             "regular expression code size limit exceeded");
             break;
         }
     }

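It appears that the 'unsigned long' variable 'value' is stored in
'self->code[i]', which is only an 'unsigned short'. Any value above 65535
is truncated when stored, so the round-trip comparison fails and the
OverflowError is raised. The element type comes from
Python-2.5.1/Modules/sre.h, which defines SRE_CODE as: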

/* size of a code word (must be unsigned short or larger, and
    large enough to hold a Py_UNICODE character) */
#ifdef Py_UNICODE_WIDE
#define SRE_CODE Py_UCS4
#else
#define SRE_CODE unsigned short
#endif

I changed that last SRE_CODE definition to 'unsigned long'. After
rebuilding Python, I was able to run your test program successfully.

However, it remains unclear to me what Unicode has to do with the
general size limit of a compiled regular expression. Maybe this is a
general Python bug; I don't know.

Interestingly, although your test program gave essentially the same
results you reported (i.e. the regex was faster), our 'plain'-style patch
is still faster, even with this Python build.



More information about the Mercurial-devel mailing list