[PATCH 1 of 3 RFC] mercurial: add python re2 bindings
Siddharth Agarwal
sid at less-broken.com
Tue Sep 9 12:07:48 CDT 2014
On 09/09/2014 09:28 AM, Augie Fackler wrote:
> On Tue, Sep 02, 2014 at 02:18:08PM -0700, Siddharth Agarwal wrote:
>> # HG changeset patch
>> # User Siddharth Agarwal <sid0 at fb.com>
>> # Date 1409591324 25200
>> # Mon Sep 01 10:08:44 2014 -0700
>> # Node ID f3b022e755dd7cf54e1ded55f0217bb1415348e2
>> # Parent a98f6def97bc4b1bc74e37ca207445639e60b1a5
>> mercurial: add python re2 bindings
> It looks like this is bringing in a package that would otherwise live
> on pypi, is that right?
Actually, no. There are two separate Python bindings for re2, both
called pyre2:
- The bindings written by Facebook available at
https://github.com/facebook/pyre2. These are known to work.
- The bindings at https://pypi.python.org/pypi/re2/ written by someone
else. These are known to be broken, and indeed we have code in util.py
that detects these broken bindings and disables re2 if they're found.
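For context, here is a minimal sketch of that kind of check. It is not the actual util.py code, and the helper names are made up for illustration. The idea is to import re2, run one small sanity match, and fall back to the standard re module if the import fails or the match misbehaves.

```python
import re


def load_re2_or_none():
    """Return the re2 module if it imports and passes a sanity match, else None.

    Hypothetical helper for illustration; not the code in util.py.
    """
    try:
        import re2
    except ImportError:
        return None
    try:
        # A tiny pattern with one capturing group; bindings that cannot
        # match this are treated as unusable.
        ok = bool(re2.match(r'\[([^\[]+)\]', '[ui]'))
    except Exception:
        ok = False
    return re2 if ok else None


def regex_module():
    """Pick re2 when it is usable, otherwise the standard re module."""
    return load_re2_or_none() or re
```

Callers would then go through regex_module() instead of importing re2 directly, so a broken install only costs speed.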
> Is it possible that we could just recommend that packagers make sure
> those bindings are packaged too, rather than embedding a bonus copy in
> mercurial?
Not if they fetch from pypi :/ I discussed this with mpm and he thought
checking the bindings in was reasonable.
>
>> These bindings will enable packagers to build Mercurial with re2 support.
>>
>> The bindings are licensed as 3-clause BSD.
>>
>> I've moved re2.py to mercurial/ to allow 'from mercurial import re2' to work.
>> This is my first time doing this, so it is very likely I got some things wrong.
>>
>> diff --git a/mercurial/pyre2/LICENSE b/mercurial/pyre2/LICENSE
>> new file mode 100644
>> --- /dev/null
>> +++ b/mercurial/pyre2/LICENSE
>> @@ -0,0 +1,25 @@
>> +Copyright (c) 2010, David Reiss and Facebook, Inc. All rights reserved.
>> +
>> +Redistribution and use in source and binary forms, with or without
>> +modification, are permitted provided that the following conditions
>> +are met:
>> +* Redistributions of source code must retain the above copyright
>> + notice, this list of conditions and the following disclaimer.
>> +* Redistributions in binary form must reproduce the above copyright
>> + notice, this list of conditions and the following disclaimer in the
>> + documentation and/or other materials provided with the distribution.
>> +* Neither the name of Facebook nor the names of its contributors
>> + may be used to endorse or promote products derived from this software
>> + without specific prior written permission.
>> +
>> +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>> +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>> +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
>> +PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>> +HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>> +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>> +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>> +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>> +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>> +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>> diff --git a/mercurial/pyre2/README.rst b/mercurial/pyre2/README.rst
>> new file mode 100644
>> --- /dev/null
>> +++ b/mercurial/pyre2/README.rst
>> @@ -0,0 +1,71 @@
>> +=====
>> +pyre2
>> +=====
>> +
>> +.. contents::
>> +
>> +Summary
>> +=======
>> +
>> +pyre2 is a Python extension that wraps
>> +`Google's RE2 regular expression library
>> +<http://code.google.com/p/re2/>`_.
>> +It implements many of the features of Python's built-in
>> +``re`` module with compatible interfaces.
>> +
>> +
>> +New Features
>> +============
>> +
>> +* ``Regexp`` objects have a ``fullmatch`` method that works like ``match``,
>> + but anchors the match at both the start and the end.
>> +* ``Regexp`` objects have
>> + ``test_search``, ``test_match``, and ``test_fullmatch``
>> + methods that work like ``search``, ``match``, and ``fullmatch``,
>> + but only return ``True`` or ``False`` to indicate
>> + whether the match was successful.
>> + These methods should be faster than the full versions,
>> + especially for patterns with capturing groups.
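To make the semantics of those additions concrete, here is a small stand-in built on the standard re module. It is only a sketch of the behaviour the README describes, not pyre2 itself, and the class name RegexpSketch is an assumption for illustration.

```python
import re


class RegexpSketch(object):
    """Stand-in on top of stdlib re that mimics the methods listed above."""

    def __init__(self, pattern):
        self._re = re.compile(pattern)

    def search(self, string):
        return self._re.search(string)

    def match(self, string):
        return self._re.match(string)

    def fullmatch(self, string):
        # Anchored at both the start and the end of the string.
        return self._re.fullmatch(string)

    # The test_* variants only report whether a match exists.
    def test_search(self, string):
        return self.search(string) is not None

    def test_match(self, string):
        return self.match(string) is not None

    def test_fullmatch(self, string):
        return self.fullmatch(string) is not None
```

With the pattern (\d+)-(\d+), match succeeds on '10-20 tail' while fullmatch does not, and the test_* methods return plain booleans. The stand-in cannot show the speed benefit, since it still builds a match object internally.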
>> +
>> +
>> +Missing Features
>> +================
>> +
>> +* No substitution methods.
>> +* No flags.
>> +* No ``split``, ``findall``, or ``finditer``.
>> +* No top-level convenience functions like ``search`` and ``match``.
>> + (Just use compile.)
>> +* No compile cache.
>> + (If you care enough about performance to use RE2,
>> + you probably care enough to cache your own patterns.)
>> +* No ``lastindex`` or ``lastgroup`` on ``Match`` objects.
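Since the README leaves pattern caching to the caller, a caller-side cache could look roughly like the sketch below. The names cached_compile and _pattern_cache are hypothetical, and the default compile function is stdlib re.compile only so the example runs without re2 installed; in practice one would pass the re2 compile function instead.

```python
import re

# Maps (compile function, pattern) to the compiled object.
_pattern_cache = {}


def cached_compile(pattern, compile_func=re.compile):
    """Compile pattern once per (compile_func, pattern) pair and reuse it."""
    key = (compile_func, pattern)
    try:
        return _pattern_cache[key]
    except KeyError:
        compiled = _pattern_cache[key] = compile_func(pattern)
        return compiled
```

The cache is unbounded, which is fine for a fixed set of patterns but would need an eviction policy if patterns come from user input.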
>> +
>> +
>> +Current Status
>> +==============
>> +
>> +pyre2 has only received basic testing,
>> +and I am by no means a Python extension expert,
>> +so it is quite possible that it contains bugs.
>> +I'd guess the most likely are reference leaks in error cases.
>> +
>> +RE2 doesn't build with -fPIC, so I had to build it with
>> +
>> +::
>> +
>> + make CFLAGS='-fPIC -c -Wall -Wno-sign-compare -O3 -g -I.'
>> +
>> +I also had to add it to my compiler search path when building the module
>> +with a command like
>> +
>> +::
>> +
>> + env CPPFLAGS='-I/path/to/re2' LDFLAGS='-L/path/to/re2/obj' ./setup.py build
>> +
>> +
>> +Contact
>> +=======
>> +
>> +You can file bug reports on GitHub, or email the author:
>> +David Reiss <dreiss at facebook.com>.
>> diff --git a/mercurial/pyre2/_re2.cc b/mercurial/pyre2/_re2.cc
>> new file mode 100644
>> --- /dev/null
>> +++ b/mercurial/pyre2/_re2.cc
>> @@ -0,0 +1,753 @@
>> +/*
>> + * Copyright (c) 2010, David Reiss and Facebook, Inc. All rights reserved.
>> + *
>> + * Redistribution and use in source and binary forms, with or without
>> + * modification, are permitted provided that the following conditions
>> + * are met:
>> + * * Redistributions of source code must retain the above copyright
>> + * notice, this list of conditions and the following disclaimer.
>> + * * Redistributions in binary form must reproduce the above copyright
>> + * notice, this list of conditions and the following disclaimer in the
>> + * documentation and/or other materials provided with the distribution.
>> + * * Neither the name of Facebook nor the names of its contributors
>> + * may be used to endorse or promote products derived from this software
>> + * without specific prior written permission.
>> + *
>> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
>> + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>> + * HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>> + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>> + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>> + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>> + */
>> +#define PY_SSIZE_T_CLEAN
>> +#include <Python.h>
>> +
>> +#include <cstddef>
>> +
>> +#include <string>
>> +#include <new>
>> +using std::nothrow;
>> +
>> +#include <re2/re2.h>
>> +using re2::RE2;
>> +using re2::StringPiece;
>> +
>> +
>> +typedef struct _RegexpObject2 {
>> + PyObject_HEAD
>> + // __dict__. Simpler than implementing getattr and possibly faster.
>> + PyObject* attr_dict;
>> + RE2* re2_obj;
>> +} RegexpObject2;
>> +
>> +typedef struct _MatchObject2 {
>> + PyObject_HEAD
>> + // __dict__. Simpler than implementing getattr and possibly faster.
>> + PyObject* attr_dict;
>> + // Cache of __dict__["re"] and __dict__["string"], which are used for group()
>> + // calls. These fields do *not* own their own references. They piggyback on
>> + // the references in attr_dict.
>> + PyObject* re;
>> + PyObject* string;
>> + // There are several possible approaches to storing the matched groups:
>> + // 1. Fully materialize the groups tuple at match time.
>> + // 2. Allocate and cache PyString objects when groups are requested.
>> + // 3. Always allocate new PyStrings on demand.
>> + // I've chosen to go with #3. It's the simplest, and I'm pretty sure it's
>> + // optimal in all cases where no group is fetched more than once.
>> + StringPiece* groups;
>> +} MatchObject2;
>> +
>> +
>> +// Imported from sre_constants.
>> +static PyObject* error_class;
>> +
>> +
>> +// Forward declarations of methods, creators, and destructors.
>> +static void regexp_dealloc(RegexpObject2* self);
>> +static PyObject* create_regexp(PyObject* pattern);
>> +static PyObject* regexp_search(RegexpObject2* self, PyObject* args, PyObject* kwds);
>> +static PyObject* regexp_match(RegexpObject2* self, PyObject* args, PyObject* kwds);
>> +static PyObject* regexp_fullmatch(RegexpObject2* self, PyObject* args, PyObject* kwds);
>> +static PyObject* regexp_test_search(RegexpObject2* self, PyObject* args, PyObject* kwds);
>> +static PyObject* regexp_test_match(RegexpObject2* self, PyObject* args, PyObject* kwds);
>> +static PyObject* regexp_test_fullmatch(RegexpObject2* self, PyObject* args, PyObject* kwds);
>> +static void match_dealloc(MatchObject2* self);
>> +static PyObject* create_match(PyObject* re, PyObject* string, long pos, long endpos, StringPiece* groups);
>> +static PyObject* match_group(MatchObject2* self, PyObject* args);
>> +static PyObject* match_groups(MatchObject2* self, PyObject* args, PyObject* kwds);
>> +static PyObject* match_groupdict(MatchObject2* self, PyObject* args, PyObject* kwds);
>> +static PyObject* match_start(MatchObject2* self, PyObject* args);
>> +static PyObject* match_end(MatchObject2* self, PyObject* args);
>> +static PyObject* match_span(MatchObject2* self, PyObject* args);
>> +
>> +
>> +static PyMethodDef regexp_methods[] = {
>> + {"search", (PyCFunction)regexp_search, METH_VARARGS | METH_KEYWORDS,
>> + "search(string[, pos[, endpos]]) --> match object or None.\n"
>> + " Scan through string looking for a match, and return a corresponding\n"
>> + " MatchObject instance. Return None if no position in the string matches."
>> + },
>> + {"match", (PyCFunction)regexp_match, METH_VARARGS | METH_KEYWORDS,
>> + "match(string[, pos[, endpos]]) --> match object or None.\n"
>> + " Matches zero or more characters at the beginning of the string"
>> + },
>> + {"fullmatch", (PyCFunction)regexp_fullmatch, METH_VARARGS | METH_KEYWORDS,
>> + "fullmatch(string[, pos[, endpos]]) --> match object or None.\n"
>> + " Matches the entire string"
>> + },
>> + {"test_search", (PyCFunction)regexp_test_search, METH_VARARGS | METH_KEYWORDS,
>> + "test_search(string[, pos[, endpos]]) --> bool.\n"
>> + " Like 'search', but only returns whether a match was found."
>> + },
>> + {"test_match", (PyCFunction)regexp_test_match, METH_VARARGS | METH_KEYWORDS,
>> + "test_match(string[, pos[, endpos]]) --> bool.\n"
>> + " Like 'match', but only returns whether a match was found."
>> + },
>> + {"test_fullmatch", (PyCFunction)regexp_test_fullmatch, METH_VARARGS | METH_KEYWORDS,
>> + "test_fullmatch(string[, pos[, endpos]]) --> bool.\n"
>> + " Like 'fullmatch', but only returns whether a match was found."
>> + },
>> + {NULL} /* Sentinel */
>> +};
>> +
>> +static PyMethodDef match_methods[] = {
>> + {"group", (PyCFunction)match_group, METH_VARARGS,
>> + NULL
>> + },
>> + {"groups", (PyCFunction)match_groups, METH_VARARGS | METH_KEYWORDS,
>> + NULL
>> + },
>> + {"groupdict", (PyCFunction)match_groupdict, METH_VARARGS | METH_KEYWORDS,
>> + NULL
>> + },
>> + {"start", (PyCFunction)match_start, METH_VARARGS,
>> + NULL
>> + },
>> + {"end", (PyCFunction)match_end, METH_VARARGS,
>> + NULL
>> + },
>> + {"span", (PyCFunction)match_span, METH_VARARGS,
>> + NULL
>> + },
>> + {NULL} /* Sentinel */
>> +};
>> +
>> +
>> +// Simple method to block setattr.
>> +static int
>> +_no_setattr(PyObject* obj, PyObject* name, PyObject* v) {
>> + (void)name;
>> + (void)v;
>> + PyErr_Format(PyExc_AttributeError,
>> + "'%s' object attributes are read-only",
>> + obj->ob_type->tp_name);
>> + return -1;
>> +}
>> +
>> +
>> +static PyTypeObject Regexp_Type2 = {
>> + PyObject_HEAD_INIT(NULL)
>> + 0, /*ob_size*/
>> + "_re2.RE2_Regexp", /*tp_name*/
>> + sizeof(RegexpObject2), /*tp_basicsize*/
>> + 0, /*tp_itemsize*/
>> + (destructor)regexp_dealloc, /*tp_dealloc*/
>> + 0, /*tp_print*/
>> + 0, /*tp_getattr*/
>> + 0, /*tp_setattr*/
>> + 0, /*tp_compare*/
>> + 0, /*tp_repr*/
>> + 0, /*tp_as_number*/
>> + 0, /*tp_as_sequence*/
>> + 0, /*tp_as_mapping*/
>> + 0, /*tp_hash*/
>> + 0, /*tp_call*/
>> + 0, /*tp_str*/
>> + 0, /*tp_getattro*/
>> + _no_setattr, /*tp_setattro*/
>> + 0, /*tp_as_buffer*/
>> + Py_TPFLAGS_DEFAULT, /*tp_flags*/
>> + "RE2 regexp objects", /*tp_doc*/
>> + 0, /*tp_traverse*/
>> + 0, /*tp_clear*/
>> + 0, /*tp_richcompare*/
>> + 0, /*tp_weaklistoffset*/
>> + 0, /*tp_iter*/
>> + 0, /*tp_iternext*/
>> + regexp_methods, /*tp_methods*/
>> + 0, /*tp_members*/
>> + 0, /*tp_getset*/
>> + 0, /*tp_base*/
>> + 0, /*tp_dict*/
>> + 0, /*tp_descr_get*/
>> + 0, /*tp_descr_set*/
>> + offsetof(RegexpObject2, attr_dict), /*tp_dictoffset*/
>> + 0, /*tp_init*/
>> + 0, /*tp_alloc*/
>> + 0, /*tp_new*/
>> +};
>> +
>> +static PyTypeObject Match_Type2 = {
>> + PyObject_HEAD_INIT(NULL)
>> + 0, /*ob_size*/
>> + "_re2.RE2_Match", /*tp_name*/
>> + sizeof(MatchObject2), /*tp_basicsize*/
>> + 0, /*tp_itemsize*/
>> + (destructor)match_dealloc, /*tp_dealloc*/
>> + 0, /*tp_print*/
>> + 0, /*tp_getattr*/
>> + 0, /*tp_setattr*/
>> + 0, /*tp_compare*/
>> + 0, /*tp_repr*/
>> + 0, /*tp_as_number*/
>> + 0, /*tp_as_sequence*/
>> + 0, /*tp_as_mapping*/
>> + 0, /*tp_hash*/
>> + 0, /*tp_call*/
>> + 0, /*tp_str*/
>> + 0, /*tp_getattro*/
>> + _no_setattr, /*tp_setattro*/
>> + 0, /*tp_as_buffer*/
>> + Py_TPFLAGS_DEFAULT, /*tp_flags*/
>> + "RE2 match objects", /*tp_doc*/
>> + 0, /*tp_traverse*/
>> + 0, /*tp_clear*/
>> + 0, /*tp_richcompare*/
>> + 0, /*tp_weaklistoffset*/
>> + 0, /*tp_iter*/
>> + 0, /*tp_iternext*/
>> + match_methods, /*tp_methods*/
>> + 0, /*tp_members*/
>> + 0, /*tp_getset*/
>> + 0, /*tp_base*/
>> + 0, /*tp_dict*/
>> + 0, /*tp_descr_get*/
>> + 0, /*tp_descr_set*/
>> + offsetof(MatchObject2, attr_dict), /*tp_dictoffset*/
>> + 0, /*tp_init*/
>> + 0, /*tp_alloc*/
>> + 0, /*tp_new*/
>> +};
>> +
>> +
>> +static void
>> +regexp_dealloc(RegexpObject2* self)
>> +{
>> + delete self->re2_obj;
>> + Py_XDECREF(self->attr_dict);
>> + PyObject_Del(self);
>> +}
>> +
>> +static PyObject*
>> +create_regexp(PyObject* pattern)
>> +{
>> + RegexpObject2* regexp = PyObject_New(RegexpObject2, &Regexp_Type2);
>> + if (regexp == NULL) {
>> + return NULL;
>> + }
>> + regexp->re2_obj = NULL;
>> + regexp->attr_dict = NULL;
>> +
>> + const char* raw_pattern = PyString_AS_STRING(pattern);
>> + Py_ssize_t len_pattern = PyString_GET_SIZE(pattern);
>> +
>> + RE2::Options options;
>> + options.set_log_errors(false);
>> +
>> + regexp->re2_obj = new(nothrow) RE2(StringPiece(raw_pattern, (int) len_pattern), options);
>> +
>> + if (regexp->re2_obj == NULL) {
>> + PyErr_NoMemory();
>> + Py_DECREF(regexp);
>> + return NULL;
>> + }
>> +
>> + if (!regexp->re2_obj->ok()) {
>> + long code = (long)regexp->re2_obj->error_code();
>> + const std::string& msg = regexp->re2_obj->error();
>> + PyObject* value = Py_BuildValue("ls#", code, msg.data(), msg.length());
>> + if (value == NULL) {
>> + Py_DECREF(regexp);
>> + return NULL;
>> + }
>> + PyErr_SetObject(error_class, value);
>> + Py_DECREF(regexp);
>> + return NULL;
>> + }
>> +
>> + PyObject* groupindex = PyDict_New();
>> + if (groupindex == NULL) {
>> + Py_DECREF(regexp);
>> + return NULL;
>> + }
>> +
>> + // Build up the attr_dict early so regexp can take ownership of our reference
>> + // to groupindex.
>> + regexp->attr_dict = Py_BuildValue("{sisNsO}",
>> + "groups", regexp->re2_obj->NumberOfCapturingGroups(),
>> + "groupindex", groupindex,
>> + "pattern", pattern);
>> + if (regexp->attr_dict == NULL) {
>> + Py_DECREF(regexp);
>> + return NULL;
>> + }
>> +
>> + const std::map<std::string, int>& name_map = regexp->re2_obj->NamedCapturingGroups();
>> + for (std::map<std::string, int>::const_iterator it = name_map.begin(); it != name_map.end(); ++it) {
>> + PyObject* index = PyInt_FromLong(it->second);
>> + if (index == NULL) {
>> + Py_DECREF(regexp);
>> + return NULL;
>> + }
>> + int res = PyDict_SetItemString(groupindex, it->first.c_str(), index);
>> + Py_DECREF(index);
>> + if (res < 0) {
>> + Py_DECREF(regexp);
>> + return NULL;
>> + }
>> + }
>> +
>> + return (PyObject*)regexp;
>> +}
>> +
>> +static PyObject*
>> +_do_search(RegexpObject2* self, PyObject* args, PyObject* kwds, RE2::Anchor anchor, bool return_match)
>> +{
>> + PyObject* string;
>> + const char* subject;
>> + Py_ssize_t slen;
>> + long pos = 0;
>> + long endpos = LONG_MAX;
>> +
>> + static const char* kwlist[] = {
>> + "string",
>> + "pos",
>> + "endpos",
>> + NULL};
>> +
>> + // Using O! instead of s# here, because we want to stash the original
>> + // PyObject* in the match object on a successful match.
>> + if (!PyArg_ParseTupleAndKeywords(args, kwds, "O!|ll", (char**)kwlist,
>> + &PyString_Type, &string,
>> + &pos, &endpos)) {
>> + return NULL;
>> + }
>> +
>> + subject = PyString_AS_STRING(string);
>> + slen = PyString_GET_SIZE(string);
>> + if (pos < 0) pos = 0;
>> + if (pos > slen) pos = slen;
>> + if (endpos < pos) endpos = pos;
>> + if (endpos > slen) endpos = slen;
>> +
>> + // Don't bother allocating these if we are just doing a test.
>> + int n_groups = 0;
>> + StringPiece* groups = NULL;
>> + if (return_match) {
>> + n_groups = self->re2_obj->NumberOfCapturingGroups() + 1;
>> + groups = new(nothrow) StringPiece[n_groups];
>> +
>> + if (groups == NULL) {
>> + PyErr_NoMemory();
>> + return NULL;
>> + }
>> + }
>> +
>> + bool matched = self->re2_obj->Match(
>> + StringPiece(subject, (int) slen),
>> + (int) pos,
>> + (int) endpos,
>> + anchor,
>> + groups,
>> + n_groups);
>> +
>> + if (!return_match) {
>> + if (matched) {
>> + Py_RETURN_TRUE;
>> + }
>> + Py_RETURN_FALSE;
>> + }
>> +
>> + if (!matched) {
>> + delete[] groups;
>> + Py_RETURN_NONE;
>> + }
>> +
>> + // create_match is going to Py_BuildValue the pos and endpos into
>> + // PyObjects. We could optimize the case where pos and/or endpos were
>> + // explicitly passed in by forwarding the existing PyObjects.
>> + // That requires much more intricate code, though.
>> + return create_match((PyObject*)self, string, pos, endpos, groups);
>> +}
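The pos/endpos handling in _do_search above can be restated in Python as the sketch below. The function name clamp_span is made up for illustration; the four checks are in the same order as in the C++ code.

```python
import sys


def clamp_span(slen, pos=0, endpos=sys.maxsize):
    """Clamp pos and endpos into [0, slen] so that pos <= endpos."""
    if pos < 0:
        pos = 0
    if pos > slen:
        pos = slen
    if endpos < pos:
        endpos = pos
    if endpos > slen:
        endpos = slen
    return pos, endpos
```

Note that a negative pos is clamped to zero rather than counted from the end of the string as slicing would do, and an endpos before pos yields an empty search range instead of an error.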
>> +
>> +static PyObject*
>> +regexp_search(RegexpObject2* self, PyObject* args, PyObject* kwds)
>> +{
>> + return _do_search(self, args, kwds, RE2::UNANCHORED, true);
>> +}
>> +
>> +static PyObject*
>> +regexp_match(RegexpObject2* self, PyObject* args, PyObject* kwds)
>> +{
>> + return _do_search(self, args, kwds, RE2::ANCHOR_START, true);
>> +}
>> +
>> +static PyObject*
>> +regexp_fullmatch(RegexpObject2* self, PyObject* args, PyObject* kwds)
>> +{
>> + return _do_search(self, args, kwds, RE2::ANCHOR_BOTH, true);
>> +}
>> +
>> +static PyObject*
>> +regexp_test_search(RegexpObject2* self, PyObject* args, PyObject* kwds)
>> +{
>> + return _do_search(self, args, kwds, RE2::UNANCHORED, false);
>> +}
>> +
>> +static PyObject*
>> +regexp_test_match(RegexpObject2* self, PyObject* args, PyObject* kwds)
>> +{
>> + return _do_search(self, args, kwds, RE2::ANCHOR_START, false);
>> +}
>> +
>> +static PyObject*
>> +regexp_test_fullmatch(RegexpObject2* self, PyObject* args, PyObject* kwds)
>> +{
>> + return _do_search(self, args, kwds, RE2::ANCHOR_BOTH, false);
>> +}
>> +
>> +
>> +static void
>> +match_dealloc(MatchObject2* self)
>> +{
>> + delete[] self->groups;
>> + Py_XDECREF(self->attr_dict);
>> + PyObject_Del(self);
>> +}
>> +
>> +static PyObject*
>> +create_match(PyObject* re, PyObject* string,
>> + long pos, long endpos,
>> + StringPiece* groups)
>> +{
>> + MatchObject2* match = PyObject_New(MatchObject2, &Match_Type2);
>> + if (match == NULL) {
>> + delete[] groups;
>> + return NULL;
>> + }
>> + match->attr_dict = NULL;
>> + match->groups = groups;
>> + match->re = re;
>> + match->string = string;
>> +
>> + match->attr_dict = Py_BuildValue("{sOsOslsl}",
>> + "re", re,
>> + "string", string,
>> + "pos", pos,
>> + "endpos", endpos);
>> + if (match->attr_dict == NULL) {
>> + Py_DECREF(match);
>> + return NULL;
>> + }
>> +
>> + return (PyObject*)match;
>> +}
>> +
>> +/**
>> + * Attempt to convert an untrusted group index (PyObject* group) into
>> + * a trusted one (*idx_p). Return false on failure (exception).
>> + */
>> +static bool
>> +_group_idx(MatchObject2* self, PyObject* group, long* idx_p)
>> +{
>> + if (group == NULL) {
>> + return false;
>> + }
>> + PyErr_Clear(); // Is this necessary?
>> + long idx = PyInt_AsLong(group);
>> + if (idx == -1 && PyErr_Occurred() != NULL) {
>> + return false;
>> + }
>> + // TODO: Consider caching NumberOfCapturingGroups.
>> + if (idx < 0 || idx > ((RegexpObject2*)self->re)->re2_obj->NumberOfCapturingGroups()) {
>> + PyErr_SetString(PyExc_IndexError, "no such group");
>> + return false;
>> + }
>> + *idx_p = idx;
>> + return true;
>> +}
>> +
>> +/**
>> + * Extract the start and end indexes of a pre-checked group number.
>> + * Sets both to -1 if it did not participate in the match.
>> + */
>> +static bool
>> +_group_span(MatchObject2* self, long idx, Py_ssize_t* o_start, Py_ssize_t* o_end)
>> +{
>> + // "idx" is expected to be verified.
>> + StringPiece& piece = self->groups[idx];
>> + if (piece.data() == NULL) {
>> + *o_start = -1;
>> + *o_end = -1;
>> + return false;
>> + }
>> + Py_ssize_t start = piece.data() - PyString_AS_STRING(self->string);
>> + *o_start = start;
>> + *o_end = start + piece.length();
>> + return true;
>> +}
>> +
>> +/**
>> + * Return a pre-checked group number as a string, or default_obj
>> + * if it didn't participate in the match.
>> + */
>> +static PyObject*
>> +_group_get_i(MatchObject2* self, long idx, PyObject* default_obj)
>> +{
>> + Py_ssize_t start;
>> + Py_ssize_t end;
>> + if (!_group_span(self, idx, &start, &end)) {
>> + Py_INCREF(default_obj);
>> + return default_obj;
>> + }
>> + return PySequence_GetSlice(self->string, start, end);
>> +}
>> +
>> +/**
>> + * Return an unchecked group number as a string.
>> + */
>> +static PyObject*
>> +_group_get_o(MatchObject2* self, PyObject* group)
>> +{
>> + long idx;
>> + if (!_group_idx(self, group, &idx)) {
>> + return NULL;
>> + }
>> + return _group_get_i(self, idx, Py_None);
>> +}
>> +
>> +
>> +static PyObject*
>> +match_group(MatchObject2* self, PyObject* args)
>> +{
>> + long idx = 0;
>> + Py_ssize_t nargs = PyTuple_GET_SIZE(args);
>> + switch (nargs) {
>> + case 1:
>> + if (!_group_idx(self, PyTuple_GET_ITEM(args, 0), &idx)) {
>> + return NULL;
>> + }
>> + // Fall through.
>> + case 0:
>> + return _group_get_i(self, idx, Py_None);
>> + default:
>> + PyObject* ret = PyTuple_New(nargs);
>> + if (ret == NULL) {
>> + return NULL;
>> + }
>> +
>> + for (int i = 0; i < nargs; i++) {
>> + PyObject* group = _group_get_o(self, PyTuple_GET_ITEM(args, i));
>> + if (group == NULL) {
>> + Py_DECREF(ret);
>> + return NULL;
>> + }
>> + PyTuple_SET_ITEM(ret, i, group);
>> + }
>> + return ret;
>> + }
>> +}
>> +
>> +static PyObject*
>> +match_groups(MatchObject2* self, PyObject* args, PyObject* kwds)
>> +{
>> + static const char* kwlist[] = {
>> + "default",
>> + NULL};
>> +
>> + PyObject* default_obj = Py_None;
>> +
>> + if (!PyArg_ParseTupleAndKeywords(args, kwds, "|O", (char**)kwlist,
>> + &default_obj)) {
>> + return NULL;
>> + }
>> +
>> + int ngroups = ((RegexpObject2*)self->re)->re2_obj->NumberOfCapturingGroups();
>> +
>> + PyObject* ret = PyTuple_New(ngroups);
>> + if (ret == NULL) {
>> + return NULL;
>> + }
>> +
>> + for (int i = 1; i <= ngroups; i++) {
>> + PyObject* group = _group_get_i(self, i, default_obj);
>> + if (group == NULL) {
>> + Py_DECREF(ret);
>> + return NULL;
>> + }
>> + PyTuple_SET_ITEM(ret, i-1, group);
>> + }
>> +
>> + return ret;
>> +}
>> +
>> +static PyObject*
>> +match_groupdict(MatchObject2* self, PyObject* args, PyObject* kwds)
>> +{
>> + static const char* kwlist[] = {
>> + "default",
>> + NULL};
>> +
>> + PyObject* default_obj = Py_None;
>> +
>> + if (!PyArg_ParseTupleAndKeywords(args, kwds, "|O", (char**)kwlist,
>> + &default_obj)) {
>> + return NULL;
>> + }
>> +
>> + PyObject* ret = PyDict_New();
>> + if (ret == NULL) {
>> + return NULL;
>> + }
>> +
>> + const std::map<std::string, int>& name_map = ((RegexpObject2*)self->re)->re2_obj->NamedCapturingGroups();
>> + for (std::map<std::string, int>::const_iterator it = name_map.begin(); it != name_map.end(); ++it) {
>> + PyObject* group = _group_get_i(self, it->second, default_obj);
>> + if (group == NULL) {
>> + Py_DECREF(ret);
>> + return NULL;
>> + }
>> + // TODO: Group names with embedded zeroes?
>> + int res = PyDict_SetItemString(ret, it->first.data(), group);
>> + Py_DECREF(group);
>> + if (res < 0) {
>> + Py_DECREF(ret);
>> + return NULL;
>> + }
>> + }
>> +
>> + return ret;
>> +}
>> +
>> +enum span_mode_t { START, END, SPAN };
>> +
>> +static PyObject*
>> +_do_span(MatchObject2* self, PyObject* args, const char* name, span_mode_t mode)
>> +{
>> + long idx = 0;
>> + PyObject* group = NULL;
>> + if (!PyArg_UnpackTuple(args, name, 0, 1,
>> + &group)) {
>> + return NULL;
>> + }
>> + if (group != NULL) {
>> + if (!_group_idx(self, group, &idx)) {
>> + return NULL;
>> + }
>> + }
>> +
>> + Py_ssize_t start = - 1;
>> + Py_ssize_t end = - 1;
>> +
>> + (void)_group_span(self, idx, &start, &end);
>> + switch (mode) {
>> + case START : return Py_BuildValue("n", start );
>> + case END : return Py_BuildValue("n", end );
>> + case SPAN:
>> + return Py_BuildValue("nn", start, end);
>> + }
>> +
>> + // Make gcc happy.
>> + return NULL;
>> +}
>> +
>> +static PyObject*
>> +match_start(MatchObject2* self, PyObject* args)
>> +{
>> + return _do_span(self, args, "start", START);
>> +}
>> +
>> +static PyObject*
>> +match_end(MatchObject2* self, PyObject* args)
>> +{
>> + return _do_span(self, args, "end", END);
>> +}
>> +
>> +static PyObject*
>> +match_span(MatchObject2* self, PyObject* args)
>> +{
>> + return _do_span(self, args, "span", SPAN);
>> +}
>> +
>> +
>> +static PyObject*
>> +_compile(PyObject* self, PyObject* args, PyObject* kwds)
>> +{
>> + static const char* kwlist[] = {
>> + "pattern",
>> + NULL};
>> +
>> + PyObject* pattern;
>> +
>> + if (!PyArg_ParseTupleAndKeywords(args, kwds, "O!", (char**)kwlist,
>> + &PyString_Type, &pattern)) {
>> + return NULL;
>> + }
>> +
>> + return create_regexp(pattern);
>> +}
>> +
>> +static PyObject*
>> +escape(PyObject* self, PyObject* args)
>> +{
>> + char *str;
>> + Py_ssize_t len;
>> +
>> + if (!PyArg_ParseTuple(args, "s#:escape", &str, &len)) {
>> + return NULL;
>> + }
>> +
>> + std::string esc(RE2::QuoteMeta(StringPiece(str, (int) len)));
>> +
>> + return PyString_FromStringAndSize(esc.c_str(), esc.size());
>> +}
>> +
>> +static PyMethodDef methods[] = {
>> + {"_compile", (PyCFunction)_compile, METH_VARARGS | METH_KEYWORDS, NULL},
>> + {"escape", (PyCFunction)escape, METH_VARARGS,
>> + "Escape all potentially meaningful regexp characters."},
>> + {NULL} /* Sentinel */
>> +};
>> +
>> +PyMODINIT_FUNC
>> +init_re2(void)
>> +{
>> + if (PyType_Ready(&Regexp_Type2) < 0) {
>> + return;
>> + }
>> +
>> + if (PyType_Ready(&Match_Type2) < 0) {
>> + return;
>> + }
>> +
>> + PyObject* sre_mod = PyImport_ImportModuleNoBlock("sre_constants");
>> + if (sre_mod == NULL) {
>> + return;
>> + }
>> + /* static global */ error_class = PyObject_GetAttrString(sre_mod, "error");
>> + if (error_class == NULL) {
>> + return;
>> + }
>> +
>> + PyObject* mod = Py_InitModule("_re2", methods);
>> +
>> + Py_INCREF(error_class);
>> + PyModule_AddObject(mod, "error", error_class);
>> +}
>> diff --git a/mercurial/re2.py b/mercurial/re2.py
>> new file mode 100644
>> --- /dev/null
>> +++ b/mercurial/re2.py
>> @@ -0,0 +1,63 @@
>> +#!/usr/bin/env python
>> +
>> +# Copyright (c) 2010, David Reiss and Facebook, Inc. All rights reserved.
>> +#
>> +# Redistribution and use in source and binary forms, with or without
>> +# modification, are permitted provided that the following conditions
>> +# are met:
>> +# * Redistributions of source code must retain the above copyright
>> +# notice, this list of conditions and the following disclaimer.
>> +# * Redistributions in binary form must reproduce the above copyright
>> +# notice, this list of conditions and the following disclaimer in the
>> +# documentation and/or other materials provided with the distribution.
>> +# * Neither the name of Facebook nor the names of its contributors
>> +# may be used to endorse or promote products derived from this software
>> +# without specific prior written permission.
>> +#
>> +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>> +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>> +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
>> +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>> +# HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>> +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>> +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>> +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>> +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>> +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>> +
>> +import _re2
>> +
>> +__all__ = [
>> + "error",
>> + "escape",
>> + "compile",
>> + "search",
>> + "match",
>> + "fullmatch",
>> + ]
>> +
>> +# Module-private compilation function, for future caching, other enhancements
>> +_compile = _re2._compile
>> +
>> +error = _re2.error
>> +escape = _re2.escape
>> +
>> +def compile(pattern):
>> + "Compile a regular expression pattern, returning a pattern object."
>> + return _compile(pattern)
>> +
>> +def search(pattern, string):
>> + """Scan through string looking for a match to the pattern, returning
>> + a match object, or None if no match was found."""
>> + return _compile(pattern).search(string)
>> +
>> +def match(pattern, string):
>> + """Try to apply the pattern at the start of the string, returning
>> + a match object, or None if no match was found."""
>> + return _compile(pattern).match(string)
>> +
>> +def fullmatch(pattern, string):
>> + """Try to apply the pattern to the entire string, returning
>> + a match object, or None if no match was found."""
>> + return _compile(pattern).fullmatch(string)
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel at selenic.com
>> http://selenic.com/mailman/listinfo/mercurial-devel