[PATCH 1 of 3 RFC] mercurial: add python re2 bindings

Siddharth Agarwal sid at less-broken.com
Tue Sep 9 12:07:48 CDT 2014


On 09/09/2014 09:28 AM, Augie Fackler wrote:
> On Tue, Sep 02, 2014 at 02:18:08PM -0700, Siddharth Agarwal wrote:
>> # HG changeset patch
>> # User Siddharth Agarwal <sid0 at fb.com>
>> # Date 1409591324 25200
>> #      Mon Sep 01 10:08:44 2014 -0700
>> # Node ID f3b022e755dd7cf54e1ded55f0217bb1415348e2
>> # Parent  a98f6def97bc4b1bc74e37ca207445639e60b1a5
>> mercurial: add python re2 bindings
> It looks like this is bringing in a package that would otherwise live
> on pypi, is that right?

Actually, no. There are two separate Python bindings for re2, both 
called pyre2:

- The bindings written by Facebook available at 
https://github.com/facebook/pyre2. These are known to work.
- The bindings at https://pypi.python.org/pypi/re2/ written by someone 
else. These are known to be broken, and indeed we have code in util.py 
that detects these broken bindings and disables re2 if they're found.
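
For illustration, here is a minimal sketch of that kind of check. It is not
the actual code in util.py: the function name pick_regex_module and the
probe pattern are assumptions made for this example. The idea is to import
re2, run one small match through it, and fall back to the standard re
module if the import fails or the probe does not match.

```python
import re as stdlib_re


def pick_regex_module():
    """Return a usable re2 module, or the stdlib re module as a fallback.

    Hypothetical helper for illustration only; not Mercurial's real code.
    """
    try:
        # May be the Facebook bindings, the PyPI ones, or not installed.
        import re2
    except ImportError:
        return stdlib_re
    try:
        # Probe with a small pattern. Broken bindings are assumed to
        # either raise or fail to match here.
        ok = bool(re2.match(r'\[([^\[]+)\]', '[ui]'))
    except Exception:
        ok = False
    return re2 if ok else stdlib_re


regex = pick_regex_module()
```

Callers would then use regex.compile(...) and regex.match(...) without
caring which implementation was picked.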

> Is it possible that we could just recommend that packagers make sure
> those bindings are packaged too, rather than embedding a bonus copy in
> mercurial?

Not if they fetch from pypi :/ I discussed this with mpm and he thought 
checking the bindings in was reasonable.

>
>> These bindings will enable packagers to build Mercurial with re2 support.
>>
>> The bindings are licensed as 3-clause BSD.
>>
>> I've moved re2.py to mercurial/ to allow 'from mercurial import re2' to work.
>> This is my first time doing this, so it is very likely I got some things wrong.
>>
>> diff --git a/mercurial/pyre2/LICENSE b/mercurial/pyre2/LICENSE
>> new file mode 100644
>> --- /dev/null
>> +++ b/mercurial/pyre2/LICENSE
>> @@ -0,0 +1,25 @@
>> +Copyright (c) 2010, David Reiss and Facebook, Inc. All rights reserved.
>> +
>> +Redistribution and use in source and binary forms, with or without
>> +modification, are permitted provided that the following conditions
>> +are met:
>> +* Redistributions of source code must retain the above copyright
>> +  notice, this list of conditions and the following disclaimer.
>> +* Redistributions in binary form must reproduce the above copyright
>> +  notice, this list of conditions and the following disclaimer in the
>> +  documentation and/or other materials provided with the distribution.
>> +* Neither the name of Facebook nor the names of its contributors
>> +  may be used to endorse or promote products derived from this software
>> +  without specific prior written permission.
>> +
>> +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>> +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>> +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
>> +PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>> +HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>> +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>> +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>> +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>> +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>> +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>> diff --git a/mercurial/pyre2/README.rst b/mercurial/pyre2/README.rst
>> new file mode 100644
>> --- /dev/null
>> +++ b/mercurial/pyre2/README.rst
>> @@ -0,0 +1,71 @@
>> +=====
>> +pyre2
>> +=====
>> +
>> +.. contents::
>> +
>> +Summary
>> +=======
>> +
>> +pyre2 is a Python extension that wraps
>> +`Google's RE2 regular expression library
>> +<http://code.google.com/p/re2/>`_.
>> +It implements many of the features of Python's built-in
>> +``re`` module with compatible interfaces.
>> +
>> +
>> +New Features
>> +============
>> +
>> +* ``Regexp`` objects have a ``fullmatch`` method that works like ``match``,
>> +  but anchors the match at both the start and the end.
>> +* ``Regexp`` objects have
>> +  ``test_search``, ``test_match``, and ``test_fullmatch``
>> +  methods that work like ``search``, ``match``, and ``fullmatch``,
>> +  but only return ``True`` or ``False`` to indicate
>> +  whether the match was successful.
>> +  These methods should be faster than the full versions,
>> +  especially for patterns with capturing groups.
>> +
>> +
>> +Missing Features
>> +================
>> +
>> +* No substitution methods.
>> +* No flags.
>> +* No ``split``, ``findall``, or ``finditer``.
>> +* No top-level convenience functions like ``search`` and ``match``.
>> +  (Just use compile.)
>> +* No compile cache.
>> +  (If you care enough about performance to use RE2,
>> +  you probably care enough to cache your own patterns.)
>> +* No ``lastindex`` or ``lastgroup`` on ``Match`` objects.
>> +
>> +
>> +Current Status
>> +==============
>> +
>> +pyre2 has only received basic testing,
>> +and I am by no means a Python extension expert,
>> +so it is quite possible that it contains bugs.
>> +I'd guess the most likely are reference leaks in error cases.
>> +
>> +RE2 doesn't build with -fPIC, so I had to build it with
>> +
>> +::
>> +
>> +  make CFLAGS='-fPIC -c -Wall -Wno-sign-compare -O3 -g -I.'
>> +
>> +I also had to add it to my compiler search path when building the module
>> +with a command like
>> +
>> +::
>> +
>> +  env CPPFLAGS='-I/path/to/re2' LDFLAGS='-L/path/to/re2/obj' ./setup.py build
>> +
>> +
>> +Contact
>> +=======
>> +
>> +You can file bug reports on GitHub, or email the author:
>> +David Reiss <dreiss at facebook.com>.
>> diff --git a/mercurial/pyre2/_re2.cc b/mercurial/pyre2/_re2.cc
>> new file mode 100644
>> --- /dev/null
>> +++ b/mercurial/pyre2/_re2.cc
>> @@ -0,0 +1,753 @@
>> +/*
>> + * Copyright (c) 2010, David Reiss and Facebook, Inc. All rights reserved.
>> + *
>> + * Redistribution and use in source and binary forms, with or without
>> + * modification, are permitted provided that the following conditions
>> + * are met:
>> + * * Redistributions of source code must retain the above copyright
>> + *   notice, this list of conditions and the following disclaimer.
>> + * * Redistributions in binary form must reproduce the above copyright
>> + *   notice, this list of conditions and the following disclaimer in the
>> + *   documentation and/or other materials provided with the distribution.
>> + * * Neither the name of Facebook nor the names of its contributors
>> + *   may be used to endorse or promote products derived from this software
>> + *   without specific prior written permission.
>> + *
>> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
>> + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>> + * HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>> + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>> + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>> + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>> + */
>> +#define PY_SSIZE_T_CLEAN
>> +#include <Python.h>
>> +
>> +#include <cstddef>
>> +
>> +#include <string>
>> +#include <new>
>> +using std::nothrow;
>> +
>> +#include <re2/re2.h>
>> +using re2::RE2;
>> +using re2::StringPiece;
>> +
>> +
>> +typedef struct _RegexpObject2 {
>> +  PyObject_HEAD
>> +  // __dict__.  Simpler than implementing getattr and possibly faster.
>> +  PyObject* attr_dict;
>> +  RE2* re2_obj;
>> +} RegexpObject2;
>> +
>> +typedef struct _MatchObject2 {
>> +  PyObject_HEAD
>> +  // __dict__.  Simpler than implementing getattr and possibly faster.
>> +  PyObject* attr_dict;
>> +  // Cache of __dict__["re"] and __dict__["string"], which are used for group()
>> +  // calls. These fields do *not* own their own references.  They piggyback on
>> +  // the references in attr_dict.
>> +  PyObject* re;
>> +  PyObject* string;
>> +  // There are several possible approaches to storing the matched groups:
>> +  // 1. Fully materialize the groups tuple at match time.
>> +  // 2. Allocate PyString objects when groups are requested, and cache them.
>> +  // 3. Always allocate new PyStrings on demand.
>> +  // I've chosen to go with #3.  It's the simplest, and I'm pretty sure it's
>> +  // optimal in all cases where no group is fetched more than once.
>> +  StringPiece* groups;
>> +} MatchObject2;
>> +
>> +
>> +// Imported from sre_constants.
>> +static PyObject* error_class;
>> +
>> +
>> +// Forward declarations of methods, creators, and destructors.
>> +static void regexp_dealloc(RegexpObject2* self);
>> +static PyObject* create_regexp(PyObject* pattern);
>> +static PyObject* regexp_search(RegexpObject2* self, PyObject* args, PyObject* kwds);
>> +static PyObject* regexp_match(RegexpObject2* self, PyObject* args, PyObject* kwds);
>> +static PyObject* regexp_fullmatch(RegexpObject2* self, PyObject* args, PyObject* kwds);
>> +static PyObject* regexp_test_search(RegexpObject2* self, PyObject* args, PyObject* kwds);
>> +static PyObject* regexp_test_match(RegexpObject2* self, PyObject* args, PyObject* kwds);
>> +static PyObject* regexp_test_fullmatch(RegexpObject2* self, PyObject* args, PyObject* kwds);
>> +static void match_dealloc(MatchObject2* self);
>> +static PyObject* create_match(PyObject* re, PyObject* string, long pos, long endpos, StringPiece* groups);
>> +static PyObject* match_group(MatchObject2* self, PyObject* args);
>> +static PyObject* match_groups(MatchObject2* self, PyObject* args, PyObject* kwds);
>> +static PyObject* match_groupdict(MatchObject2* self, PyObject* args, PyObject* kwds);
>> +static PyObject* match_start(MatchObject2* self, PyObject* args);
>> +static PyObject* match_end(MatchObject2* self, PyObject* args);
>> +static PyObject* match_span(MatchObject2* self, PyObject* args);
>> +
>> +
>> +static PyMethodDef regexp_methods[] = {
>> +  {"search", (PyCFunction)regexp_search, METH_VARARGS | METH_KEYWORDS,
>> +    "search(string[, pos[, endpos]]) --> match object or None.\n"
>> +    "    Scan through string looking for a match, and return a corresponding\n"
>> +    "    MatchObject instance. Return None if no position in the string matches."
>> +  },
>> +  {"match", (PyCFunction)regexp_match, METH_VARARGS | METH_KEYWORDS,
>> +    "match(string[, pos[, endpos]]) --> match object or None.\n"
>> +    "    Matches zero or more characters at the beginning of the string"
>> +  },
>> +  {"fullmatch", (PyCFunction)regexp_fullmatch, METH_VARARGS | METH_KEYWORDS,
>> +    "fullmatch(string[, pos[, endpos]]) --> match object or None.\n"
>> +    "    Matches the entire string"
>> +  },
>> +  {"test_search", (PyCFunction)regexp_test_search, METH_VARARGS | METH_KEYWORDS,
>> +    "test_search(string[, pos[, endpos]]) --> bool.\n"
>> +    "    Like 'search', but only returns whether a match was found."
>> +  },
>> +  {"test_match", (PyCFunction)regexp_test_match, METH_VARARGS | METH_KEYWORDS,
>> +    "test_match(string[, pos[, endpos]]) --> bool.\n"
>> +    "    Like 'match', but only returns whether a match was found."
>> +  },
>> +  {"test_fullmatch", (PyCFunction)regexp_test_fullmatch, METH_VARARGS | METH_KEYWORDS,
>> +    "test_fullmatch(string[, pos[, endpos]]) --> bool.\n"
>> +    "    Like 'fullmatch', but only returns whether a match was found."
>> +  },
>> +  {NULL}  /* Sentinel */
>> +};
>> +
>> +static PyMethodDef match_methods[] = {
>> +  {"group", (PyCFunction)match_group, METH_VARARGS,
>> +    NULL
>> +  },
>> +  {"groups", (PyCFunction)match_groups, METH_VARARGS | METH_KEYWORDS,
>> +    NULL
>> +  },
>> +  {"groupdict", (PyCFunction)match_groupdict, METH_VARARGS | METH_KEYWORDS,
>> +    NULL
>> +  },
>> +  {"start", (PyCFunction)match_start, METH_VARARGS,
>> +    NULL
>> +  },
>> +  {"end", (PyCFunction)match_end, METH_VARARGS,
>> +    NULL
>> +  },
>> +  {"span", (PyCFunction)match_span, METH_VARARGS,
>> +    NULL
>> +  },
>> +  {NULL}  /* Sentinel */
>> +};
>> +
>> +
>> +// Simple method to block setattr.
>> +static int
>> +_no_setattr(PyObject* obj, PyObject* name, PyObject* v) {
>> +  (void)name;
>> +  (void)v;
>> +  PyErr_Format(PyExc_AttributeError,
>> +      "'%s' object attributes are read-only",
>> +      obj->ob_type->tp_name);
>> +  return -1;
>> +}
>> +
>> +
>> +static PyTypeObject Regexp_Type2 = {
>> +  PyObject_HEAD_INIT(NULL)
>> +  0,                           /*ob_size*/
>> +  "_re2.RE2_Regexp",           /*tp_name*/
>> +  sizeof(RegexpObject2),       /*tp_basicsize*/
>> +  0,                           /*tp_itemsize*/
>> +  (destructor)regexp_dealloc,  /*tp_dealloc*/
>> +  0,                           /*tp_print*/
>> +  0,                           /*tp_getattr*/
>> +  0,                           /*tp_setattr*/
>> +  0,                           /*tp_compare*/
>> +  0,                           /*tp_repr*/
>> +  0,                           /*tp_as_number*/
>> +  0,                           /*tp_as_sequence*/
>> +  0,                           /*tp_as_mapping*/
>> +  0,                           /*tp_hash*/
>> +  0,                           /*tp_call*/
>> +  0,                           /*tp_str*/
>> +  0,                           /*tp_getattro*/
>> +  _no_setattr,                 /*tp_setattro*/
>> +  0,                           /*tp_as_buffer*/
>> +  Py_TPFLAGS_DEFAULT,          /*tp_flags*/
>> +  "RE2 regexp objects",        /*tp_doc*/
>> +  0,                           /*tp_traverse*/
>> +  0,                           /*tp_clear*/
>> +  0,                           /*tp_richcompare*/
>> +  0,                           /*tp_weaklistoffset*/
>> +  0,                           /*tp_iter*/
>> +  0,                           /*tp_iternext*/
>> +  regexp_methods,              /*tp_methods*/
>> +  0,                           /*tp_members*/
>> +  0,                           /*tp_getset*/
>> +  0,                           /*tp_base*/
>> +  0,                           /*tp_dict*/
>> +  0,                           /*tp_descr_get*/
>> +  0,                           /*tp_descr_set*/
>> +  offsetof(RegexpObject2, attr_dict),  /*tp_dictoffset*/
>> +  0,                           /*tp_init*/
>> +  0,                           /*tp_alloc*/
>> +  0,                           /*tp_new*/
>> +};
>> +
>> +static PyTypeObject Match_Type2 = {
>> +  PyObject_HEAD_INIT(NULL)
>> +  0,                           /*ob_size*/
>> +  "_re2.RE2_Match",            /*tp_name*/
>> +  sizeof(MatchObject2),        /*tp_basicsize*/
>> +  0,                           /*tp_itemsize*/
>> +  (destructor)match_dealloc,   /*tp_dealloc*/
>> +  0,                           /*tp_print*/
>> +  0,                           /*tp_getattr*/
>> +  0,                           /*tp_setattr*/
>> +  0,                           /*tp_compare*/
>> +  0,                           /*tp_repr*/
>> +  0,                           /*tp_as_number*/
>> +  0,                           /*tp_as_sequence*/
>> +  0,                           /*tp_as_mapping*/
>> +  0,                           /*tp_hash*/
>> +  0,                           /*tp_call*/
>> +  0,                           /*tp_str*/
>> +  0,                           /*tp_getattro*/
>> +  _no_setattr,                 /*tp_setattro*/
>> +  0,                           /*tp_as_buffer*/
>> +  Py_TPFLAGS_DEFAULT,          /*tp_flags*/
>> +  "RE2 match objects",         /*tp_doc*/
>> +  0,                           /*tp_traverse*/
>> +  0,                           /*tp_clear*/
>> +  0,                           /*tp_richcompare*/
>> +  0,                           /*tp_weaklistoffset*/
>> +  0,                           /*tp_iter*/
>> +  0,                           /*tp_iternext*/
>> +  match_methods,               /*tp_methods*/
>> +  0,                           /*tp_members*/
>> +  0,                           /*tp_getset*/
>> +  0,                           /*tp_base*/
>> +  0,                           /*tp_dict*/
>> +  0,                           /*tp_descr_get*/
>> +  0,                           /*tp_descr_set*/
>> +  offsetof(MatchObject2, attr_dict),  /*tp_dictoffset*/
>> +  0,                           /*tp_init*/
>> +  0,                           /*tp_alloc*/
>> +  0,                           /*tp_new*/
>> +};
>> +
>> +
>> +static void
>> +regexp_dealloc(RegexpObject2* self)
>> +{
>> +  delete self->re2_obj;
>> +  Py_XDECREF(self->attr_dict);
>> +  PyObject_Del(self);
>> +}
>> +
>> +static PyObject*
>> +create_regexp(PyObject* pattern)
>> +{
>> +  RegexpObject2* regexp = PyObject_New(RegexpObject2, &Regexp_Type2);
>> +  if (regexp == NULL) {
>> +    return NULL;
>> +  }
>> +  regexp->re2_obj = NULL;
>> +  regexp->attr_dict = NULL;
>> +
>> +  const char* raw_pattern = PyString_AS_STRING(pattern);
>> +  Py_ssize_t len_pattern = PyString_GET_SIZE(pattern);
>> +
>> +  RE2::Options options;
>> +  options.set_log_errors(false);
>> +
>> +  regexp->re2_obj = new(nothrow) RE2(StringPiece(raw_pattern, (int) len_pattern), options);
>> +
>> +  if (regexp->re2_obj == NULL) {
>> +    PyErr_NoMemory();
>> +    Py_DECREF(regexp);
>> +    return NULL;
>> +  }
>> +
>> +  if (!regexp->re2_obj->ok()) {
>> +    long code = (long)regexp->re2_obj->error_code();
>> +    const std::string& msg = regexp->re2_obj->error();
>> +    PyObject* value = Py_BuildValue("ls#", code, msg.data(), msg.length());
>> +    if (value == NULL) {
>> +      Py_DECREF(regexp);
>> +      return NULL;
>> +    }
>> +    PyErr_SetObject(error_class, value);
>> +    Py_DECREF(regexp);
>> +    return NULL;
>> +  }
>> +
>> +  PyObject* groupindex = PyDict_New();
>> +  if (groupindex == NULL) {
>> +    Py_DECREF(regexp);
>> +    return NULL;
>> +  }
>> +
>> +  // Build up the attr_dict early so regexp can take ownership of our reference
>> +  // to groupindex.
>> +  regexp->attr_dict = Py_BuildValue("{sisNsO}",
>> +      "groups", regexp->re2_obj->NumberOfCapturingGroups(),
>> +      "groupindex", groupindex,
>> +      "pattern", pattern);
>> +  if (regexp->attr_dict == NULL) {
>> +    Py_DECREF(regexp);
>> +    return NULL;
>> +  }
>> +
>> +  const std::map<std::string, int>& name_map = regexp->re2_obj->NamedCapturingGroups();
>> +  for (std::map<std::string, int>::const_iterator it = name_map.begin(); it != name_map.end(); ++it) {
>> +    PyObject* index = PyInt_FromLong(it->second);
>> +    if (index == NULL) {
>> +      Py_DECREF(regexp);
>> +      return NULL;
>> +    }
>> +    int res = PyDict_SetItemString(groupindex, it->first.c_str(), index);
>> +    Py_DECREF(index);
>> +    if (res < 0) {
>> +      Py_DECREF(regexp);
>> +      return NULL;
>> +    }
>> +  }
>> +
>> +  return (PyObject*)regexp;
>> +}
>> +
>> +static PyObject*
>> +_do_search(RegexpObject2* self, PyObject* args, PyObject* kwds, RE2::Anchor anchor, bool return_match)
>> +{
>> +  PyObject* string;
>> +  const char* subject;
>> +  Py_ssize_t slen;
>> +  long pos = 0;
>> +  long endpos = LONG_MAX;
>> +
>> +  static const char* kwlist[] = {
>> +    "string",
>> +    "pos",
>> +    "endpos",
>> +    NULL};
>> +
>> +  // Using O! instead of s# here, because we want to stash the original
>> +  // PyObject* in the match object on a successful match.
>> +  if (!PyArg_ParseTupleAndKeywords(args, kwds, "O!|ll", (char**)kwlist,
>> +        &PyString_Type, &string,
>> +        &pos, &endpos)) {
>> +    return NULL;
>> +  }
>> +
>> +  subject = PyString_AS_STRING(string);
>> +  slen = PyString_GET_SIZE(string);
>> +  if (pos < 0) pos = 0;
>> +  if (pos > slen) pos = slen;
>> +  if (endpos < pos) endpos = pos;
>> +  if (endpos > slen) endpos = slen;
>> +
>> +  // Don't bother allocating these if we are just doing a test.
>> +  int n_groups = 0;
>> +  StringPiece* groups = NULL;
>> +  if (return_match) {
>> +    n_groups = self->re2_obj->NumberOfCapturingGroups() + 1;
>> +    groups = new(nothrow) StringPiece[n_groups];
>> +
>> +    if (groups == NULL) {
>> +      PyErr_NoMemory();
>> +      return NULL;
>> +    }
>> +  }
>> +
>> +  bool matched = self->re2_obj->Match(
>> +      StringPiece(subject, (int) slen),
>> +      (int) pos,
>> +      (int) endpos,
>> +      anchor,
>> +      groups,
>> +      n_groups);
>> +
>> +  if (!return_match) {
>> +    if (matched) {
>> +      Py_RETURN_TRUE;
>> +    }
>> +    Py_RETURN_FALSE;
>> +  }
>> +
>> +  if (!matched) {
>> +    delete[] groups;
>> +    Py_RETURN_NONE;
>> +  }
>> +
>> +  // create_match is going to Py_BuildValue the pos and endpos into
>> +  // PyObjects.  We could optimize the case where pos and/or endpos were
>> +  // explicitly passed in by forwarding the existing PyObjects.
>> +  // That requires much more intricate code, though.
>> +  return create_match((PyObject*)self, string, pos, endpos, groups);
>> +}
>> +
>> +static PyObject*
>> +regexp_search(RegexpObject2* self, PyObject* args, PyObject* kwds)
>> +{
>> +  return _do_search(self, args, kwds, RE2::UNANCHORED, true);
>> +}
>> +
>> +static PyObject*
>> +regexp_match(RegexpObject2* self, PyObject* args, PyObject* kwds)
>> +{
>> +  return _do_search(self, args, kwds, RE2::ANCHOR_START, true);
>> +}
>> +
>> +static PyObject*
>> +regexp_fullmatch(RegexpObject2* self, PyObject* args, PyObject* kwds)
>> +{
>> +  return _do_search(self, args, kwds, RE2::ANCHOR_BOTH, true);
>> +}
>> +
>> +static PyObject*
>> +regexp_test_search(RegexpObject2* self, PyObject* args, PyObject* kwds)
>> +{
>> +  return _do_search(self, args, kwds, RE2::UNANCHORED, false);
>> +}
>> +
>> +static PyObject*
>> +regexp_test_match(RegexpObject2* self, PyObject* args, PyObject* kwds)
>> +{
>> +  return _do_search(self, args, kwds, RE2::ANCHOR_START, false);
>> +}
>> +
>> +static PyObject*
>> +regexp_test_fullmatch(RegexpObject2* self, PyObject* args, PyObject* kwds)
>> +{
>> +  return _do_search(self, args, kwds, RE2::ANCHOR_BOTH, false);
>> +}
>> +
>> +
>> +static void
>> +match_dealloc(MatchObject2* self)
>> +{
>> +  delete[] self->groups;
>> +  Py_XDECREF(self->attr_dict);
>> +  PyObject_Del(self);
>> +}
>> +
>> +static PyObject*
>> +create_match(PyObject* re, PyObject* string,
>> +    long pos, long endpos,
>> +    StringPiece* groups)
>> +{
>> +  MatchObject2* match = PyObject_New(MatchObject2, &Match_Type2);
>> +  if (match == NULL) {
>> +    delete[] groups;
>> +    return NULL;
>> +  }
>> +  match->attr_dict = NULL;
>> +  match->groups = groups;
>> +  match->re = re;
>> +  match->string = string;
>> +
>> +  match->attr_dict = Py_BuildValue("{sOsOslsl}",
>> +      "re", re,
>> +      "string", string,
>> +      "pos", pos,
>> +      "endpos", endpos);
>> +  if (match->attr_dict == NULL) {
>> +    Py_DECREF(match);
>> +    return NULL;
>> +  }
>> +
>> +  return (PyObject*)match;
>> +}
>> +
>> +/**
>> + * Attempt to convert an untrusted group index (PyObject* group) into
>> + * a trusted one (*idx_p).  Return false on failure (exception).
>> + */
>> +static bool
>> +_group_idx(MatchObject2* self, PyObject* group, long* idx_p)
>> +{
>> +  if (group == NULL) {
>> +    return false;
>> +  }
>> +  PyErr_Clear(); // Is this necessary?
>> +  long idx = PyInt_AsLong(group);
>> +  if (idx == -1 && PyErr_Occurred() != NULL) {
>> +    return false;
>> +  }
>> +  // TODO: Consider caching NumberOfCapturingGroups.
>> +  if (idx < 0 || idx > ((RegexpObject2*)self->re)->re2_obj->NumberOfCapturingGroups()) {
>> +    PyErr_SetString(PyExc_IndexError, "no such group");
>> +    return false;
>> +  }
>> +  *idx_p = idx;
>> +  return true;
>> +}
>> +
>> +/**
>> + * Extract the start and end indexes of a pre-checked group number.
>> + * Sets both to -1 if it did not participate in the match.
>> + */
>> +static bool
>> +_group_span(MatchObject2* self, long idx, Py_ssize_t* o_start, Py_ssize_t* o_end)
>> +{
>> +  // "idx" is expected to be verified.
>> +  StringPiece& piece = self->groups[idx];
>> +  if (piece.data() == NULL) {
>> +    *o_start = -1;
>> +    *o_end = -1;
>> +    return false;
>> +  }
>> +  Py_ssize_t start = piece.data() - PyString_AS_STRING(self->string);
>> +  *o_start = start;
>> +  *o_end = start + piece.length();
>> +  return true;
>> +}
>> +
>> +/**
>> + * Return a pre-checked group number as a string, or default_obj
>> + * if it didn't participate in the match.
>> + */
>> +static PyObject*
>> +_group_get_i(MatchObject2* self, long idx, PyObject* default_obj)
>> +{
>> +  Py_ssize_t start;
>> +  Py_ssize_t end;
>> +  if (!_group_span(self, idx, &start, &end)) {
>> +    Py_INCREF(default_obj);
>> +    return default_obj;
>> +  }
>> +  return PySequence_GetSlice(self->string, start, end);
>> +}
>> +
>> +/**
>> + * Return an unchecked group number as a string.
>> + */
>> +static PyObject*
>> +_group_get_o(MatchObject2* self, PyObject* group)
>> +{
>> +  long idx;
>> +  if (!_group_idx(self, group, &idx)) {
>> +    return NULL;
>> +  }
>> +  return _group_get_i(self, idx, Py_None);
>> +}
>> +
>> +
>> +static PyObject*
>> +match_group(MatchObject2* self, PyObject* args)
>> +{
>> +  long idx = 0;
>> +  Py_ssize_t nargs = PyTuple_GET_SIZE(args);
>> +  switch (nargs) {
>> +    case 1:
>> +      if (!_group_idx(self, PyTuple_GET_ITEM(args, 0), &idx)) {
>> +        return NULL;
>> +      }
>> +      // Fall through.
>> +    case 0:
>> +      return _group_get_i(self, idx, Py_None);
>> +    default:
>> +      PyObject* ret = PyTuple_New(nargs);
>> +      if (ret == NULL) {
>> +        return NULL;
>> +      }
>> +
>> +      for (int i = 0; i < nargs; i++) {
>> +        PyObject* group = _group_get_o(self, PyTuple_GET_ITEM(args, i));
>> +        if (group == NULL) {
>> +          Py_DECREF(ret);
>> +          return NULL;
>> +        }
>> +        PyTuple_SET_ITEM(ret, i, group);
>> +      }
>> +      return ret;
>> +  }
>> +}
>> +
>> +static PyObject*
>> +match_groups(MatchObject2* self, PyObject* args, PyObject* kwds)
>> +{
>> +  static const char* kwlist[] = {
>> +    "default",
>> +    NULL};
>> +
>> +  PyObject* default_obj = Py_None;
>> +
>> +  if (!PyArg_ParseTupleAndKeywords(args, kwds, "|O", (char**)kwlist,
>> +        &default_obj)) {
>> +    return NULL;
>> +  }
>> +
>> +  int ngroups = ((RegexpObject2*)self->re)->re2_obj->NumberOfCapturingGroups();
>> +
>> +  PyObject* ret = PyTuple_New(ngroups);
>> +  if (ret == NULL) {
>> +    return NULL;
>> +  }
>> +
>> +  for (int i = 1; i <= ngroups; i++) {
>> +    PyObject* group = _group_get_i(self, i, default_obj);
>> +    if (group == NULL) {
>> +      Py_DECREF(ret);
>> +      return NULL;
>> +    }
>> +    PyTuple_SET_ITEM(ret, i-1, group);
>> +  }
>> +
>> +  return ret;
>> +}
>> +
>> +static PyObject*
>> +match_groupdict(MatchObject2* self, PyObject* args, PyObject* kwds)
>> +{
>> +  static const char* kwlist[] = {
>> +    "default",
>> +    NULL};
>> +
>> +  PyObject* default_obj = Py_None;
>> +
>> +  if (!PyArg_ParseTupleAndKeywords(args, kwds, "|O", (char**)kwlist,
>> +        &default_obj)) {
>> +    return NULL;
>> +  }
>> +
>> +  PyObject* ret = PyDict_New();
>> +  if (ret == NULL) {
>> +    return NULL;
>> +  }
>> +
>> +  const std::map<std::string, int>& name_map = ((RegexpObject2*)self->re)->re2_obj->NamedCapturingGroups();
>> +  for (std::map<std::string, int>::const_iterator it = name_map.begin(); it != name_map.end(); ++it) {
>> +    PyObject* group = _group_get_i(self, it->second, default_obj);
>> +    if (group == NULL) {
>> +      Py_DECREF(ret);
>> +      return NULL;
>> +    }
>> +    // TODO: Group names with embedded zeroes?
>> +    int res = PyDict_SetItemString(ret, it->first.c_str(), group);
>> +    Py_DECREF(group);
>> +    if (res < 0) {
>> +      Py_DECREF(ret);
>> +      return NULL;
>> +    }
>> +  }
>> +
>> +  return ret;
>> +}
>> +
>> +enum span_mode_t { START, END, SPAN };
>> +
>> +static PyObject*
>> +_do_span(MatchObject2* self, PyObject* args, const char* name, span_mode_t mode)
>> +{
>> +  long idx = 0;
>> +  PyObject* group = NULL;
>> +  if (!PyArg_UnpackTuple(args, name, 0, 1,
>> +        &group)) {
>> +    return NULL;
>> +  }
>> +  if (group != NULL) {
>> +    if (!_group_idx(self, group, &idx)) {
>> +      return NULL;
>> +    }
>> +  }
>> +
>> +  Py_ssize_t start = - 1;
>> +  Py_ssize_t end = - 1;
>> +
>> +  (void)_group_span(self, idx, &start, &end);
>> +  switch (mode) {
>> +    case START : return Py_BuildValue("n", start );
>> +    case END   : return Py_BuildValue("n", end   );
>> +    case SPAN:
>> +      return Py_BuildValue("nn", start, end);
>> +  }
>> +
>> +  // Make gcc happy.
>> +  return NULL;
>> +}
>> +
>> +static PyObject*
>> +match_start(MatchObject2* self, PyObject* args)
>> +{
>> +  return _do_span(self, args, "start", START);
>> +}
>> +
>> +static PyObject*
>> +match_end(MatchObject2* self, PyObject* args)
>> +{
>> +  return _do_span(self, args, "end", END);
>> +}
>> +
>> +static PyObject*
>> +match_span(MatchObject2* self, PyObject* args)
>> +{
>> +  return _do_span(self, args, "span", SPAN);
>> +}
>> +
>> +
>> +static PyObject*
>> +_compile(PyObject* self, PyObject* args, PyObject* kwds)
>> +{
>> +  static const char* kwlist[] = {
>> +    "pattern",
>> +    NULL};
>> +
>> +  PyObject* pattern;
>> +
>> +  if (!PyArg_ParseTupleAndKeywords(args, kwds, "O", (char**)kwlist,
>> +        &pattern)) {
>> +    return NULL;
>> +  }
>> +
>> +  return create_regexp(pattern);
>> +}
>> +
>> +static PyObject*
>> +escape(PyObject* self, PyObject* args)
>> +{
>> +  char *str;
>> +  Py_ssize_t len;
>> +
>> +  if (!PyArg_ParseTuple(args, "s#:escape", &str, &len)) {
>> +    return NULL;
>> +  }
>> +
>> +  std::string esc(RE2::QuoteMeta(StringPiece(str, (int) len)));
>> +
>> +  return PyString_FromStringAndSize(esc.c_str(), esc.size());
>> +}
>> +
>> +static PyMethodDef methods[] = {
>> +  {"_compile", (PyCFunction)_compile, METH_VARARGS | METH_KEYWORDS, NULL},
>> +  {"escape", (PyCFunction)escape, METH_VARARGS,
>> +   "Escape all potentially meaningful regexp characters."},
>> +  {NULL}  /* Sentinel */
>> +};
>> +
>> +PyMODINIT_FUNC
>> +init_re2(void)
>> +{
>> +  if (PyType_Ready(&Regexp_Type2) < 0) {
>> +    return;
>> +  }
>> +
>> +  if (PyType_Ready(&Match_Type2) < 0) {
>> +    return;
>> +  }
>> +
>> +  PyObject* sre_mod = PyImport_ImportModuleNoBlock("sre_constants");
>> +  if (sre_mod == NULL) {
>> +    return;
>> +  }
>> +  /* static global */ error_class = PyObject_GetAttrString(sre_mod, "error");
>> +  if (error_class == NULL) {
>> +    return;
>> +  }
>> +
>> +  PyObject* mod = Py_InitModule("_re2", methods);
>> +
>> +  Py_INCREF(error_class);
>> +  PyModule_AddObject(mod, "error", error_class);
>> +}
>> diff --git a/mercurial/re2.py b/mercurial/re2.py
>> new file mode 100644
>> --- /dev/null
>> +++ b/mercurial/re2.py
>> @@ -0,0 +1,63 @@
>> +#!/usr/bin/env python
>> +
>> +# Copyright (c) 2010, David Reiss and Facebook, Inc. All rights reserved.
>> +#
>> +# Redistribution and use in source and binary forms, with or without
>> +# modification, are permitted provided that the following conditions
>> +# are met:
>> +# * Redistributions of source code must retain the above copyright
>> +#   notice, this list of conditions and the following disclaimer.
>> +# * Redistributions in binary form must reproduce the above copyright
>> +#   notice, this list of conditions and the following disclaimer in the
>> +#   documentation and/or other materials provided with the distribution.
>> +# * Neither the name of Facebook nor the names of its contributors
>> +#   may be used to endorse or promote products derived from this software
>> +#   without specific prior written permission.
>> +#
>> +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>> +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>> +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
>> +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>> +# HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>> +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>> +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>> +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>> +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>> +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>> +
>> +import _re2
>> +
>> +__all__ = [
>> +    "error",
>> +    "escape",
>> +    "compile",
>> +    "search",
>> +    "match",
>> +    "fullmatch",
>> +    ]
>> +
>> +# Module-private compilation function, for future caching, other enhancements
>> +_compile = _re2._compile
>> +
>> +error = _re2.error
>> +escape = _re2.escape
>> +
>> +def compile(pattern):
>> +    "Compile a regular expression pattern, returning a pattern object."
>> +    return _compile(pattern)
>> +
>> +def search(pattern, string):
>> +    """Scan through string looking for a match to the pattern, returning
>> +    a match object, or None if no match was found."""
>> +    return _compile(pattern).search(string)
>> +
>> +def match(pattern, string):
>> +    """Try to apply the pattern at the start of the string, returning
>> +    a match object, or None if no match was found."""
>> +    return _compile(pattern).match(string)
>> +
>> +def fullmatch(pattern, string):
>> +    """Try to apply the pattern to the entire string, returning
>> +    a match object, or None if no match was found."""
>> +    return _compile(pattern).fullmatch(string)
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel at selenic.com
>> http://selenic.com/mailman/listinfo/mercurial-devel
