[PATCH 1 of 3 RFC] mercurial: add python re2 bindings

Augie Fackler raf at durin42.com
Tue Sep 9 11:28:31 CDT 2014


On Tue, Sep 02, 2014 at 02:18:08PM -0700, Siddharth Agarwal wrote:
> # HG changeset patch
> # User Siddharth Agarwal <sid0 at fb.com>
> # Date 1409591324 25200
> #      Mon Sep 01 10:08:44 2014 -0700
> # Node ID f3b022e755dd7cf54e1ded55f0217bb1415348e2
> # Parent  a98f6def97bc4b1bc74e37ca207445639e60b1a5
> mercurial: add python re2 bindings

It looks like this is bringing in a package that would otherwise live
on pypi, is that right?

Is it possible that we could just recommend that packagers make sure
those bindings are packaged too, rather than embedding a bonus copy in
mercurial?
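
If the bindings were packaged separately, Mercurial would only need an optional import that falls back to the stdlib engine. Here is a minimal sketch. It assumes the packaged bindings install a top-level `re2` module exposing `compile()` and `error`, the way `re2.py` in this patch does. `compilepattern` is a hypothetical helper name, not an existing Mercurial function.

```python
import re

try:
    # Assumption: the distro/PyPI package installs a top-level "re2" module
    # exposing compile() and error, like re2.py in this patch.
    import re2 as _re2mod
except ImportError:
    _re2mod = None


def compilepattern(pattern):
    """Compile pattern with re2 when it is installed, else with stdlib re.

    compilepattern is a hypothetical helper used only for illustration.
    """
    if _re2mod is not None:
        try:
            return _re2mod.compile(pattern)
        except _re2mod.error:
            # RE2 rejects some syntax the stdlib accepts (e.g.
            # backreferences); use the stdlib engine for those patterns.
            pass
    return re.compile(pattern)


p = compilepattern(r"foo\d+")
print(p.match("foo12") is not None)  # True with either engine
```

Catching `error` also covers patterns that RE2 refuses to compile, because RE2 does not support every construct that the stdlib `re` accepts.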

>
> These bindings will enable packagers to build Mercurial with re2 support.
>
> The bindings are licensed as 3-clause BSD.
>
> I've moved re2.py to mercurial/ to allow 'from mercurial import re2' to work.
> This is my first time doing this, so it is very likely I got some things wrong.
>
> diff --git a/mercurial/pyre2/LICENSE b/mercurial/pyre2/LICENSE
> new file mode 100644
> --- /dev/null
> +++ b/mercurial/pyre2/LICENSE
> @@ -0,0 +1,25 @@
> +Copyright (c) 2010, David Reiss and Facebook, Inc. All rights reserved.
> +
> +Redistribution and use in source and binary forms, with or without
> +modification, are permitted provided that the following conditions
> +are met:
> +* Redistributions of source code must retain the above copyright
> +  notice, this list of conditions and the following disclaimer.
> +* Redistributions in binary form must reproduce the above copyright
> +  notice, this list of conditions and the following disclaimer in the
> +  documentation and/or other materials provided with the distribution.
> +* Neither the name of Facebook nor the names of its contributors
> +  may be used to endorse or promote products derived from this software
> +  without specific prior written permission.
> +
> +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
> +PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> diff --git a/mercurial/pyre2/README.rst b/mercurial/pyre2/README.rst
> new file mode 100644
> --- /dev/null
> +++ b/mercurial/pyre2/README.rst
> @@ -0,0 +1,71 @@
> +=====
> +pyre2
> +=====
> +
> +.. contents::
> +
> +Summary
> +=======
> +
> +pyre2 is a Python extension that wraps
> +`Google's RE2 regular expression library
> +<http://code.google.com/p/re2/>`_.
> +It implements many of the features of Python's built-in
> +``re`` module with compatible interfaces.
> +
> +
> +New Features
> +============
> +
> +* ``Regexp`` objects have a ``fullmatch`` method that works like ``match``,
> +  but anchors the match at both the start and the end.
> +* ``Regexp`` objects have
> +  ``test_search``, ``test_match``, and ``test_fullmatch``
> +  methods that work like ``search``, ``match``, and ``fullmatch``,
> +  but only return ``True`` or ``False`` to indicate
> +  whether the match was successful.
> +  These methods should be faster than the full versions,
> +  especially for patterns with capturing groups.
> +
> +
> +Missing Features
> +================
> +
> +* No substitution methods.
> +* No flags.
> +* No ``split``, ``findall``, or ``finditer``.
> +* No top-level convenience functions like ``search`` and ``match``.
> +  (Just use compile.)
> +* No compile cache.
> +  (If you care enough about performance to use RE2,
> +  you probably care enough to cache your own patterns.)
> +* No ``lastindex`` or ``lastgroup`` on ``Match`` objects.
> +
> +
> +Current Status
> +==============
> +
> +pyre2 has only received basic testing,
> +and I am by no means a Python extension expert,
> +so it is quite possible that it contains bugs.
> +I'd guess the most likely are reference leaks in error cases.
> +
> +RE2 doesn't build with -fPIC, so I had to build it with
> +
> +::
> +
> +  make CFLAGS='-fPIC -c -Wall -Wno-sign-compare -O3 -g -I.'
> +
> +I also had to add it to my compiler search path when building the module
> +with a command like
> +
> +::
> +
> +  env CPPFLAGS='-I/path/to/re2' LDFLAGS='-L/path/to/re2/obj' ./setup.py build
> +
> +
> +Contact
> +=======
> +
> +You can file bug reports on GitHub, or email the author:
> +David Reiss <dreiss at facebook.com>.
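
The README's "New Features" can be illustrated with the stdlib `re` module as a stand-in. The sketch below shows how `search`, `match` and `fullmatch` differ in anchoring, and how the `test_*` variants reduce each call to a bool. It assumes Python 3.4+ for `Pattern.fullmatch`. This is not pyre2 itself, and the `check_*` helpers are hypothetical. They also gain none of RE2's speed: in `_re2.cc` the saving comes from the `test_*` path passing no group array to `RE2::Match`.

```python
import re
import sys


# Hypothetical stand-ins for pyre2's test_search/test_match/test_fullmatch:
# each runs the corresponding stdlib call and reports only success.
def check_search(regexp, string, pos=0, endpos=sys.maxsize):
    # Unanchored (RE2::UNANCHORED in _re2.cc): a match may start anywhere.
    return regexp.search(string, pos, endpos) is not None


def check_match(regexp, string, pos=0, endpos=sys.maxsize):
    # Anchored at pos (RE2::ANCHOR_START).
    return regexp.match(string, pos, endpos) is not None


def check_fullmatch(regexp, string, pos=0, endpos=sys.maxsize):
    # Anchored at both pos and endpos (RE2::ANCHOR_BOTH).
    return regexp.fullmatch(string, pos, endpos) is not None


digits = re.compile(r"\d+")
print(check_search(digits, "ab123"))        # True: "123" occurs in the string
print(check_match(digits, "ab123"))         # False: the string starts with "a"
print(check_fullmatch(digits, "ab123"))     # False: not every character is a digit
print(check_fullmatch(digits, "ab123", 2))  # True: "123" spans pos..end
```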
> diff --git a/mercurial/pyre2/_re2.cc b/mercurial/pyre2/_re2.cc
> new file mode 100644
> --- /dev/null
> +++ b/mercurial/pyre2/_re2.cc
> @@ -0,0 +1,753 @@
> +/*
> + * Copyright (c) 2010, David Reiss and Facebook, Inc. All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * * Redistributions of source code must retain the above copyright
> + *   notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above copyright
> + *   notice, this list of conditions and the following disclaimer in the
> + *   documentation and/or other materials provided with the distribution.
> + * * Neither the name of Facebook nor the names of its contributors
> + *   may be used to endorse or promote products derived from this software
> + *   without specific prior written permission.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
> + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + * HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +#define PY_SSIZE_T_CLEAN
> +#include <Python.h>
> +
> +#include <cstddef>
> +
> +#include <string>
> +#include <new>
> +using std::nothrow;
> +
> +#include <re2/re2.h>
> +using re2::RE2;
> +using re2::StringPiece;
> +
> +
> +typedef struct _RegexpObject2 {
> +  PyObject_HEAD
> +  // __dict__.  Simpler than implementing getattr and possibly faster.
> +  PyObject* attr_dict;
> +  RE2* re2_obj;
> +} RegexpObject2;
> +
> +typedef struct _MatchObject2 {
> +  PyObject_HEAD
> +  // __dict__.  Simpler than implementing getattr and possibly faster.
> +  PyObject* attr_dict;
> +  // Cache of __dict__["re"] and __dict__["string"], which are used for group()
> +  // calls. These fields do *not* own their own references.  They piggyback on
> +  // the references in attr_dict.
> +  PyObject* re;
> +  PyObject* string;
> +  // There are several possible approaches to storing the matched groups:
> +  // 1. Fully materialize the groups tuple at match time.
> +  // 2. Allocate and cache PyString objects when groups are requested.
> +  // 3. Always allocate new PyStrings on demand.
> +  // I've chosen to go with #3.  It's the simplest, and I'm pretty sure it's
> +  // optimal in all cases where no group is fetched more than once.
> +  StringPiece* groups;
> +} MatchObject2;
> +
> +
> +// Imported from sre_constants.
> +static PyObject* error_class;
> +
> +
> +// Forward declarations of methods, creators, and destructors.
> +static void regexp_dealloc(RegexpObject2* self);
> +static PyObject* create_regexp(PyObject* pattern);
> +static PyObject* regexp_search(RegexpObject2* self, PyObject* args, PyObject* kwds);
> +static PyObject* regexp_match(RegexpObject2* self, PyObject* args, PyObject* kwds);
> +static PyObject* regexp_fullmatch(RegexpObject2* self, PyObject* args, PyObject* kwds);
> +static PyObject* regexp_test_search(RegexpObject2* self, PyObject* args, PyObject* kwds);
> +static PyObject* regexp_test_match(RegexpObject2* self, PyObject* args, PyObject* kwds);
> +static PyObject* regexp_test_fullmatch(RegexpObject2* self, PyObject* args, PyObject* kwds);
> +static void match_dealloc(MatchObject2* self);
> +static PyObject* create_match(PyObject* re, PyObject* string, long pos, long endpos, StringPiece* groups);
> +static PyObject* match_group(MatchObject2* self, PyObject* args);
> +static PyObject* match_groups(MatchObject2* self, PyObject* args, PyObject* kwds);
> +static PyObject* match_groupdict(MatchObject2* self, PyObject* args, PyObject* kwds);
> +static PyObject* match_start(MatchObject2* self, PyObject* args);
> +static PyObject* match_end(MatchObject2* self, PyObject* args);
> +static PyObject* match_span(MatchObject2* self, PyObject* args);
> +
> +
> +static PyMethodDef regexp_methods[] = {
> +  {"search", (PyCFunction)regexp_search, METH_VARARGS | METH_KEYWORDS,
> +    "search(string[, pos[, endpos]]) --> match object or None.\n"
> +    "    Scan through string looking for a match, and return a corresponding\n"
> +    "    MatchObject instance. Return None if no position in the string matches."
> +  },
> +  {"match", (PyCFunction)regexp_match, METH_VARARGS | METH_KEYWORDS,
> +    "match(string[, pos[, endpos]]) --> match object or None.\n"
> +    "    Matches zero or more characters at the beginning of the string"
> +  },
> +  {"fullmatch", (PyCFunction)regexp_fullmatch, METH_VARARGS | METH_KEYWORDS,
> +    "fullmatch(string[, pos[, endpos]]) --> match object or None.\n"
> +    "    Matches the entire string"
> +  },
> +  {"test_search", (PyCFunction)regexp_test_search, METH_VARARGS | METH_KEYWORDS,
> +    "test_search(string[, pos[, endpos]]) --> bool.\n"
> +    "    Like 'search', but only returns whether a match was found."
> +  },
> +  {"test_match", (PyCFunction)regexp_test_match, METH_VARARGS | METH_KEYWORDS,
> +    "test_match(string[, pos[, endpos]]) --> bool.\n"
> +    "    Like 'match', but only returns whether a match was found."
> +  },
> +  {"test_fullmatch", (PyCFunction)regexp_test_fullmatch, METH_VARARGS | METH_KEYWORDS,
> +    "test_fullmatch(string[, pos[, endpos]]) --> bool.\n"
> +    "    Like 'fullmatch', but only returns whether a match was found."
> +  },
> +  {NULL}  /* Sentinel */
> +};
> +
> +static PyMethodDef match_methods[] = {
> +  {"group", (PyCFunction)match_group, METH_VARARGS,
> +    NULL
> +  },
> +  {"groups", (PyCFunction)match_groups, METH_VARARGS | METH_KEYWORDS,
> +    NULL
> +  },
> +  {"groupdict", (PyCFunction)match_groupdict, METH_VARARGS | METH_KEYWORDS,
> +    NULL
> +  },
> +  {"start", (PyCFunction)match_start, METH_VARARGS,
> +    NULL
> +  },
> +  {"end", (PyCFunction)match_end, METH_VARARGS,
> +    NULL
> +  },
> +  {"span", (PyCFunction)match_span, METH_VARARGS,
> +    NULL
> +  },
> +  {NULL}  /* Sentinel */
> +};
> +
> +
> +// Simple method to block setattr.
> +static int
> +_no_setattr(PyObject* obj, PyObject* name, PyObject* v) {
> +  (void)name;
> +  (void)v;
> +  PyErr_Format(PyExc_AttributeError,
> +      "'%s' object attributes are read-only",
> +      obj->ob_type->tp_name);
> +  return -1;
> +}
> +
> +
> +static PyTypeObject Regexp_Type2 = {
> +  PyObject_HEAD_INIT(NULL)
> +  0,                           /*ob_size*/
> +  "_re2.RE2_Regexp",           /*tp_name*/
> +  sizeof(RegexpObject2),       /*tp_basicsize*/
> +  0,                           /*tp_itemsize*/
> +  (destructor)regexp_dealloc,  /*tp_dealloc*/
> +  0,                           /*tp_print*/
> +  0,                           /*tp_getattr*/
> +  0,                           /*tp_setattr*/
> +  0,                           /*tp_compare*/
> +  0,                           /*tp_repr*/
> +  0,                           /*tp_as_number*/
> +  0,                           /*tp_as_sequence*/
> +  0,                           /*tp_as_mapping*/
> +  0,                           /*tp_hash*/
> +  0,                           /*tp_call*/
> +  0,                           /*tp_str*/
> +  0,                           /*tp_getattro*/
> +  _no_setattr,                 /*tp_setattro*/
> +  0,                           /*tp_as_buffer*/
> +  Py_TPFLAGS_DEFAULT,          /*tp_flags*/
> +  "RE2 regexp objects",        /*tp_doc*/
> +  0,                           /*tp_traverse*/
> +  0,                           /*tp_clear*/
> +  0,                           /*tp_richcompare*/
> +  0,                           /*tp_weaklistoffset*/
> +  0,                           /*tp_iter*/
> +  0,                           /*tp_iternext*/
> +  regexp_methods,              /*tp_methods*/
> +  0,                           /*tp_members*/
> +  0,                           /*tp_getset*/
> +  0,                           /*tp_base*/
> +  0,                           /*tp_dict*/
> +  0,                           /*tp_descr_get*/
> +  0,                           /*tp_descr_set*/
> +  offsetof(RegexpObject2, attr_dict),  /*tp_dictoffset*/
> +  0,                           /*tp_init*/
> +  0,                           /*tp_alloc*/
> +  0,                           /*tp_new*/
> +};
> +
> +static PyTypeObject Match_Type2 = {
> +  PyObject_HEAD_INIT(NULL)
> +  0,                           /*ob_size*/
> +  "_re2.RE2_Match",            /*tp_name*/
> +  sizeof(MatchObject2),        /*tp_basicsize*/
> +  0,                           /*tp_itemsize*/
> +  (destructor)match_dealloc,   /*tp_dealloc*/
> +  0,                           /*tp_print*/
> +  0,                           /*tp_getattr*/
> +  0,                           /*tp_setattr*/
> +  0,                           /*tp_compare*/
> +  0,                           /*tp_repr*/
> +  0,                           /*tp_as_number*/
> +  0,                           /*tp_as_sequence*/
> +  0,                           /*tp_as_mapping*/
> +  0,                           /*tp_hash*/
> +  0,                           /*tp_call*/
> +  0,                           /*tp_str*/
> +  0,                           /*tp_getattro*/
> +  _no_setattr,                 /*tp_setattro*/
> +  0,                           /*tp_as_buffer*/
> +  Py_TPFLAGS_DEFAULT,          /*tp_flags*/
> +  "RE2 match objects",         /*tp_doc*/
> +  0,                           /*tp_traverse*/
> +  0,                           /*tp_clear*/
> +  0,                           /*tp_richcompare*/
> +  0,                           /*tp_weaklistoffset*/
> +  0,                           /*tp_iter*/
> +  0,                           /*tp_iternext*/
> +  match_methods,               /*tp_methods*/
> +  0,                           /*tp_members*/
> +  0,                           /*tp_getset*/
> +  0,                           /*tp_base*/
> +  0,                           /*tp_dict*/
> +  0,                           /*tp_descr_get*/
> +  0,                           /*tp_descr_set*/
> +  offsetof(MatchObject2, attr_dict),  /*tp_dictoffset*/
> +  0,                           /*tp_init*/
> +  0,                           /*tp_alloc*/
> +  0,                           /*tp_new*/
> +};
> +
> +
> +static void
> +regexp_dealloc(RegexpObject2* self)
> +{
> +  delete self->re2_obj;
> +  Py_XDECREF(self->attr_dict);
> +  PyObject_Del(self);
> +}
> +
> +static PyObject*
> +create_regexp(PyObject* pattern)
> +{
> +  RegexpObject2* regexp = PyObject_New(RegexpObject2, &Regexp_Type2);
> +  if (regexp == NULL) {
> +    return NULL;
> +  }
> +  regexp->re2_obj = NULL;
> +  regexp->attr_dict = NULL;
> +
> +  const char* raw_pattern = PyString_AS_STRING(pattern);
> +  Py_ssize_t len_pattern = PyString_GET_SIZE(pattern);
> +
> +  RE2::Options options;
> +  options.set_log_errors(false);
> +
> +  regexp->re2_obj = new(nothrow) RE2(StringPiece(raw_pattern, (int) len_pattern), options);
> +
> +  if (regexp->re2_obj == NULL) {
> +    PyErr_NoMemory();
> +    Py_DECREF(regexp);
> +    return NULL;
> +  }
> +
> +  if (!regexp->re2_obj->ok()) {
> +    long code = (long)regexp->re2_obj->error_code();
> +    const std::string& msg = regexp->re2_obj->error();
> +    PyObject* value = Py_BuildValue("ls#", code, msg.data(), msg.length());
> +    if (value == NULL) {
> +      Py_DECREF(regexp);
> +      return NULL;
> +    }
> +    PyErr_SetObject(error_class, value);
> +    Py_DECREF(regexp);
> +    return NULL;
> +  }
> +
> +  PyObject* groupindex = PyDict_New();
> +  if (groupindex == NULL) {
> +    Py_DECREF(regexp);
> +    return NULL;
> +  }
> +
> +  // Build up the attr_dict early so regexp can take ownership of our reference
> +  // to groupindex.
> +  regexp->attr_dict = Py_BuildValue("{sisNsO}",
> +      "groups", regexp->re2_obj->NumberOfCapturingGroups(),
> +      "groupindex", groupindex,
> +      "pattern", pattern);
> +  if (regexp->attr_dict == NULL) {
> +    Py_DECREF(regexp);
> +    return NULL;
> +  }
> +
> +  const std::map<std::string, int>& name_map = regexp->re2_obj->NamedCapturingGroups();
> +  for (std::map<std::string, int>::const_iterator it = name_map.begin(); it != name_map.end(); ++it) {
> +    PyObject* index = PyInt_FromLong(it->second);
> +    if (index == NULL) {
> +      Py_DECREF(regexp);
> +      return NULL;
> +    }
> +    int res = PyDict_SetItemString(groupindex, it->first.c_str(), index);
> +    Py_DECREF(index);
> +    if (res < 0) {
> +      Py_DECREF(regexp);
> +      return NULL;
> +    }
> +  }
> +
> +  return (PyObject*)regexp;
> +}
> +
> +static PyObject*
> +_do_search(RegexpObject2* self, PyObject* args, PyObject* kwds, RE2::Anchor anchor, bool return_match)
> +{
> +  PyObject* string;
> +  const char* subject;
> +  Py_ssize_t slen;
> +  long pos = 0;
> +  long endpos = LONG_MAX;
> +
> +  static const char* kwlist[] = {
> +    "string",
> +    "pos",
> +    "endpos",
> +    NULL};
> +
> +  // Using O! instead of s# here, because we want to stash the original
> +  // PyObject* in the match object on a successful match.
> +  if (!PyArg_ParseTupleAndKeywords(args, kwds, "O!|ll", (char**)kwlist,
> +        &PyString_Type, &string,
> +        &pos, &endpos)) {
> +    return NULL;
> +  }
> +
> +  subject = PyString_AS_STRING(string);
> +  slen = PyString_GET_SIZE(string);
> +  if (pos < 0) pos = 0;
> +  if (pos > slen) pos = slen;
> +  if (endpos < pos) endpos = pos;
> +  if (endpos > slen) endpos = slen;
> +
> +  // Don't bother allocating these if we are just doing a test.
> +  int n_groups = 0;
> +  StringPiece* groups = NULL;
> +  if (return_match) {
> +    n_groups = self->re2_obj->NumberOfCapturingGroups() + 1;
> +    groups = new(nothrow) StringPiece[n_groups];
> +
> +    if (groups == NULL) {
> +      PyErr_NoMemory();
> +      return NULL;
> +    }
> +  }
> +
> +  bool matched = self->re2_obj->Match(
> +      StringPiece(subject, (int) slen),
> +      (int) pos,
> +      (int) endpos,
> +      anchor,
> +      groups,
> +      n_groups);
> +
> +  if (!return_match) {
> +    if (matched) {
> +      Py_RETURN_TRUE;
> +    }
> +    Py_RETURN_FALSE;
> +  }
> +
> +  if (!matched) {
> +    delete[] groups;
> +    Py_RETURN_NONE;
> +  }
> +
> +  // create_match is going to Py_BuildValue the pos and endpos into
> +  // PyObjects.  We could optimize the case where pos and/or endpos were
> +  // explicitly passed in by forwarding the existing PyObjects.
> +  // That requires much more intricate code, though.
> +  return create_match((PyObject*)self, string, pos, endpos, groups);
> +}
> +
> +static PyObject*
> +regexp_search(RegexpObject2* self, PyObject* args, PyObject* kwds)
> +{
> +  return _do_search(self, args, kwds, RE2::UNANCHORED, true);
> +}
> +
> +static PyObject*
> +regexp_match(RegexpObject2* self, PyObject* args, PyObject* kwds)
> +{
> +  return _do_search(self, args, kwds, RE2::ANCHOR_START, true);
> +}
> +
> +static PyObject*
> +regexp_fullmatch(RegexpObject2* self, PyObject* args, PyObject* kwds)
> +{
> +  return _do_search(self, args, kwds, RE2::ANCHOR_BOTH, true);
> +}
> +
> +static PyObject*
> +regexp_test_search(RegexpObject2* self, PyObject* args, PyObject* kwds)
> +{
> +  return _do_search(self, args, kwds, RE2::UNANCHORED, false);
> +}
> +
> +static PyObject*
> +regexp_test_match(RegexpObject2* self, PyObject* args, PyObject* kwds)
> +{
> +  return _do_search(self, args, kwds, RE2::ANCHOR_START, false);
> +}
> +
> +static PyObject*
> +regexp_test_fullmatch(RegexpObject2* self, PyObject* args, PyObject* kwds)
> +{
> +  return _do_search(self, args, kwds, RE2::ANCHOR_BOTH, false);
> +}
> +
> +
> +static void
> +match_dealloc(MatchObject2* self)
> +{
> +  delete[] self->groups;
> +  Py_XDECREF(self->attr_dict);
> +  PyObject_Del(self);
> +}
> +
> +static PyObject*
> +create_match(PyObject* re, PyObject* string,
> +    long pos, long endpos,
> +    StringPiece* groups)
> +{
> +  MatchObject2* match = PyObject_New(MatchObject2, &Match_Type2);
> +  if (match == NULL) {
> +    delete[] groups;
> +    return NULL;
> +  }
> +  match->attr_dict = NULL;
> +  match->groups = groups;
> +  match->re = re;
> +  match->string = string;
> +
> +  match->attr_dict = Py_BuildValue("{sOsOslsl}",
> +      "re", re,
> +      "string", string,
> +      "pos", pos,
> +      "endpos", endpos);
> +  if (match->attr_dict == NULL) {
> +    Py_DECREF(match);
> +    return NULL;
> +  }
> +
> +  return (PyObject*)match;
> +}
> +
> +/**
> + * Attempt to convert an untrusted group index (PyObject* group) into
> + * a trusted one (*idx_p).  Return false on failure (exception).
> + */
> +static bool
> +_group_idx(MatchObject2* self, PyObject* group, long* idx_p)
> +{
> +  if (group == NULL) {
> +    return false;
> +  }
> +  PyErr_Clear(); // Is this necessary?
> +  long idx = PyInt_AsLong(group);
> +  if (idx == -1 && PyErr_Occurred() != NULL) {
> +    return false;
> +  }
> +  // TODO: Consider caching NumberOfCapturingGroups.
> +  if (idx < 0 || idx > ((RegexpObject2*)self->re)->re2_obj->NumberOfCapturingGroups()) {
> +    PyErr_SetString(PyExc_IndexError, "no such group");
> +    return false;
> +  }
> +  *idx_p = idx;
> +  return true;
> +}
> +
> +/**
> + * Extract the start and end indexes of a pre-checked group number.
> + * Sets both to -1 if it did not participate in the match.
> + */
> +static bool
> +_group_span(MatchObject2* self, long idx, Py_ssize_t* o_start, Py_ssize_t* o_end)
> +{
> +  // "idx" is expected to be verified.
> +  StringPiece& piece = self->groups[idx];
> +  if (piece.data() == NULL) {
> +    *o_start = -1;
> +    *o_end = -1;
> +    return false;
> +  }
> +  Py_ssize_t start = piece.data() - PyString_AS_STRING(self->string);
> +  *o_start = start;
> +  *o_end = start + piece.length();
> +  return true;
> +}
> +
> +/**
> + * Return a pre-checked group number as a string, or default_obj
> + * if it didn't participate in the match.
> + */
> +static PyObject*
> +_group_get_i(MatchObject2* self, long idx, PyObject* default_obj)
> +{
> +  Py_ssize_t start;
> +  Py_ssize_t end;
> +  if (!_group_span(self, idx, &start, &end)) {
> +    Py_INCREF(default_obj);
> +    return default_obj;
> +  }
> +  return PySequence_GetSlice(self->string, start, end);
> +}
> +
> +/**
> + * Return n un-checked group number as a string.
> + */
> +static PyObject*
> +_group_get_o(MatchObject2* self, PyObject* group)
> +{
> +  long idx;
> +  if (!_group_idx(self, group, &idx)) {
> +    return NULL;
> +  }
> +  return _group_get_i(self, idx, Py_None);
> +}
> +
> +
> +static PyObject*
> +match_group(MatchObject2* self, PyObject* args)
> +{
> +  long idx = 0;
> +  Py_ssize_t nargs = PyTuple_GET_SIZE(args);
> +  switch (nargs) {
> +    case 1:
> +      if (!_group_idx(self, PyTuple_GET_ITEM(args, 0), &idx)) {
> +        return NULL;
> +      }
> +      // Fall through.
> +    case 0:
> +      return _group_get_i(self, idx, Py_None);
> +    default:
> +      PyObject* ret = PyTuple_New(nargs);
> +      if (ret == NULL) {
> +        return NULL;
> +      }
> +
> +      for (int i = 0; i < nargs; i++) {
> +        PyObject* group = _group_get_o(self, PyTuple_GET_ITEM(args, i));
> +        if (group == NULL) {
> +          Py_DECREF(ret);
> +          return NULL;
> +        }
> +        PyTuple_SET_ITEM(ret, i, group);
> +      }
> +      return ret;
> +  }
> +}
> +
> +static PyObject*
> +match_groups(MatchObject2* self, PyObject* args, PyObject* kwds)
> +{
> +  static const char* kwlist[] = {
> +    "default",
> +    NULL};
> +
> +  PyObject* default_obj = Py_None;
> +
> +  if (!PyArg_ParseTupleAndKeywords(args, kwds, "|O", (char**)kwlist,
> +        &default_obj)) {
> +    return NULL;
> +  }
> +
> +  int ngroups = ((RegexpObject2*)self->re)->re2_obj->NumberOfCapturingGroups();
> +
> +  PyObject* ret = PyTuple_New(ngroups);
> +  if (ret == NULL) {
> +    return NULL;
> +  }
> +
> +  for (int i = 1; i <= ngroups; i++) {
> +    PyObject* group = _group_get_i(self, i, default_obj);
> +    if (group == NULL) {
> +      Py_DECREF(ret);
> +      return NULL;
> +    }
> +    PyTuple_SET_ITEM(ret, i-1, group);
> +  }
> +
> +  return ret;
> +}
> +
> +static PyObject*
> +match_groupdict(MatchObject2* self, PyObject* args, PyObject* kwds)
> +{
> +  static const char* kwlist[] = {
> +    "default",
> +    NULL};
> +
> +  PyObject* default_obj = Py_None;
> +
> +  if (!PyArg_ParseTupleAndKeywords(args, kwds, "|O", (char**)kwlist,
> +        &default_obj)) {
> +    return NULL;
> +  }
> +
> +  PyObject* ret = PyDict_New();
> +  if (ret == NULL) {
> +    return NULL;
> +  }
> +
> +  const std::map<std::string, int>& name_map = ((RegexpObject2*)self->re)->re2_obj->NamedCapturingGroups();
> +  for (std::map<std::string, int>::const_iterator it = name_map.begin(); it != name_map.end(); ++it) {
> +    PyObject* group = _group_get_i(self, it->second, default_obj);
> +    if (group == NULL) {
> +      Py_DECREF(ret);
> +      return NULL;
> +    }
> +    // TODO: Group names with embedded zeroes?
> +    int res = PyDict_SetItemString(ret, it->first.data(), group);
> +    Py_DECREF(group);
> +    if (res < 0) {
> +      Py_DECREF(ret);
> +      return NULL;
> +    }
> +  }
> +
> +  return ret;
> +}
> +
> +enum span_mode_t { START, END, SPAN };
> +
> +static PyObject*
> +_do_span(MatchObject2* self, PyObject* args, const char* name, span_mode_t mode)
> +{
> +  long idx = 0;
> +  PyObject* group = NULL;
> +  if (!PyArg_UnpackTuple(args, name, 0, 1,
> +        &group)) {
> +    return NULL;
> +  }
> +  if (group != NULL) {
> +    if (!_group_idx(self, group, &idx)) {
> +      return NULL;
> +    }
> +  }
> +
> +  Py_ssize_t start = -1;
> +  Py_ssize_t end = -1;
> +
> +  (void)_group_span(self, idx, &start, &end);
> +  switch (mode) {
> +    case START : return Py_BuildValue("n", start );
> +    case END   : return Py_BuildValue("n", end   );
> +    case SPAN:
> +      return Py_BuildValue("nn", start, end);
> +  }
> +
> +  // Make gcc happy.
> +  return NULL;
> +}
> +
> +static PyObject*
> +match_start(MatchObject2* self, PyObject* args)
> +{
> +  return _do_span(self, args, "start", START);
> +}
> +
> +static PyObject*
> +match_end(MatchObject2* self, PyObject* args)
> +{
> +  return _do_span(self, args, "end", END);
> +}
> +
> +static PyObject*
> +match_span(MatchObject2* self, PyObject* args)
> +{
> +  return _do_span(self, args, "span", SPAN);
> +}
> +
> +
> +static PyObject*
> +_compile(PyObject* self, PyObject* args, PyObject* kwds)
> +{
> +  static const char* kwlist[] = {
> +    "pattern",
> +    NULL};
> +
> +  PyObject* pattern;
> +
> +  if (!PyArg_ParseTupleAndKeywords(args, kwds, "O!", (char**)kwlist,
> +        &PyString_Type, &pattern)) {
> +    return NULL;
> +  }
> +
> +  return create_regexp(pattern);
> +}
> +
> +static PyObject*
> +escape(PyObject* self, PyObject* args)
> +{
> +  char *str;
> +  Py_ssize_t len;
> +
> +  if (!PyArg_ParseTuple(args, "s#:escape", &str, &len)) {
> +    return NULL;
> +  }
> +
> +  std::string esc(RE2::QuoteMeta(StringPiece(str, (int) len)));
> +
> +  return PyString_FromStringAndSize(esc.c_str(), esc.size());
> +}
> +
> +static PyMethodDef methods[] = {
> +  {"_compile", (PyCFunction)_compile, METH_VARARGS | METH_KEYWORDS, NULL},
> +  {"escape", (PyCFunction)escape, METH_VARARGS,
> +   "Escape all potentially meaningful regexp characters."},
> +  {NULL}  /* Sentinel */
> +};
> +
> +PyMODINIT_FUNC
> +init_re2(void)
> +{
> +  if (PyType_Ready(&Regexp_Type2) < 0) {
> +    return;
> +  }
> +
> +  if (PyType_Ready(&Match_Type2) < 0) {
> +    return;
> +  }
> +
> +  PyObject* sre_mod = PyImport_ImportModuleNoBlock("sre_constants");
> +  if (sre_mod == NULL) {
> +    return;
> +  }
> +  /* static global */ error_class = PyObject_GetAttrString(sre_mod, "error");
> +  if (error_class == NULL) {
> +    return;
> +  }
> +
> +  PyObject* mod = Py_InitModule("_re2", methods);
> +
> +  Py_INCREF(error_class);
> +  PyModule_AddObject(mod, "error", error_class);
> +}
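
`_do_search` normalizes `pos` and `endpos` before it calls `RE2::Match`, so out-of-range arguments are silently clamped and never raise. Below is a pure-Python model of that clamping, written only to make the order of the four checks explicit. The name `clamp_bounds` is hypothetical, and `sys.maxsize` stands in for the C default of `LONG_MAX`.

```python
import sys


def clamp_bounds(slen, pos=0, endpos=sys.maxsize):
    """Model of _do_search's normalization: 0 <= pos <= endpos <= slen.

    slen is the subject length; sys.maxsize mirrors the LONG_MAX default.
    """
    if pos < 0:
        pos = 0
    if pos > slen:
        pos = slen
    if endpos < pos:
        endpos = pos
    if endpos > slen:
        endpos = slen
    return pos, endpos


print(clamp_bounds(5))         # (0, 5): whole string
print(clamp_bounds(5, -3, 2))  # (0, 2): negative pos becomes 0
print(clamp_bounds(5, 3, 1))   # (3, 3): endpos before pos collapses to pos
print(clamp_bounds(5, 7))      # (5, 5): pos past the end is pulled back
```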
> diff --git a/mercurial/re2.py b/mercurial/re2.py
> new file mode 100644
> --- /dev/null
> +++ b/mercurial/re2.py
> @@ -0,0 +1,63 @@
> +#!/usr/bin/env python
> +
> +# Copyright (c) 2010, David Reiss and Facebook, Inc. All rights reserved.
> +#
> +# Redistribution and use in source and binary forms, with or without
> +# modification, are permitted provided that the following conditions
> +# are met:
> +# * Redistributions of source code must retain the above copyright
> +#   notice, this list of conditions and the following disclaimer.
> +# * Redistributions in binary form must reproduce the above copyright
> +#   notice, this list of conditions and the following disclaimer in the
> +#   documentation and/or other materials provided with the distribution.
> +# * Neither the name of Facebook nor the names of its contributors
> +#   may be used to endorse or promote products derived from this software
> +#   without specific prior written permission.
> +#
> +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
> +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +# HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +
> +import _re2
> +
> +__all__ = [
> +    "error",
> +    "escape",
> +    "compile",
> +    "search",
> +    "match",
> +    "fullmatch",
> +    ]
> +
> +# Module-private compilation function, for future caching, other enhancements
> +_compile = _re2._compile
> +
> +error = _re2.error
> +escape = _re2.escape
> +
> +def compile(pattern):
> +    "Compile a regular expression pattern, returning a pattern object."
> +    return _compile(pattern)
> +
> +def search(pattern, string):
> +    """Scan through string looking for a match to the pattern, returning
> +    a match object, or None if no match was found."""
> +    return _compile(pattern).search(string)
> +
> +def match(pattern, string):
> +    """Try to apply the pattern at the start of the string, returning
> +    a match object, or None if no match was found."""
> +    return _compile(pattern).match(string)
> +
> +def fullmatch(pattern, string):
> +    """Try to apply the pattern to the entire string, returning
> +    a match object, or None if no match was found."""
> +    return _compile(pattern).fullmatch(string)
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel

