[PATCH 1 of 3 RFC] mercurial: add python re2 bindings

Siddharth Agarwal sid0 at fb.com
Tue Sep 2 16:18:08 CDT 2014


# HG changeset patch
# User Siddharth Agarwal <sid0 at fb.com>
# Date 1409591324 25200
#      Mon Sep 01 10:08:44 2014 -0700
# Node ID f3b022e755dd7cf54e1ded55f0217bb1415348e2
# Parent  a98f6def97bc4b1bc74e37ca207445639e60b1a5
mercurial: add python re2 bindings

These bindings will enable packagers to build Mercurial with re2 support.

The bindings are licensed as 3-clause BSD.

I've moved re2.py to mercurial/ to allow 'from mercurial import re2' to work.
This is my first time doing this, so it is very likely I got some things wrong.
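
For illustration only, here is a sketch of how a caller might prefer these bindings and fall back to the stdlib re module. The helper name compile_pattern is hypothetical and not part of this series. It assumes only what this patch adds: mercurial.re2.compile and mercurial.re2.error. RE2 rejects some syntax that re accepts, such as backreferences, which is why the fallback catches error.

```python
import re

try:
    # Assumes the _re2 extension was built and mercurial/re2.py is importable.
    from mercurial import re2 as _re2mod
except ImportError:
    _re2mod = None


def compile_pattern(pattern):
    """Compile with RE2 when the bindings are available, else use stdlib re.

    Hypothetical helper for illustration; not part of this patch.
    """
    if _re2mod is not None:
        try:
            return _re2mod.compile(pattern)
        except _re2mod.error:
            # RE2 rejects constructs such as backreferences; fall back.
            pass
    return re.compile(pattern)
```

A pattern such as r"(a)\1" always ends up compiled by re, because RE2 has no backreference support.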

diff --git a/mercurial/pyre2/LICENSE b/mercurial/pyre2/LICENSE
new file mode 100644
--- /dev/null
+++ b/mercurial/pyre2/LICENSE
@@ -0,0 +1,25 @@
+Copyright (c) 2010, David Reiss and Facebook, Inc. All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+* Redistributions of source code must retain the above copyright
+  notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright
+  notice, this list of conditions and the following disclaimer in the
+  documentation and/or other materials provided with the distribution.
+* Neither the name of Facebook nor the names of its contributors
+  may be used to endorse or promote products derived from this software
+  without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/mercurial/pyre2/README.rst b/mercurial/pyre2/README.rst
new file mode 100644
--- /dev/null
+++ b/mercurial/pyre2/README.rst
@@ -0,0 +1,71 @@
+=====
+pyre2
+=====
+
+.. contents::
+
+Summary
+=======
+
+pyre2 is a Python extension that wraps
+`Google's RE2 regular expression library
+<http://code.google.com/p/re2/>`_.
+It implements many of the features of Python's built-in
+``re`` module with compatible interfaces.
+
+
+New Features
+============
+
+* ``Regexp`` objects have a ``fullmatch`` method that works like ``match``,
+  but anchors the match at both the start and the end.
+* ``Regexp`` objects have
+  ``test_search``, ``test_match``, and ``test_fullmatch``
+  methods that work like ``search``, ``match``, and ``fullmatch``,
+  but only return ``True`` or ``False`` to indicate
+  whether the match was successful.
+  These methods should be faster than the full versions,
+  especially for patterns with capturing groups.
+
+
+Missing Features
+================
+
+* No substitution methods.
+* No flags.
+* No ``split``, ``findall``, or ``finditer``.
+* The only top-level matching functions are ``search``, ``match``, and
+  ``fullmatch``, and they recompile on every call. (Just use compile.)
+* No compile cache.
+  (If you care enough about performance to use RE2,
+  you probably care enough to cache your own patterns.)
+* No ``lastindex`` or ``lastgroup`` on ``Match`` objects.
+
+
+Current Status
+==============
+
+pyre2 has only received basic testing,
+and I am by no means a Python extension expert,
+so it is quite possible that it contains bugs.
+I'd guess the most likely are reference leaks in error cases.
+
+RE2 doesn't build with ``-fPIC``, so I had to build it with
+
+::
+
+  make CFLAGS='-fPIC -c -Wall -Wno-sign-compare -O3 -g -I.'
+
+I also had to add it to my compiler search path when building the module
+with a command like
+
+::
+
+  env CPPFLAGS='-I/path/to/re2' LDFLAGS='-L/path/to/re2/obj' ./setup.py build
+
+
+Contact
+=======
+
+You can file bug reports on GitHub, or email the author:
+David Reiss <dreiss at facebook.com>.
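
The anchoring described under New Features can be illustrated without building the extension. This sketch uses Python 3's stdlib re, whose Pattern.fullmatch (added in 3.4) anchors at both ends in the same way. The bool_search function is a hypothetical pure-Python stand-in for the boolean-only test_* methods, not the extension's implementation.

```python
import re

regexp = re.compile(r"ab+")


def bool_search(regexp, string, pos=0, endpos=None):
    """Return only whether a match exists, like pyre2's test_search.

    Hypothetical stand-in: the real method avoids building a match object,
    which is where its speedup comes from; this one just discards it.
    """
    if endpos is None:
        endpos = len(string)
    return regexp.search(string, pos, endpos) is not None


print(regexp.search("xxabb").span())  # (2, 5): search scans forward
print(regexp.match("xxabb"))          # None: match is anchored at pos
print(regexp.match("abbx").group(0))  # abb: only the start is anchored
print(regexp.fullmatch("abbx"))       # None: "x" is left over at the end
print(bool_search(regexp, "xxabb"))   # True
```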
diff --git a/mercurial/pyre2/_re2.cc b/mercurial/pyre2/_re2.cc
new file mode 100644
--- /dev/null
+++ b/mercurial/pyre2/_re2.cc
@@ -0,0 +1,753 @@
+/*
+ * Copyright (c) 2010, David Reiss and Facebook, Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in the
+ *   documentation and/or other materials provided with the distribution.
+ * * Neither the name of Facebook nor the names of its contributors
+ *   may be used to endorse or promote products derived from this software
+ *   without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+ * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#define PY_SSIZE_T_CLEAN
+#include <Python.h>
+
+#include <cstddef>
+
+#include <string>
+#include <new>
+using std::nothrow;
+
+#include <re2/re2.h>
+using re2::RE2;
+using re2::StringPiece;
+
+
+typedef struct _RegexpObject2 {
+  PyObject_HEAD
+  // __dict__.  Simpler than implementing getattr and possibly faster.
+  PyObject* attr_dict;
+  RE2* re2_obj;
+} RegexpObject2;
+
+typedef struct _MatchObject2 {
+  PyObject_HEAD
+  // __dict__.  Simpler than implementing getattr and possibly faster.
+  PyObject* attr_dict;
+  // Cache of __dict__["re"] and __dict__["string"], which are used for group()
+  // calls. These fields do *not* own their own references.  They piggyback on
+  // the references in attr_dict.
+  PyObject* re;
+  PyObject* string;
+  // There are several possible approaches to storing the matched groups:
+  // 1. Fully materialize the groups tuple at match time.
+  // 2. Allocate PyString objects on first request and cache them.
+  // 3. Always allocate new PyStrings on demand.
+  // I've chosen to go with #3.  It's the simplest, and I'm pretty sure it's
+  // optimal in all cases where no group is fetched more than once.
+  StringPiece* groups;
+} MatchObject2;
+
+
+// Imported from sre_constants.
+static PyObject* error_class;
+
+
+// Forward declarations of methods, creators, and destructors.
+static void regexp_dealloc(RegexpObject2* self);
+static PyObject* create_regexp(PyObject* pattern);
+static PyObject* regexp_search(RegexpObject2* self, PyObject* args, PyObject* kwds);
+static PyObject* regexp_match(RegexpObject2* self, PyObject* args, PyObject* kwds);
+static PyObject* regexp_fullmatch(RegexpObject2* self, PyObject* args, PyObject* kwds);
+static PyObject* regexp_test_search(RegexpObject2* self, PyObject* args, PyObject* kwds);
+static PyObject* regexp_test_match(RegexpObject2* self, PyObject* args, PyObject* kwds);
+static PyObject* regexp_test_fullmatch(RegexpObject2* self, PyObject* args, PyObject* kwds);
+static void match_dealloc(MatchObject2* self);
+static PyObject* create_match(PyObject* re, PyObject* string, long pos, long endpos, StringPiece* groups);
+static PyObject* match_group(MatchObject2* self, PyObject* args);
+static PyObject* match_groups(MatchObject2* self, PyObject* args, PyObject* kwds);
+static PyObject* match_groupdict(MatchObject2* self, PyObject* args, PyObject* kwds);
+static PyObject* match_start(MatchObject2* self, PyObject* args);
+static PyObject* match_end(MatchObject2* self, PyObject* args);
+static PyObject* match_span(MatchObject2* self, PyObject* args);
+
+
+static PyMethodDef regexp_methods[] = {
+  {"search", (PyCFunction)regexp_search, METH_VARARGS | METH_KEYWORDS,
+    "search(string[, pos[, endpos]]) --> match object or None.\n"
+    "    Scan through string looking for a match, and return a corresponding\n"
+    "    MatchObject instance. Return None if no position in the string matches."
+  },
+  {"match", (PyCFunction)regexp_match, METH_VARARGS | METH_KEYWORDS,
+    "match(string[, pos[, endpos]]) --> match object or None.\n"
+    "    Matches zero or more characters at the beginning of the string"
+  },
+  {"fullmatch", (PyCFunction)regexp_fullmatch, METH_VARARGS | METH_KEYWORDS,
+    "fullmatch(string[, pos[, endpos]]) --> match object or None.\n"
+    "    Matches the entire string"
+  },
+  {"test_search", (PyCFunction)regexp_test_search, METH_VARARGS | METH_KEYWORDS,
+    "test_search(string[, pos[, endpos]]) --> bool.\n"
+    "    Like 'search', but only returns whether a match was found."
+  },
+  {"test_match", (PyCFunction)regexp_test_match, METH_VARARGS | METH_KEYWORDS,
+    "test_match(string[, pos[, endpos]]) --> bool.\n"
+    "    Like 'match', but only returns whether a match was found."
+  },
+  {"test_fullmatch", (PyCFunction)regexp_test_fullmatch, METH_VARARGS | METH_KEYWORDS,
+    "test_fullmatch(string[, pos[, endpos]]) --> bool.\n"
+    "    Like 'fullmatch', but only returns whether a match was found."
+  },
+  {NULL}  /* Sentinel */
+};
+
+static PyMethodDef match_methods[] = {
+  {"group", (PyCFunction)match_group, METH_VARARGS,
+    NULL
+  },
+  {"groups", (PyCFunction)match_groups, METH_VARARGS | METH_KEYWORDS,
+    NULL
+  },
+  {"groupdict", (PyCFunction)match_groupdict, METH_VARARGS | METH_KEYWORDS,
+    NULL
+  },
+  {"start", (PyCFunction)match_start, METH_VARARGS,
+    NULL
+  },
+  {"end", (PyCFunction)match_end, METH_VARARGS,
+    NULL
+  },
+  {"span", (PyCFunction)match_span, METH_VARARGS,
+    NULL
+  },
+  {NULL}  /* Sentinel */
+};
+
+
+// Simple method to block setattr.
+static int
+_no_setattr(PyObject* obj, PyObject* name, PyObject* v) {
+  (void)name;
+  (void)v;
+  PyErr_Format(PyExc_AttributeError,
+      "'%s' object attributes are read-only",
+      obj->ob_type->tp_name);
+  return -1;
+}
+
+
+static PyTypeObject Regexp_Type2 = {
+  PyObject_HEAD_INIT(NULL)
+  0,                           /*ob_size*/
+  "_re2.RE2_Regexp",           /*tp_name*/
+  sizeof(RegexpObject2),       /*tp_basicsize*/
+  0,                           /*tp_itemsize*/
+  (destructor)regexp_dealloc,  /*tp_dealloc*/
+  0,                           /*tp_print*/
+  0,                           /*tp_getattr*/
+  0,                           /*tp_setattr*/
+  0,                           /*tp_compare*/
+  0,                           /*tp_repr*/
+  0,                           /*tp_as_number*/
+  0,                           /*tp_as_sequence*/
+  0,                           /*tp_as_mapping*/
+  0,                           /*tp_hash*/
+  0,                           /*tp_call*/
+  0,                           /*tp_str*/
+  0,                           /*tp_getattro*/
+  _no_setattr,                 /*tp_setattro*/
+  0,                           /*tp_as_buffer*/
+  Py_TPFLAGS_DEFAULT,          /*tp_flags*/
+  "RE2 regexp objects",        /*tp_doc*/
+  0,                           /*tp_traverse*/
+  0,                           /*tp_clear*/
+  0,                           /*tp_richcompare*/
+  0,                           /*tp_weaklistoffset*/
+  0,                           /*tp_iter*/
+  0,                           /*tp_iternext*/
+  regexp_methods,              /*tp_methods*/
+  0,                           /*tp_members*/
+  0,                           /*tp_getset*/
+  0,                           /*tp_base*/
+  0,                           /*tp_dict*/
+  0,                           /*tp_descr_get*/
+  0,                           /*tp_descr_set*/
+  offsetof(RegexpObject2, attr_dict),  /*tp_dictoffset*/
+  0,                           /*tp_init*/
+  0,                           /*tp_alloc*/
+  0,                           /*tp_new*/
+};
+
+static PyTypeObject Match_Type2 = {
+  PyObject_HEAD_INIT(NULL)
+  0,                           /*ob_size*/
+  "_re2.RE2_Match",            /*tp_name*/
+  sizeof(MatchObject2),        /*tp_basicsize*/
+  0,                           /*tp_itemsize*/
+  (destructor)match_dealloc,   /*tp_dealloc*/
+  0,                           /*tp_print*/
+  0,                           /*tp_getattr*/
+  0,                           /*tp_setattr*/
+  0,                           /*tp_compare*/
+  0,                           /*tp_repr*/
+  0,                           /*tp_as_number*/
+  0,                           /*tp_as_sequence*/
+  0,                           /*tp_as_mapping*/
+  0,                           /*tp_hash*/
+  0,                           /*tp_call*/
+  0,                           /*tp_str*/
+  0,                           /*tp_getattro*/
+  _no_setattr,                 /*tp_setattro*/
+  0,                           /*tp_as_buffer*/
+  Py_TPFLAGS_DEFAULT,          /*tp_flags*/
+  "RE2 match objects",         /*tp_doc*/
+  0,                           /*tp_traverse*/
+  0,                           /*tp_clear*/
+  0,                           /*tp_richcompare*/
+  0,                           /*tp_weaklistoffset*/
+  0,                           /*tp_iter*/
+  0,                           /*tp_iternext*/
+  match_methods,               /*tp_methods*/
+  0,                           /*tp_members*/
+  0,                           /*tp_getset*/
+  0,                           /*tp_base*/
+  0,                           /*tp_dict*/
+  0,                           /*tp_descr_get*/
+  0,                           /*tp_descr_set*/
+  offsetof(MatchObject2, attr_dict),  /*tp_dictoffset*/
+  0,                           /*tp_init*/
+  0,                           /*tp_alloc*/
+  0,                           /*tp_new*/
+};
+
+
+static void
+regexp_dealloc(RegexpObject2* self)
+{
+  delete self->re2_obj;
+  Py_XDECREF(self->attr_dict);
+  PyObject_Del(self);
+}
+
+static PyObject*
+create_regexp(PyObject* pattern)
+{
+  RegexpObject2* regexp = PyObject_New(RegexpObject2, &Regexp_Type2);
+  if (regexp == NULL) {
+    return NULL;
+  }
+  regexp->re2_obj = NULL;
+  regexp->attr_dict = NULL;
+
+  const char* raw_pattern = PyString_AS_STRING(pattern);
+  Py_ssize_t len_pattern = PyString_GET_SIZE(pattern);
+
+  RE2::Options options;
+  options.set_log_errors(false);
+
+  regexp->re2_obj = new(nothrow) RE2(StringPiece(raw_pattern, (int) len_pattern), options);
+
+  if (regexp->re2_obj == NULL) {
+    PyErr_NoMemory();
+    Py_DECREF(regexp);
+    return NULL;
+  }
+
+  if (!regexp->re2_obj->ok()) {
+    long code = (long)regexp->re2_obj->error_code();
+    const std::string& msg = regexp->re2_obj->error();
+    PyObject* value = Py_BuildValue("ls#", code, msg.data(), (Py_ssize_t)msg.length());
+    if (value != NULL) {
+      PyErr_SetObject(error_class, value);
+      // PyErr_SetObject takes its own reference, so release ours.
+      Py_DECREF(value);
+    }
+    Py_DECREF(regexp);
+    return NULL;
+  }
+
+  PyObject* groupindex = PyDict_New();
+  if (groupindex == NULL) {
+    Py_DECREF(regexp);
+    return NULL;
+  }
+
+  // Build up the attr_dict early so regexp can take ownership of our reference
+  // to groupindex.
+  regexp->attr_dict = Py_BuildValue("{sisNsO}",
+      "groups", regexp->re2_obj->NumberOfCapturingGroups(),
+      "groupindex", groupindex,
+      "pattern", pattern);
+  if (regexp->attr_dict == NULL) {
+    Py_DECREF(regexp);
+    return NULL;
+  }
+
+  const std::map<std::string, int>& name_map = regexp->re2_obj->NamedCapturingGroups();
+  for (std::map<std::string, int>::const_iterator it = name_map.begin(); it != name_map.end(); ++it) {
+    PyObject* index = PyInt_FromLong(it->second);
+    if (index == NULL) {
+      Py_DECREF(regexp);
+      return NULL;
+    }
+    int res = PyDict_SetItemString(groupindex, it->first.c_str(), index);
+    Py_DECREF(index);
+    if (res < 0) {
+      Py_DECREF(regexp);
+      return NULL;
+    }
+  }
+
+  return (PyObject*)regexp;
+}
+
+static PyObject*
+_do_search(RegexpObject2* self, PyObject* args, PyObject* kwds, RE2::Anchor anchor, bool return_match)
+{
+  PyObject* string;
+  const char* subject;
+  Py_ssize_t slen;
+  long pos = 0;
+  long endpos = LONG_MAX;
+
+  static const char* kwlist[] = {
+    "string",
+    "pos",
+    "endpos",
+    NULL};
+
+  // Using O! instead of s# here, because we want to stash the original
+  // PyObject* in the match object on a successful match.
+  if (!PyArg_ParseTupleAndKeywords(args, kwds, "O!|ll", (char**)kwlist,
+        &PyString_Type, &string,
+        &pos, &endpos)) {
+    return NULL;
+  }
+
+  subject = PyString_AS_STRING(string);
+  slen = PyString_GET_SIZE(string);
+  if (pos < 0) pos = 0;
+  if (pos > slen) pos = slen;
+  if (endpos < pos) endpos = pos;
+  if (endpos > slen) endpos = slen;
+
+  // Don't bother allocating these if we are just doing a test.
+  int n_groups = 0;
+  StringPiece* groups = NULL;
+  if (return_match) {
+    n_groups = self->re2_obj->NumberOfCapturingGroups() + 1;
+    groups = new(nothrow) StringPiece[n_groups];
+
+    if (groups == NULL) {
+      PyErr_NoMemory();
+      return NULL;
+    }
+  }
+
+  bool matched = self->re2_obj->Match(
+      StringPiece(subject, (int) slen),
+      (int) pos,
+      (int) endpos,
+      anchor,
+      groups,
+      n_groups);
+
+  if (!return_match) {
+    if (matched) {
+      Py_RETURN_TRUE;
+    }
+    Py_RETURN_FALSE;
+  }
+
+  if (!matched) {
+    delete[] groups;
+    Py_RETURN_NONE;
+  }
+
+  // create_match is going to Py_BuildValue the pos and endpos into
+  // PyObjects.  We could optimize the case where pos and/or endpos were
+  // explicitly passed in by forwarding the existing PyObjects.
+  // That requires much more intricate code, though.
+  return create_match((PyObject*)self, string, pos, endpos, groups);
+}
+
+static PyObject*
+regexp_search(RegexpObject2* self, PyObject* args, PyObject* kwds)
+{
+  return _do_search(self, args, kwds, RE2::UNANCHORED, true);
+}
+
+static PyObject*
+regexp_match(RegexpObject2* self, PyObject* args, PyObject* kwds)
+{
+  return _do_search(self, args, kwds, RE2::ANCHOR_START, true);
+}
+
+static PyObject*
+regexp_fullmatch(RegexpObject2* self, PyObject* args, PyObject* kwds)
+{
+  return _do_search(self, args, kwds, RE2::ANCHOR_BOTH, true);
+}
+
+static PyObject*
+regexp_test_search(RegexpObject2* self, PyObject* args, PyObject* kwds)
+{
+  return _do_search(self, args, kwds, RE2::UNANCHORED, false);
+}
+
+static PyObject*
+regexp_test_match(RegexpObject2* self, PyObject* args, PyObject* kwds)
+{
+  return _do_search(self, args, kwds, RE2::ANCHOR_START, false);
+}
+
+static PyObject*
+regexp_test_fullmatch(RegexpObject2* self, PyObject* args, PyObject* kwds)
+{
+  return _do_search(self, args, kwds, RE2::ANCHOR_BOTH, false);
+}
+
+
+static void
+match_dealloc(MatchObject2* self)
+{
+  delete[] self->groups;
+  Py_XDECREF(self->attr_dict);
+  PyObject_Del(self);
+}
+
+static PyObject*
+create_match(PyObject* re, PyObject* string,
+    long pos, long endpos,
+    StringPiece* groups)
+{
+  MatchObject2* match = PyObject_New(MatchObject2, &Match_Type2);
+  if (match == NULL) {
+    delete[] groups;
+    return NULL;
+  }
+  match->attr_dict = NULL;
+  match->groups = groups;
+  match->re = re;
+  match->string = string;
+
+  match->attr_dict = Py_BuildValue("{sOsOslsl}",
+      "re", re,
+      "string", string,
+      "pos", pos,
+      "endpos", endpos);
+  if (match->attr_dict == NULL) {
+    Py_DECREF(match);
+    return NULL;
+  }
+
+  return (PyObject*)match;
+}
+
+/**
+ * Attempt to convert an untrusted group index (PyObject* group) into
+ * a trusted one (*idx_p).  Return false on failure (exception).
+ */
+static bool
+_group_idx(MatchObject2* self, PyObject* group, long* idx_p)
+{
+  if (group == NULL) {
+    return false;
+  }
+  PyErr_Clear(); // Is this necessary?
+  long idx = PyInt_AsLong(group);
+  if (idx == -1 && PyErr_Occurred() != NULL) {
+    return false;
+  }
+  // TODO: Consider caching NumberOfCapturingGroups.
+  if (idx < 0 || idx > ((RegexpObject2*)self->re)->re2_obj->NumberOfCapturingGroups()) {
+    PyErr_SetString(PyExc_IndexError, "no such group");
+    return false;
+  }
+  *idx_p = idx;
+  return true;
+}
+
+/**
+ * Extract the start and end indexes of a pre-checked group number.
+ * Sets both to -1 if it did not participate in the match.
+ */
+static bool
+_group_span(MatchObject2* self, long idx, Py_ssize_t* o_start, Py_ssize_t* o_end)
+{
+  // "idx" is expected to be verified.
+  StringPiece& piece = self->groups[idx];
+  if (piece.data() == NULL) {
+    *o_start = -1;
+    *o_end = -1;
+    return false;
+  }
+  Py_ssize_t start = piece.data() - PyString_AS_STRING(self->string);
+  *o_start = start;
+  *o_end = start + piece.length();
+  return true;
+}
+
+/**
+ * Return a pre-checked group number as a string, or default_obj
+ * if it didn't participate in the match.
+ */
+static PyObject*
+_group_get_i(MatchObject2* self, long idx, PyObject* default_obj)
+{
+  Py_ssize_t start;
+  Py_ssize_t end;
+  if (!_group_span(self, idx, &start, &end)) {
+    Py_INCREF(default_obj);
+    return default_obj;
+  }
+  return PySequence_GetSlice(self->string, start, end);
+}
+
+/**
+ * Return the text of an unchecked group number as a string.
+ */
+static PyObject*
+_group_get_o(MatchObject2* self, PyObject* group)
+{
+  long idx;
+  if (!_group_idx(self, group, &idx)) {
+    return NULL;
+  }
+  return _group_get_i(self, idx, Py_None);
+}
+
+
+static PyObject*
+match_group(MatchObject2* self, PyObject* args)
+{
+  long idx = 0;
+  Py_ssize_t nargs = PyTuple_GET_SIZE(args);
+  switch (nargs) {
+    case 1:
+      if (!_group_idx(self, PyTuple_GET_ITEM(args, 0), &idx)) {
+        return NULL;
+      }
+      // Fall through.
+    case 0:
+      return _group_get_i(self, idx, Py_None);
+    default:
+      PyObject* ret = PyTuple_New(nargs);
+      if (ret == NULL) {
+        return NULL;
+      }
+
+      for (int i = 0; i < nargs; i++) {
+        PyObject* group = _group_get_o(self, PyTuple_GET_ITEM(args, i));
+        if (group == NULL) {
+          Py_DECREF(ret);
+          return NULL;
+        }
+        PyTuple_SET_ITEM(ret, i, group);
+      }
+      return ret;
+  }
+}
+
+static PyObject*
+match_groups(MatchObject2* self, PyObject* args, PyObject* kwds)
+{
+  static const char* kwlist[] = {
+    "default",
+    NULL};
+
+  PyObject* default_obj = Py_None;
+
+  if (!PyArg_ParseTupleAndKeywords(args, kwds, "|O", (char**)kwlist,
+        &default_obj)) {
+    return NULL;
+  }
+
+  int ngroups = ((RegexpObject2*)self->re)->re2_obj->NumberOfCapturingGroups();
+
+  PyObject* ret = PyTuple_New(ngroups);
+  if (ret == NULL) {
+    return NULL;
+  }
+
+  for (int i = 1; i <= ngroups; i++) {
+    PyObject* group = _group_get_i(self, i, default_obj);
+    if (group == NULL) {
+      Py_DECREF(ret);
+      return NULL;
+    }
+    PyTuple_SET_ITEM(ret, i-1, group);
+  }
+
+  return ret;
+}
+
+static PyObject*
+match_groupdict(MatchObject2* self, PyObject* args, PyObject* kwds)
+{
+  static const char* kwlist[] = {
+    "default",
+    NULL};
+
+  PyObject* default_obj = Py_None;
+
+  if (!PyArg_ParseTupleAndKeywords(args, kwds, "|O", (char**)kwlist,
+        &default_obj)) {
+    return NULL;
+  }
+
+  PyObject* ret = PyDict_New();
+  if (ret == NULL) {
+    return NULL;
+  }
+
+  const std::map<std::string, int>& name_map = ((RegexpObject2*)self->re)->re2_obj->NamedCapturingGroups();
+  for (std::map<std::string, int>::const_iterator it = name_map.begin(); it != name_map.end(); ++it) {
+    PyObject* group = _group_get_i(self, it->second, default_obj);
+    if (group == NULL) {
+      Py_DECREF(ret);
+      return NULL;
+    }
+    // TODO: Group names with embedded zeroes?
+    int res = PyDict_SetItemString(ret, it->first.c_str(), group);
+    Py_DECREF(group);
+    if (res < 0) {
+      Py_DECREF(ret);
+      return NULL;
+    }
+  }
+
+  return ret;
+}
+
+enum span_mode_t { START, END, SPAN };
+
+static PyObject*
+_do_span(MatchObject2* self, PyObject* args, const char* name, span_mode_t mode)
+{
+  long idx = 0;
+  PyObject* group = NULL;
+  if (!PyArg_UnpackTuple(args, name, 0, 1,
+        &group)) {
+    return NULL;
+  }
+  if (group != NULL) {
+    if (!_group_idx(self, group, &idx)) {
+      return NULL;
+    }
+  }
+
+  Py_ssize_t start = -1;
+  Py_ssize_t end = -1;
+
+  (void)_group_span(self, idx, &start, &end);
+  switch (mode) {
+    case START : return Py_BuildValue("n", start );
+    case END   : return Py_BuildValue("n", end   );
+    case SPAN:
+      return Py_BuildValue("nn", start, end);
+  }
+
+  // Make gcc happy.
+  return NULL;
+}
+
+static PyObject*
+match_start(MatchObject2* self, PyObject* args)
+{
+  return _do_span(self, args, "start", START);
+}
+
+static PyObject*
+match_end(MatchObject2* self, PyObject* args)
+{
+  return _do_span(self, args, "end", END);
+}
+
+static PyObject*
+match_span(MatchObject2* self, PyObject* args)
+{
+  return _do_span(self, args, "span", SPAN);
+}
+
+
+static PyObject*
+_compile(PyObject* self, PyObject* args, PyObject* kwds)
+{
+  static const char* kwlist[] = {
+    "pattern",
+    NULL};
+
+  PyObject* pattern;
+
+  if (!PyArg_ParseTupleAndKeywords(args, kwds, "O!", (char**)kwlist,
+        &PyString_Type, &pattern)) {
+    return NULL;
+  }
+
+  return create_regexp(pattern);
+}
+
+static PyObject*
+escape(PyObject* self, PyObject* args)
+{
+  char *str;
+  Py_ssize_t len;
+
+  if (!PyArg_ParseTuple(args, "s#:escape", &str, &len)) {
+    return NULL;
+  }
+
+  std::string esc(RE2::QuoteMeta(StringPiece(str, (int) len)));
+
+  return PyString_FromStringAndSize(esc.c_str(), esc.size());
+}
+
+static PyMethodDef methods[] = {
+  {"_compile", (PyCFunction)_compile, METH_VARARGS | METH_KEYWORDS, NULL},
+  {"escape", (PyCFunction)escape, METH_VARARGS,
+   "Escape all potentially meaningful regexp characters."},
+  {NULL}  /* Sentinel */
+};
+
+PyMODINIT_FUNC
+init_re2(void)
+{
+  if (PyType_Ready(&Regexp_Type2) < 0) {
+    return;
+  }
+
+  if (PyType_Ready(&Match_Type2) < 0) {
+    return;
+  }
+
+  PyObject* sre_mod = PyImport_ImportModuleNoBlock("sre_constants");
+  if (sre_mod == NULL) {
+    return;
+  }
+  /* static global */ error_class = PyObject_GetAttrString(sre_mod, "error");
+  if (error_class == NULL) {
+    return;
+  }
+
+  PyObject* mod = Py_InitModule("_re2", methods);
+  if (mod == NULL) return;
+  Py_INCREF(error_class);
+  PyModule_AddObject(mod, "error", error_class);
+}
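
_do_search clamps pos and endpos before calling RE2::Match, and _group_span reports offsets relative to the start of the whole subject string, with (-1, -1) for a group that did not participate. The pure-Python model below restates the clamping order for review purposes. clamp_bounds is a hypothetical name, and the function is a sketch of the C logic, not part of the extension. The last lines use the stdlib re module only to show the same span conventions.

```python
import re


def clamp_bounds(pos, endpos, slen):
    """Model of the pos/endpos clamping in _do_search (pure Python)."""
    if pos < 0:
        pos = 0
    if pos > slen:
        pos = slen
    if endpos < pos:
        endpos = pos
    if endpos > slen:
        endpos = slen
    return pos, endpos


print(clamp_bounds(-5, 100, 10))  # (0, 10): both ends pulled into range
print(clamp_bounds(3, 1, 10))     # (3, 3): endpos never precedes pos
print(clamp_bounds(20, 30, 10))   # (10, 10): pos past the end

# Offsets stay absolute when pos > 0, and a non-participating group
# reports (-1, -1); stdlib re follows the same conventions.
m = re.compile(r"(a)|(b)").search("xxb", 1)
print(m.span(), m.span(1))  # (2, 3) (-1, -1)
```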
diff --git a/mercurial/re2.py b/mercurial/re2.py
new file mode 100644
--- /dev/null
+++ b/mercurial/re2.py
@@ -0,0 +1,63 @@
+#!/usr/bin/env python
+
+# Copyright (c) 2010, David Reiss and Facebook, Inc. All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in the
+#   documentation and/or other materials provided with the distribution.
+# * Neither the name of Facebook nor the names of its contributors
+#   may be used to endorse or promote products derived from this software
+#   without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+import _re2
+
+__all__ = [
+    "error",
+    "escape",
+    "compile",
+    "search",
+    "match",
+    "fullmatch",
+    ]
+
+# Module-private compilation function, for future caching and other enhancements
+_compile = _re2._compile
+
+error = _re2.error
+escape = _re2.escape
+
+def compile(pattern):
+    "Compile a regular expression pattern, returning a pattern object."
+    return _compile(pattern)
+
+def search(pattern, string):
+    """Scan through string looking for a match to the pattern, returning
+    a match object, or None if no match was found."""
+    return _compile(pattern).search(string)
+
+def match(pattern, string):
+    """Try to apply the pattern at the start of the string, returning
+    a match object, or None if no match was found."""
+    return _compile(pattern).match(string)
+
+def fullmatch(pattern, string):
+    """Try to apply the pattern to the entire string, returning
+    a match object, or None if no match was found."""
+    return _compile(pattern).fullmatch(string)
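
The README notes that there is no compile cache, and re2.py keeps _compile private "for future caching". A caller-side cache might look like the sketch below. cached_compile and the 100-entry limit are hypothetical. The compiler argument defaults to stdlib re.compile so the sketch runs without the extension; passing re2.compile instead is an assumption about intended use, not something this patch does.

```python
import re

_MAXCACHE = 100  # hypothetical limit, chosen for illustration
_cache = {}


def cached_compile(pattern, compiler=re.compile):
    """Return a compiled pattern, compiling each (compiler, pattern) pair once.

    Hypothetical helper; the patch itself ships no cache.
    """
    key = (compiler, pattern)
    try:
        return _cache[key]
    except KeyError:
        pass
    if len(_cache) >= _MAXCACHE:
        _cache.clear()  # crude eviction: drop everything once full
    compiled = compiler(pattern)
    _cache[key] = compiled
    return compiled
```

Clearing the whole dict keeps the sketch short; an LRU policy would evict more selectively at the cost of extra bookkeeping.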


More information about the Mercurial-devel mailing list