D3839: stringutil: add a new function to do minimal regex escaping

durin42 (Augie Fackler) phabricator at mercurial-scm.org
Tue Jun 26 15:21:24 UTC 2018


durin42 created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  Per https://bugs.python.org/issue29995, re.escape() used to
  over-escape regular expression strings, but in Python 3.7 that's been
  fixed, which also improved the performance of re.escape(). Since it's
  both an output change for us *and* a perfomance win, let's just
  effectively backport the new behavior to hg on all Python versions.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3839

AFFECTED FILES
  mercurial/utils/stringutil.py

CHANGE DETAILS

diff --git a/mercurial/utils/stringutil.py b/mercurial/utils/stringutil.py
--- a/mercurial/utils/stringutil.py
+++ b/mercurial/utils/stringutil.py
@@ -23,6 +23,25 @@
     pycompat,
 )
 
+# regex special chars pulled from https://bugs.python.org/issue29995
+# which was part of Python 3.7.
+_respecial = pycompat.bytestr(b'()[]{}?*+-|^$\\.# \t\n\r\v\f')
+_regexescapemap = {ord(i): (b'\\' + i).decode('latin1') for i in _respecial}
+
+def reescape(pat):
+    """Drop-in replacement for re.escape."""
+    # NOTE: it is intentional that this works on unicodes and not
+    # bytes, as it's only possible to do the escaping with
+    # unicode.translate, not bytes.translate. Sigh.
+    wantuni = True
+    if isinstance(pat, bytes):
+        wantuni = False
+        pat = pat.decode('latin1')
+    pat = pat.translate(_regexescapemap)
+    if wantuni:
+        return pat
+    return pat.encode('latin1')
+
 def pprint(o, bprefix=False):
     """Pretty print an object."""
     if isinstance(o, bytes):



To: durin42, #hg-reviewers
Cc: mercurial-devel


More information about the Mercurial-devel mailing list