[PATCH 1 of 6] revset: make tokenize extensible to parse alias declarations and definitions

FUJIWARA Katsunori foozy at lares.dti.ne.jp
Thu Jan 8 10:37:09 UTC 2015


# HG changeset patch
# User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
# Date 1420712143 -32400
#      Thu Jan 08 19:15:43 2015 +0900
# Node ID 497901f26ace28449cc356e9e4638ecb04d4f11a
# Parent  7ad155e13f0f51df8e986a0ec4e58ac9a0ccedbb
revset: make tokenize extensible to parse alias declarations and definitions

Before this patch, "tokenize" doesn't recognize a symbol starting
with "$" as valid.

This prevents revset alias declarations and definitions from being
parsed with "tokenize", because "$" may be used as the initial letter
of alias arguments.

BTW, alias argument names don't actually require a leading "$". But
we have to assume that users may use "$" as the initial letter of
argument names in their aliases, because the examples in "hg help
revsets" have used such names for a long time.

To make "tokenize" extensible enough to parse alias declarations and
definitions, this patch introduces the optional arguments
"syminitletter" and "symletter". Passing these functions changes which
characters "tokenize" accepts as part of a valid symbol.
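
As a rough illustration of how the two predicates are meant to plug
in, here is a minimal standalone sketch. It is not Mercurial's code:
"scan_symbol" and "alias_syminitletter" are hypothetical names used
only for this example, and the ".." special case handled by the real
"tokenize" is left out.

```python
# Hypothetical, simplified stand-in for the symbol-gathering branch of
# "tokenize". It only shows how the two predicates are consulted.

def default_syminitletter(c):
    # default policy for the initial letter, as documented in the patch
    return c.isalnum() or c in '._@' or ord(c) > 127

def default_symletter(c):
    # default policy for non-initial letters ("-" and "/" are also allowed)
    return c.isalnum() or c in '-._/@' or ord(c) > 127

def scan_symbol(program, pos=0, syminitletter=None, symletter=None):
    """Return the symbol starting at ``pos``, or None if none starts there."""
    if syminitletter is None:
        syminitletter = default_syminitletter
    if symletter is None:
        symletter = default_symletter
    if pos >= len(program) or not syminitletter(program[pos]):
        return None
    end = pos + 1
    while end < len(program) and symletter(program[end]):
        end += 1
    return program[pos:end]

def alias_syminitletter(c):
    # assumed alias-parsing policy: additionally accept a leading "$"
    return c == '$' or default_syminitletter(c)

print(scan_symbol('$1 or tip'))
print(scan_symbol('$1 or tip', syminitletter=alias_syminitletter))
```

With the default policy no symbol can start at "$", so the first call
finds nothing; with the alias policy, "$1" is gathered as one symbol.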

This is part of the preparation for parsing alias declarations and
definitions more strictly.

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -128,10 +128,24 @@
 
 keywords = set(['and', 'or', 'not'])
 
-def tokenize(program, lookup=None):
+def tokenize(program, lookup=None, syminitletter=None, symletter=None):
     '''
     Parse a revset statement into a stream of tokens
 
+    ``syminitletter`` is an optional function ``func(c)`` that takes a
+    single character ``c`` and returns whether it can be the initial
+    letter of a valid symbol.
+
+    By default, character ``c`` is recognized as an initial letter of a
+    valid symbol if ``c.isalnum() or c in '._@' or ord(c) > 127``.
+
+    ``symletter`` is an optional function ``func(c)`` that takes a
+    single character ``c`` and returns whether it can be a non-initial
+    letter of a valid symbol.
+
+    By default, character ``c`` is recognized as a non-initial letter of
+    a valid symbol if ``c.isalnum() or c in '-._/@' or ord(c) > 127``.
+
     Check that @ is a valid unquoted token character (issue3686):
     >>> list(tokenize("@::"))
     [('symbol', '@', 0), ('::', None, 1), ('end', None, 3)]
@@ -139,6 +153,10 @@
     '''
 
     pos, l = 0, len(program)
+    if not syminitletter:
+        syminitletter = lambda c: c.isalnum() or c in '._@' or ord(c) > 127
+    if not symletter:
+        symletter = lambda c: c.isalnum() or c in "-._/@" or ord(c) > 127
     while pos < l:
         c = program[pos]
         if c.isspace(): # skip inter-token whitespace
@@ -176,12 +194,12 @@
             else:
                 raise error.ParseError(_("unterminated string"), s)
         # gather up a symbol/keyword
-        elif c.isalnum() or c in '._@' or ord(c) > 127:
+        elif syminitletter(c):
             s = pos
             pos += 1
             while pos < l: # find end of symbol
                 d = program[pos]
-                if not (d.isalnum() or d in "-._/@" or ord(d) > 127):
+                if not symletter(d):
                     break
                 if d == '.' and program[pos - 1] == '.': # special case for ..
                     pos -= 1

