[PATCH 1 of 6] revset: make tokenize extensible to parse alias declarations and definitions
FUJIWARA Katsunori
foozy at lares.dti.ne.jp
Thu Jan 8 10:37:09 UTC 2015
# HG changeset patch
# User FUJIWARA Katsunori <foozy at lares.dti.ne.jp>
# Date 1420712143 -32400
# Thu Jan 08 19:15:43 2015 +0900
# Node ID 497901f26ace28449cc356e9e4638ecb04d4f11a
# Parent 7ad155e13f0f51df8e986a0ec4e58ac9a0ccedbb
revset: make tokenize extensible to parse alias declarations and definitions
Before this patch, "tokenize" doesn't recognize a symbol starting
with "$" as a valid one.

This prevents revset alias declarations and definitions from being
parsed with "tokenize", because "$" may be used as the initial letter
of alias arguments.

Strictly speaking, alias argument names don't require a leading "$".
But we have to assume that users may use "$" as the initial letter of
argument names in their aliases, because the examples in "hg help
revsets" have used such names for a long time.

To make "tokenize" extensible enough to parse alias declarations and
definitions, this patch introduces the optional arguments
"syminitletter" and "symletter". Passing these functions changes the
policy of what counts as a "valid symbol" during tokenization.

This is part of the preparation for parsing alias declarations and
definitions more strictly.
diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -128,10 +128,24 @@
keywords = set(['and', 'or', 'not'])
-def tokenize(program, lookup=None):
+def tokenize(program, lookup=None, syminitletter=None, symletter=None):
'''
Parse a revset statement into a stream of tokens
+ ``syminitletter`` is the function ``func(c)``, which takes a
+ character to be examined and returns whether it can be the initial
+ letter of a valid symbol.
+
+ By default, character ``c`` is recognized as the initial letter of a
+ valid symbol if ``c.isalnum() or c in '._@' or ord(c) > 127``.
+
+ ``symletter`` is the function ``func(i, c)``, which takes the initial
+ letter ``i`` of the symbol and a character ``c`` to be examined, and
+ returns whether ``c`` can be a non-initial letter of a valid symbol.
+
+ By default, character ``c`` is recognized as a non-initial letter of
+ a valid symbol if ``c.isalnum() or c in '-._/@' or ord(c) > 127``.
+
Check that @ is a valid unquoted token character (issue3686):
>>> list(tokenize("@::"))
[('symbol', '@', 0), ('::', None, 1), ('end', None, 3)]
@@ -139,6 +153,10 @@
'''
pos, l = 0, len(program)
+ if not syminitletter:
+ syminitletter = lambda c: c.isalnum() or c in '._@' or ord(c) > 127
+ if not symletter:
+ symletter = lambda i, c: c.isalnum() or c in "-._/@" or ord(c) > 127
while pos < l:
c = program[pos]
if c.isspace(): # skip inter-token whitespace
@@ -176,12 +194,12 @@
else:
raise error.ParseError(_("unterminated string"), s)
# gather up a symbol/keyword
- elif c.isalnum() or c in '._@' or ord(c) > 127:
+ elif syminitletter(c):
s = pos
pos += 1
while pos < l: # find end of symbol
d = program[pos]
- if not (d.isalnum() or d in "-._/@" or ord(d) > 127):
+ if not symletter(c, d):
break
if d == '.' and program[pos - 1] == '.': # special case for ..
pos -= 1
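The effect of the optional "syminitletter" and "symletter" arguments
described in the commit message can be sketched outside of Mercurial.
The following is only a minimal illustration, not the code in
mercurial/revset.py: "scan_symbols", "default_syminitletter",
"default_symletter" and "alias_init" are hypothetical names, the
scanner only gathers symbols (operators, strings and the ".." special
case are left out), and the two-argument form of "symletter" is
assumed from the "symletter(c, d)" call in the diff.

```python
def default_syminitletter(c):
    # Default policy for the first character, as in the patch.
    return c.isalnum() or c in '._@' or ord(c) > 127


def default_symletter(i, c):
    # Default policy for later characters. ``i`` is the symbol's initial
    # letter (unused here), mirroring the ``symletter(c, d)`` call.
    return c.isalnum() or c in '-._/@' or ord(c) > 127


def scan_symbols(program, syminitletter=None, symletter=None):
    """Hypothetical, simplified scanner yielding (symbol, start) pairs.

    Characters that cannot start a symbol are skipped, not tokenized.
    """
    if not syminitletter:
        syminitletter = default_syminitletter
    if not symletter:
        symletter = default_symletter
    pos, l = 0, len(program)
    while pos < l:
        c = program[pos]
        if not syminitletter(c):
            pos += 1
            continue
        s = pos
        pos += 1
        while pos < l and symletter(c, program[pos]):  # find end of symbol
            pos += 1
        yield program[s:pos], s


# Alias-friendly policy: additionally accept "$" as an initial letter.
alias_init = lambda c: c == '$' or default_syminitletter(c)

# Default policy: "$" cannot start a symbol, so only "1" and "2" survive.
print(list(scan_symbols("$1 or $2")))

# Custom policy: "$1" and "$2" are gathered as whole symbols.
print(list(scan_symbols("$1 or $2", syminitletter=alias_init)))
```

With the default policy the leading "$" is dropped from each argument
name, while passing "alias_init" keeps "$1" and "$2" intact. This is
the kind of policy switch the new arguments are meant to allow for
alias declarations and definitions.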