RFC: safe pattern matching for problematic encoding

FUJIWARA Katsunori foozy at lares.dti.ne.jp
Thu May 24 11:23:02 CDT 2012


At Wed, 23 May 2012 13:56:43 -0500,
Matt Mackall wrote:
> 
> On Wed, 2012-05-23 at 21:38 +0900, FUJIWARA Katsunori wrote:
> > Hi, devels.
> > 
> > I'm working to achieve safe pattern matching/parsing for problematic
> > encodings (e.g.: cp932), in which strings may contain '\\' as a part
> > of multi-byte characters.
> 
> Please provide an example of where we'd want this for discussion.

We need this kind of safety in the following situations:

  - for file/directory patterns of "hg status", "hg log" and so on:
    (path, globbing or regex)

      in this case, a backslash in a pattern is skipped, because
      "_globre()" in "match.py" treats it as an escape for the next
      character.

      this causes unexpected matching results: "_globre()" doesn't
      raise an exception, even when the specified pattern ends with
      the backslash byte of a multi-byte character (MBCS).


  - for regexp patterns of "hg grep":

      in this case, a backslash in a pattern is skipped, because
      "re.compile()" treats it as an escape for the next character.

      this causes unexpected matching results (when the multi-byte
      character is in the middle of the pattern), or a parse error
      (when it is at the tail of the pattern).


  - for arguments of revset/fileset predicates:
    (paths, regexps, keywords and so on)

  - for strings of styles/templates:

      in these cases, a backslash in a string is skipped, because it
      is treated as an escape for the next character by:

        - "tokenize()" in "fileset.py"
        - "tokenize()" in "revset.py"
        - "tokenize()" in "templater.py"

      this causes unexpected matching results (when the multi-byte
      character is in the middle of the argument), or a parse error
      (when it is at the tail of the argument).


That said, safety in the "strings of styles/templates" situation is
not as critical as in the other ones.
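
For illustration, here is a minimal sketch that reproduces both
failure modes with plain "re.compile()" on cp932 bytes. The file name
"\u8868.txt" is just an example I chose, and "toy_unescape()" is a
hypothetical helper written for this mail: it is NOT the code of
"_globre()" or "tokenize()", it only imitates the "drop the backslash,
keep the next byte" behavior described above.

```python
import re

# U+8868 is encoded in cp932 as the two bytes 0x95 0x5c; the trailing
# byte 0x5c is the same byte as an ASCII backslash.
name = u"\u8868.txt".encode("cp932")
assert name == b"\x95\\.txt"

# Middle of the pattern: the 0x5c byte is consumed as an escape for ".",
# so the compiled pattern means 0x95 + literal "." + "txt" and no longer
# matches the very bytes it was built from.
pattern = re.compile(name)
matches_itself = pattern.search(name) is not None
matches_without_backslash = pattern.search(b"\x95.txt") is not None

# Tail of the pattern: the 0x5c byte has nothing left to escape, so
# compilation fails with a parse error.
try:
    re.compile(u"\u8868".encode("cp932"))
    tail_compiles = True
except re.error:
    tail_compiles = False


def toy_unescape(pattern):
    """Hypothetical escape scanner (not Mercurial's real code).

    Drops each backslash and keeps the following byte literally; a
    trailing backslash is dropped silently instead of raising.
    """
    out = bytearray()
    i = 0
    while i < len(pattern):
        c = pattern[i:i + 1]
        i += 1
        if c == b"\\":
            c = pattern[i:i + 1]  # empty slice at the end of the pattern
            i += 1
        out += c
    return bytes(out)
```

With this sketch, "matches_itself" and "tail_compiles" are both false,
and "toy_unescape()" returns the pattern with the 0x5c byte silently
removed, which is the kind of unexpected result listed above.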

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp

