[patch] syntax:plain for .hgignore

Jonathan S. Shapiro shap at eros-os.com
Wed Sep 12 12:50:23 CDT 2007

On Wed, 2007-09-12 at 10:27 -0500, Matt Mackall wrote:
>             # We're using a Python with a tiny regex engine and we             
>             # made it explode, so we'll divide the pattern list in two         
>             # until it works                                                   
>             l = len(pats)
>             if l < 2:
>                 raise
>             a, b = matchfn(pats[:l/2], tail), matchfn(pats[l/2:], tail)
>             return lambda s: a(s) or b(s)
> Each split means an extra regex test, so it won't take many splits to
> be slower than string matching.

Ahh. Okay, this makes a lot of (reluctant) sense. I remain very confused
about Guido's case, however, because I can't see how a 620
line .hgignore generated 5000 calls to sre_compile(). Assuming it had to
split down to individual lines, it shouldn't have been more than 1670
calls. I am still failing to understand something here.

Concerning the loop itself, I have a naive question. The algorithm finds
a workable length l within log(n) steps, but having done so it doesn't
re-use this discovery after the recursion backs out of the current
subtree. It seems to me that once a workable l is found, no larger value
of l should ever be tried.

But the main thing is the 5000 calls to sre.compile!

Is it possible that the culprit is the call to matchfn() around
util.py:467 that is building filematch? If he has 19,813 copies in his
working tree, I can easily see him making a whole slew of calls to
re.match re-converging on 'l' redundantly.

Any reason not to just rename l to something more descriptive and make
it module-global? It does not seem likely that the RE library will
become spontaneously less broken during any given execution of the hg
command. :-)

Jonathan S. Shapiro
Managing Director
The EROS Group, LLC
www.coyotos.org, www.eros-os.org

More information about the Mercurial-devel mailing list