[patch] syntax:plain for .hgignore

Matt Mackall mpm at selenic.com
Wed Sep 12 13:10:05 CDT 2007


On Wed, Sep 12, 2007 at 01:50:23PM -0400, Jonathan S. Shapiro wrote:
> On Wed, 2007-09-12 at 10:27 -0500, Matt Mackall wrote:
> >             # We're using a Python with a tiny regex engine and we             
> >             # made it explode, so we'll divide the pattern list in two         
> >             # until it works                                                   
> >             l = len(pats)
> >             if l < 2:
> >                 raise
> >             a, b = matchfn(pats[:l/2], tail), matchfn(pats[l/2:], tail)
> >             return lambda s: a(s) or b(s)
> > 
> > Each split means an extra regex test, so it won't take many splits to
> > be slower than string matching.
> 
> Ahh. Okay, this makes a lot of (reluctant) sense. I remain very confused
> about Guido's case, however, because I can't see how a 620
> line .hgignore generated 5000 calls to sre_compile(). Assuming it had to
> split down to individual lines, it shouldn't have been more than 1670
> calls. I am still failing to understand something here.
> 
> Concerning the loop itself, I have a naive question. The algorithm finds
> a workable length l within log(n) steps, but having done so it doesn't
> re-use this discovery after the recursion backs out of the current
> subtree. It seems to me that once a workable l is found, no larger value
> of l should ever be tried.

Trouble is, the limiting factor is not the number of lines, but the
number of graph nodes in the resulting NFA. So without parsing the
regex ourselves, we really don't know whether a given set of lines
will overflow the engine until we try.
 
> But the main thing is the 5000 calls to sre.compile!
> 
> Is it possible that the culprit is the call to matchfn() around
> util.py:467 that is building filematch?

Which? This?

    patmatch = matchfn(pats, '$') or always

-- 
Mathematics is the supreme nostalgia of our time.


More information about the Mercurial-devel mailing list