[patch] syntax:plain for .hgignore

Matt Mackall mpm at selenic.com
Tue Sep 11 15:26:16 CDT 2007


On Tue, Sep 11, 2007 at 07:44:27PM +0200, Guido Ostkamp wrote:
> On Tue, 11 Sep 2007, Patrick Mézard wrote:
> >Great.
> >
> >A couple of runs with --lsprof (if python 2.5 or if the module is 
> >available) or --profile might be interesting to compare to.
> 
> ok, here it is (I hope I made no mistakes capturing this).
> Keep in mind that even in 'syntax: plain' case, there are still some glob
> expressions left.
> 
> Original mercurial:
> 
> Full repo
> 
>     CallCount     Total(s)    Inline(s) module:lineno(function)
>          5874      1.7714      1.7714   <built-in method match>

Ok, looks like we called our compiled pattern 5874 times?

>             1      2.8697      0.2593   mercurial.dirstate:387(findfiles)
>        +16760      0.1890      0.1890   +<posix.lstat>
>        +33521      0.2433      0.1776   +posixpath:56(join)
>         +1447      0.1387      0.1387   +<posix.listdir>
>        +15292      1.0065      0.0511   +mercurial.dirstate:359(imatch)
>        +16760      0.0456      0.0317   +stat:45(S_ISDIR)
>          3116      0.6329      0.2147   sre_parse:385(_parse)
>        +26688      0.1860      0.0536   +sre_parse:207(get)
>         +1247      0.6328      0.0466   +sre_parse:307(_parse_sub)
>        +26004      0.0598      0.0457   +sre_parse:144(append)
>         +3402      0.0306      0.0234   +sre_parse:263(_escape)
>         +5671      0.0284      0.0104   +sre_parse:201(match)
>         22245      0.2528      0.2044   re:196(escape)

That's a lot of escaping.

>        +22245      0.0234      0.0234   +<range>
>        +22245      0.0126      0.0126   +<len>
>        +22245      0.0123      0.0123   +<method 'join' of 'str' objects>
>          5006      0.2772      0.2042   sre_compile:38(_compile)
>         +5005      0.2772      0.2042   +sre_compile:38(_compile)

And called the compiler 5000 times? That seems odd.

>        +67060      0.0316      0.0316   +<method 'append' of 'list' objects>
>        +32898      0.0288      0.0288   +sre_parse:136(__getitem__)
>        +13746      0.0069      0.0069   +<len>
>          +643      0.0056      0.0042   +sre_compile:360(_simple)
>         16761      0.1891      0.1891   <posix.lstat>
>         33525      0.2433      0.1776   posixpath:56(join)
>        +33525      0.0374      0.0374   +<method 'startswith' of 'str' objects>
>        +33520      0.0283      0.0283   +<method 'endswith' of 'str' objects>
>          1447      0.1387      0.1387   <posix.listdir>
>         32319      0.1595      0.1282   sre_parse:188(__next)
>        +64635      0.0312      0.0312   +<len>
>          5649      0.1268      0.1087   sre_parse:146(getwidth)
>         +5005      0.1259      0.1078   +sre_parse:146(getwidth)
>        +13127      0.0156      0.0156   +<min>
>         +3115      0.0025      0.0025   +<max>
> 
> Empty repo
> 
>     CallCount     Total(s)    Inline(s) module:lineno(function)
>         22245      0.2563      0.2084   re:196(escape)

Hmmm, escaping is constant. Looks like the bits in util.globre.

>        +22245      0.0229      0.0229   +<range>
>        +22245      0.0125      0.0125   +<len>
>        +22245      0.0124      0.0124   +<method 'join' of 'str' objects>
>          5006      0.2793      0.2052   sre_compile:38(_compile)

And compile count is constant. So why 5000 calls?

> Mercurial with 'syntax:plain' patch:
> 
> Full repo
> 
>     CallCount     Total(s)    Inline(s) module:lineno(function)
>             1      1.3375      0.2598   mercurial.dirstate:387(findfiles)
>        +16760      0.1875      0.1875   +<posix.lstat>
>        +33521      0.2523      0.1864   +posixpath:56(join)
>         +1447      0.0528      0.0528   +<posix.listdir>
>        +15292      0.2977      0.0509   +mercurial.dirstate:359(imatch)
>         +1468      0.0973      0.0453   +mercurial.util:480(<lambda>)
>         16761      0.1875      0.1875   <posix.lstat>
>         33525      0.2524      0.1864   posixpath:56(join)
>        +33525      0.0375      0.0375   +<method 'startswith' of 'str' objects>
>        +33520      0.0284      0.0284   +<method 'endswith' of 'str' objects>
>          5874      0.3354      0.1684   mercurial.util:480(<lambda>)

There's our 5874 again.

>         +5293      0.1671      0.1671   +<built-in method match>
>          5293      0.1671      0.1671   <built-in method match>
>             1      1.7405      0.1035   mercurial.dirstate:477(status)
>        +11551      1.5218      0.0404   +mercurial.dirstate:335(statwalk)
>          +664      0.0003      0.0003   +<method 'append' of 'list' objects>
>            +1      0.1149      0.0001   +mercurial.dirstate:27(__getattr__)
>             1      0.1147      0.0758   mercurial.dirstate:124(_read)
>        +10886      0.0296      0.0196   +struct:77(unpack)
>        +10887      0.0051      0.0051   +<len>
>            +1      0.0031      0.0031   +<method 'read' of 'file' objects>
>            +1      0.0011      0.0000   +mercurial.demandimport:71(__getattribute__)
>            +1      0.0000      0.0000   +mercurial.util:1334(__call__)
>          1447      0.0528      0.0528   <posix.listdir>
>         15292      0.2977      0.0509   mercurial.dirstate:359(imatch)
>         +4406      0.2382      0.1230   +mercurial.util:480(<lambda>)
>        +11550      0.0086      0.0086   +mercurial.util:252(always)
>          1458      0.0472      0.0472   <method 'sort' of 'list' objects>
> 
> Empty repo
> 
>     CallCount     Total(s)    Inline(s) module:lineno(function)
>           626      0.0212      0.0199   mercurial.ignore:11(_parselines)
>          +628      0.0006      0.0006   +<method 'endswith' of 'str' objects>
>          +628      0.0004      0.0004   +<len>
>          +628      0.0003      0.0003   +<method 'rstrip' of 'str' objects>
>           625      0.0199      0.0155   posixpath:373(normpath)
>         +3580      0.0017      0.0017   +<method 'append' of 'list' objects>
>          +625      0.0011      0.0011   +<method 'split' of 'str' objects>
>          +625      0.0010      0.0010   +<method 'join' of 'str' objects>
>          +625      0.0007      0.0007   +<method 'startswith' of 'str' objects>
>             1      0.1103      0.0097   mercurial.ignore:24(ignore)
>          +626      0.0212      0.0199   +mercurial.ignore:11(_parselines)
>         +5609      0.0039      0.0039   +<method 'startswith' of 'str' objects>
>          +623      0.0006      0.0006   +<method 'items' of 'dict' objects>
>          +623      0.0003      0.0003   +<method 'append' of 'list' objects>
>            +1      0.0745      0.0001   +mercurial.util:406(matcher)
>           246      0.0097      0.0072   sre_compile:38(_compile)
>          +245      0.0097      0.0072   +sre_compile:38(_compile)
>         +1910      0.0009      0.0009   +<method 'append' of 'list' objects>
>          +918      0.0009      0.0009   +sre_parse:136(__getitem__)
>          +656      0.0003      0.0003   +<len>
>           +48      0.0004      0.0003   +sre_compile:360(_simple)
>           141      0.0198      0.0066   sre_parse:385(_parse)
>           +57      0.0197      0.0020   +sre_parse:307(_parse_sub)
>          +658      0.0046      0.0013   +sre_parse:207(get)
>          +569      0.0013      0.0010   +sre_parse:144(append)
>          +316      0.0014      0.0005   +sre_parse:201(match)
>           +65      0.0006      0.0005   +sre_parse:263(_escape)
>          8141      0.0062      0.0062   <method 'startswith' of 'str' objects>
>             2      0.0319      0.0045   mercurial.util:510(normalizepats)
>          +623      0.0199      0.0155   +posixpath:373(normpath)
>          +623      0.0069      0.0039   +mercurial.util:271(patkind)
>         +1246      0.0006      0.0006   +<method 'append' of 'list' objects>
>           294      0.0050      0.0041   sre_parse:146(getwidth)
>          +245      0.0049      0.0041   +sre_parse:146(getwidth)
>          +632      0.0008      0.0008   +<min>
>          +140      0.0001      0.0001   +<max>
>          8175      0.0041      0.0041   <method 'append' of 'list' objects>
>           623      0.0069      0.0039   mercurial.util:271(patkind)
>         +1897      0.0016      0.0016   +<method 'startswith' of 'str' objects>
>          +623      0.0013      0.0013   +<method 'split' of 'str' objects>

Something strikes me as odd here. I wrote a quick little test:

----
import re, random, sys

def randhexstring():
    # 20 groups of 4 hex digits: an 80-character string
    return "".join(["%04x" % random.randint(0, 0xffff) for x in range(20)])

pats = [randhexstring() for x in range(1000)]

def rematch(pats):
    # one big alternation, compiled once
    pat = '(?:%s)' % '|'.join(pats)
    return re.compile(pat).match

def fixmatch(pats):
    def match(s):
        # linear scan of the pattern list
        return s in pats
    return match

if "re" in sys.argv:
    m = rematch(pats)
else:
    m = fixmatch(pats)

# "exit" stops before matching, to time setup alone
if "exit" in sys.argv:
    sys.exit(0)

count = 0
for s in pats * 100:
    if m(s):
        count += 1

print(count)
----

So we generate 1000 80-character strings, and either compile them into
a regex or just do a straight match, then we match 100000 times.

Building the table takes:

$ time python retest.py exit

real    0m0.260s
user    0m0.248s
sys     0m0.004s

Compiling the giant regex takes:

$ time python2.5 retest.py re exit

real    0m0.806s
user    0m0.796s
sys     0m0.008s

And matching takes:

$ time python2.5 retest.py re
100000

real    0m1.851s
user    0m1.832s
sys     0m0.012s

And doing straight string matching takes:

$ time python2.5 retest.py
100000

real    0m2.464s
user    0m2.452s
sys     0m0.004s

That's ~1.0s for regex matching (1.851 - 0.806) vs ~2.2s for string
matching (2.464 - 0.260) once we take out building the patterns and
compiling. That's as expected: a regex can test an input against all
the possible ignore strings in one pass through its compiled pattern,
aborting as soon as it's impossible for the input to match.
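
For anyone who wants to check the subtraction above without juggling
separate "exit" runs, here is a rough sketch that times each phase
inside one process. It assumes Python 3 (time.perf_counter), and the
helper names timed and measure are made up for this example; they are
not part of the test script or of Mercurial.

```python
import random
import re
import time

def randhexstring():
    # 20 groups of 4 hex digits, same shape as in the test script
    return "".join("%04x" % random.randint(0, 0xffff) for _ in range(20))

def timed(fn):
    # hypothetical helper: run fn once, return (result, elapsed seconds)
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

def measure(npats=1000, repeats=100):
    # hypothetical helper: time build, compile and both match loops
    pats, t_build = timed(lambda: [randhexstring() for _ in range(npats)])
    rematch, t_compile = timed(
        lambda: re.compile("(?:%s)" % "|".join(pats)).match)
    inputs = pats * repeats
    re_hits, t_re = timed(lambda: sum(1 for s in inputs if rematch(s)))
    list_hits, t_list = timed(lambda: sum(1 for s in inputs if s in pats))
    return {
        "build": t_build, "compile": t_compile,
        "re_match": t_re, "list_match": t_list,
        "re_hits": re_hits, "list_hits": list_hits,
    }

result = measure()
for key in ("build", "compile", "re_match", "list_match"):
    print("%-10s %.3fs" % (key, result[key]))
print("hits: re=%d list=%d" % (result["re_hits"], result["list_hits"]))
```

The absolute numbers will differ from the ones above (different
interpreter, different machine); only the split between setup cost
and matching cost is the point.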

On the other hand, if your regex is too large for your Python build
and it has to get broken into pieces, then regexes will probably lose.
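
To make "broken into pieces" concrete, here is a minimal sketch of
one way to do it: halve the pattern list until each combined pattern
fits under a size cap, then try the pieces in turn. The function name
chunked_matcher and the max_len cap are assumptions for illustration
(the real limit depends on the Python build), the patterns are
treated as literal strings via re.escape, and this is not meant as a
copy of what Mercurial does.

```python
import re

def chunked_matcher(pats, max_len=20000):
    # max_len is a hypothetical cap on the combined pattern length,
    # standing in for whatever a given Python build can handle
    pat = "(?:%s)" % "|".join(re.escape(p) for p in pats)
    if len(pat) <= max_len or len(pats) < 2:
        return re.compile(pat).match
    # too big: split the list in half and build a matcher for each part
    mid = len(pats) // 2
    first = chunked_matcher(pats[:mid], max_len)
    second = chunked_matcher(pats[mid:], max_len)
    # each extra piece means another full match attempt per input
    return lambda s: first(s) or second(s)

# tiny cap to force splitting in this demo
match = chunked_matcher(["abc", "a.c", "xyz", "foo/bar"], max_len=12)
print(bool(match("xyz")), bool(match("aXc")))
```

Every split adds another match call per input, which is where the
single-pass advantage starts to erode.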

-- 
Mathematics is the supreme nostalgia of our time.

