[patch] syntax:plain for .hgignore
Matt Mackall
mpm at selenic.com
Tue Sep 11 15:26:16 CDT 2007
On Tue, Sep 11, 2007 at 07:44:27PM +0200, Guido Ostkamp wrote:
> On Tue, 11 Sep 2007, Patrick Mézard wrote:
> >Great.
> >
> >A couple of runs with --lsprof (if python 2.5 or if the module is
> >available) or --profile might be interesting to compare to.
>
> ok, here it is (I hope I made no mistakes capturing this).
> Keep in mind that even in 'syntax: plain' case, there are still some glob
> expressions left.
>
> Original mercurial:
>
> Full repo
>
> CallCount Total(s) Inline(s) module:lineno(function)
> 5874 1.7714 1.7714 <built-in method match>
Ok, looks like we called our compiled pattern 5874 times?
> 1 2.8697 0.2593 mercurial.dirstate:387(findfiles)
> +16760 0.1890 0.1890 +<posix.lstat>
> +33521 0.2433 0.1776 +posixpath:56(join)
> +1447 0.1387 0.1387 +<posix.listdir>
> +15292 1.0065 0.0511 +mercurial.dirstate:359(imatch)
> +16760 0.0456 0.0317 +stat:45(S_ISDIR)
> 3116 0.6329 0.2147 sre_parse:385(_parse)
> +26688 0.1860 0.0536 +sre_parse:207(get)
> +1247 0.6328 0.0466 +sre_parse:307(_parse_sub)
> +26004 0.0598 0.0457 +sre_parse:144(append)
> +3402 0.0306 0.0234 +sre_parse:263(_escape)
> +5671 0.0284 0.0104 +sre_parse:201(match)
> 22245 0.2528 0.2044 re:196(escape)
That's a lot of escaping.
> +22245 0.0234 0.0234 +<range>
> +22245 0.0126 0.0126 +<len>
> +22245 0.0123 0.0123 +<method 'join' of 'str' objects>
> 5006 0.2772 0.2042 sre_compile:38(_compile)
> +5005 0.2772 0.2042 +sre_compile:38(_compile)
And called the compiler 5000 times? That seems odd.
> +67060 0.0316 0.0316 +<method 'append' of 'list' objects>
> +32898 0.0288 0.0288 +sre_parse:136(__getitem__)
> +13746 0.0069 0.0069 +<len>
> +643 0.0056 0.0042 +sre_compile:360(_simple)
> 16761 0.1891 0.1891 <posix.lstat>
> 33525 0.2433 0.1776 posixpath:56(join)
> +33525 0.0374 0.0374 +<method 'startswith' of 'str' objects>
> +33520 0.0283 0.0283 +<method 'endswith' of 'str' objects>
> 1447 0.1387 0.1387 <posix.listdir>
> 32319 0.1595 0.1282 sre_parse:188(__next)
> +64635 0.0312 0.0312 +<len>
> 5649 0.1268 0.1087 sre_parse:146(getwidth)
> +5005 0.1259 0.1078 +sre_parse:146(getwidth)
> +13127 0.0156 0.0156 +<min>
> +3115 0.0025 0.0025 +<max>
>
> Empty repo
>
> CallCount Total(s) Inline(s) module:lineno(function)
> 22245 0.2563 0.2084 re:196(escape)
Hmmm, escaping is constant. Looks like the bits in util.globre.
> +22245 0.0229 0.0229 +<range>
> +22245 0.0125 0.0125 +<len>
> +22245 0.0124 0.0124 +<method 'join' of 'str' objects>
> 5006 0.2793 0.2052 sre_compile:38(_compile)
And compile count is constant. So why 5000 calls?
> Mercurial with 'style:plain' patch:
>
> Full repo
>
> CallCount Total(s) Inline(s) module:lineno(function)
> 1 1.3375 0.2598 mercurial.dirstate:387(findfiles)
> +16760 0.1875 0.1875 +<posix.lstat>
> +33521 0.2523 0.1864 +posixpath:56(join)
> +1447 0.0528 0.0528 +<posix.listdir>
> +15292 0.2977 0.0509 +mercurial.dirstate:359(imatch)
> +1468 0.0973 0.0453 +mercurial.util:480(<lambda>)
> 16761 0.1875 0.1875 <posix.lstat>
> 33525 0.2524 0.1864 posixpath:56(join)
> +33525 0.0375 0.0375 +<method 'startswith' of 'str' objects>
> +33520 0.0284 0.0284 +<method 'endswith' of 'str' objects>
> 5874 0.3354 0.1684 mercurial.util:480(<lambda>)
There's our 5874 again.
> +5293 0.1671 0.1671 +<built-in method match>
> 5293 0.1671 0.1671 <built-in method match>
> 1 1.7405 0.1035 mercurial.dirstate:477(status)
> +11551 1.5218 0.0404 +mercurial.dirstate:335(statwalk)
> +664 0.0003 0.0003 +<method 'append' of 'list' objects>
> +1 0.1149 0.0001 +mercurial.dirstate:27(__getattr__)
> 1 0.1147 0.0758 mercurial.dirstate:124(_read)
> +10886 0.0296 0.0196 +struct:77(unpack)
> +10887 0.0051 0.0051 +<len>
> +1 0.0031 0.0031 +<method 'read' of 'file' objects>
> +1 0.0011 0.0000 +mercurial.demandimport:71(__getattribute__)
> +1 0.0000 0.0000 +mercurial.util:1334(__call__)
> 1447 0.0528 0.0528 <posix.listdir>
> 15292 0.2977 0.0509 mercurial.dirstate:359(imatch)
> +4406 0.2382 0.1230 +mercurial.util:480(<lambda>)
> +11550 0.0086 0.0086 +mercurial.util:252(always)
> 1458 0.0472 0.0472 <method 'sort' of 'list' objects>
>
> Empty repo
>
> CallCount Total(s) Inline(s) module:lineno(function)
> 626 0.0212 0.0199 mercurial.ignore:11(_parselines)
> +628 0.0006 0.0006 +<method 'endswith' of 'str' objects>
> +628 0.0004 0.0004 +<len>
> +628 0.0003 0.0003 +<method 'rstrip' of 'str' objects>
> 625 0.0199 0.0155 posixpath:373(normpath)
> +3580 0.0017 0.0017 +<method 'append' of 'list' objects>
> +625 0.0011 0.0011 +<method 'split' of 'str' objects>
> +625 0.0010 0.0010 +<method 'join' of 'str' objects>
> +625 0.0007 0.0007 +<method 'startswith' of 'str' objects>
> 1 0.1103 0.0097 mercurial.ignore:24(ignore)
> +626 0.0212 0.0199 +mercurial.ignore:11(_parselines)
> +5609 0.0039 0.0039 +<method 'startswith' of 'str' objects>
> +623 0.0006 0.0006 +<method 'items' of 'dict' objects>
> +623 0.0003 0.0003 +<method 'append' of 'list' objects>
> +1 0.0745 0.0001 +mercurial.util:406(matcher)
> 246 0.0097 0.0072 sre_compile:38(_compile)
> +245 0.0097 0.0072 +sre_compile:38(_compile)
> +1910 0.0009 0.0009 +<method 'append' of 'list' objects>
> +918 0.0009 0.0009 +sre_parse:136(__getitem__)
> +656 0.0003 0.0003 +<len>
> +48 0.0004 0.0003 +sre_compile:360(_simple)
> 141 0.0198 0.0066 sre_parse:385(_parse)
> +57 0.0197 0.0020 +sre_parse:307(_parse_sub)
> +658 0.0046 0.0013 +sre_parse:207(get)
> +569 0.0013 0.0010 +sre_parse:144(append)
> +316 0.0014 0.0005 +sre_parse:201(match)
> +65 0.0006 0.0005 +sre_parse:263(_escape)
> 8141 0.0062 0.0062 <method 'startswith' of 'str' objects>
> 2 0.0319 0.0045 mercurial.util:510(normalizepats)
> +623 0.0199 0.0155 +posixpath:373(normpath)
> +623 0.0069 0.0039 +mercurial.util:271(patkind)
> +1246 0.0006 0.0006 +<method 'append' of 'list' objects>
> 294 0.0050 0.0041 sre_parse:146(getwidth)
> +245 0.0049 0.0041 +sre_parse:146(getwidth)
> +632 0.0008 0.0008 +<min>
> +140 0.0001 0.0001 +<max>
> 8175 0.0041 0.0041 <method 'append' of 'list' objects>
> 623 0.0069 0.0039 mercurial.util:271(patkind)
> +1897 0.0016 0.0016 +<method 'startswith' of 'str' objects>
> +623 0.0013 0.0013 +<method 'split' of 'str' objects>
Something strikes me as odd here. I wrote a quick little test:
----
import re, random, sys

def randhexstring():
    # 20 groups of 4 hex digits: one 80-character string
    return "".join(["%04x" % random.randint(0, 0xffff) for x in range(20)])

pats = [randhexstring() for x in range(1000)]

def rematch(pats):
    # one big alternation, compiled once
    pat = '(?:%s)' % '|'.join(pats)
    return re.compile(pat).match

def fixmatch(pats):
    # plain membership test against the list of strings
    def match(s):
        return s in pats
    return match

if "re" in sys.argv:
    m = rematch(pats)
else:
    m = fixmatch(pats)

# "exit" stops after setup, so setup cost can be timed on its own
if "exit" in sys.argv:
    sys.exit(0)

count = 0
for s in pats * 100:
    if m(s):
        count += 1
print count
----
So we generate 1000 80-character strings, and either compile them into
a regex or just do a straight match, then we match 100000 times.
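As a rough alternative to running the script several times under `time`, the same phases can be timed inside one process. This is only a sketch: `time_phases`, `n_pats` and `repeats` are names made up for illustration, the sizes are much smaller than in the test above so it finishes quickly, and it uses `time.perf_counter`, which needs a newer Python than the 2.5 used in this thread.

```python
import random
import re
import time

def randhexstring():
    # 20 groups of 4 hex digits: one 80-character string
    return "".join("%04x" % random.randint(0, 0xffff) for _ in range(20))

def time_phases(n_pats=200, repeats=10):
    """Time pattern building, regex compilation and both match loops separately."""
    t0 = time.perf_counter()
    pats = [randhexstring() for _ in range(n_pats)]
    t1 = time.perf_counter()
    rematch = re.compile('(?:%s)' % '|'.join(pats)).match
    t2 = time.perf_counter()
    re_hits = sum(1 for s in pats * repeats if rematch(s))
    t3 = time.perf_counter()
    list_hits = sum(1 for s in pats * repeats if s in pats)
    t4 = time.perf_counter()
    return {
        "build": t1 - t0,
        "compile": t2 - t1,
        "regex_match": t3 - t2,
        "list_match": t4 - t3,
        "re_hits": re_hits,
        "list_hits": list_hits,
    }

result = time_phases()
for name in ("build", "compile", "regex_match", "list_match"):
    print("%-12s %.4fs" % (name, result[name]))
```

The absolute numbers will differ from the ones in this mail, since they depend on the Python version, the machine and the chosen sizes.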
Building the table takes:
$ time python retest.py exit
real 0m0.260s
user 0m0.248s
sys 0m0.004s
Compiling the giant regex takes:
$ time python2.5 retest.py re exit
real 0m0.806s
user 0m0.796s
sys 0m0.008s
And matching takes:
$ time python2.5 retest.py re
100000
real 0m1.851s
user 0m1.832s
sys 0m0.012s
And doing straight string matching takes:
$ time python2.5 retest.py
100000
real 0m2.464s
user 0m2.452s
sys 0m0.004s
That's ~1s for regex matching vs ~2.2s for string matching once we
take out building the patterns and compiling. Which is as expected - a
regex can test an input against all the possible ignore strings in one
pass through its DFA, aborting as soon as it's impossible for the
input to match.
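The two net figures follow from subtracting the setup-only runs from the full runs. A small worked version of that arithmetic, using the "real" times quoted above (the variable names are only for illustration):

```python
# Wall-clock "real" times from the four runs above, in seconds.
build_only = 0.260     # retest.py exit      (build the strings, then exit)
compile_only = 0.806   # retest.py re exit   (build + compile regex, then exit)
regex_total = 1.851    # retest.py re        (build + compile + 100000 matches)
string_total = 2.464   # retest.py           (build + 100000 list lookups)

# Net cost of the matching loop alone in each mode.
regex_match = regex_total - compile_only
string_match = string_total - build_only

print("regex matching:  %.3fs" % regex_match)
print("string matching: %.3fs" % string_match)
```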
On the other hand, if your regex is too large for your Python build
and it has to get broken into pieces, then regexes will probably lose.
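To make the "broken into pieces" case concrete, here is a minimal sketch of a matcher that splits the pattern list into several smaller alternations and tries them in turn. `chunked_matcher` and `chunk_size` are hypothetical names chosen for this example, not Mercurial code, and the chunk size of 100 is arbitrary. Each lookup may now have to run several regexes instead of one, which is where the single-regex advantage would start to erode.

```python
import re

def chunked_matcher(pats, chunk_size=100):
    """Return a match function backed by several smaller compiled regexes."""
    matchers = []
    for i in range(0, len(pats), chunk_size):
        chunk = pats[i:i + chunk_size]
        # Escape each literal so it is matched as plain text.
        alternation = '(?:%s)' % '|'.join(re.escape(p) for p in chunk)
        matchers.append(re.compile(alternation).match)

    def match(s):
        # Try each piece in order; stop at the first one that matches.
        for m in matchers:
            if m(s):
                return True
        return False
    return match

# Usage: 1000 literal patterns split into 10 regexes of 100 alternatives each.
pats = ["%08x" % n for n in range(1000)]
m = chunked_matcher(pats, chunk_size=100)
found = sum(1 for s in pats if m(s))
print(found)
```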
--
Mathematics is the supreme nostalgia of our time.