[PATCH 1 of 2 resend] keyword: compile regexes on demand
Mads Kiilerich
mads at kiilerich.com
Thu Nov 4 11:38:10 CDT 2010
On 11/04/2010 04:55 PM, Martin Geisler wrote:
> Christian Ebert<blacktrash at gmx.net> writes:
>
>> # HG changeset patch
>> # User Christian Ebert<blacktrash at gmx.net>
>> # Date 1288791461 -3600
>> # Node ID 2ce1ff53e29f4b775ed550c13beb42da3942523e
>> # Parent 0e0a52bd58f941c00b2a1d57f23676fa486e58c3
>> keyword: compile regexes on demand
>
> Are you sure this is faster? I tried to see how long the old code took
> and here it's very fast:
>
> % python -m timeit \
> -s "import re" \
> -s "escaped = 'RCSfile|Author|Header|Source|Date|RCSFile|Id|Revision'" \
> "kw = re.compile(r'\$(%s)\$' % escaped)" \
> "kwexp = re.compile(r'\$(%s): [^$\n\r]*? \$' % escaped)"
> 100000 loops, best of 3: 2.52 usec per loop
Beware of the caching of compiled expressions inside the re module:
$ python -m timeit \
> -s "import re" \
> -s "escaped = 'RCS|Aut|Hea|Sou|Dat|RCS|Id|Rev'" \
> "kw = re.compile(r'\$(%s)\$' % escaped)" \
> "kwexp = re.compile(r'\$(%s): [^$\n\r]*? \$' % escaped)"
100000 loops, best of 3: 8.16 usec per loop
$ python -m timeit \
> -s "import re" \
> -s "escaped = 'RCS|Aut|Hea|Sou|Dat|RCS|Id|Rev'" \
> "re.purge()"
1000000 loops, best of 3: 1.17 usec per loop
$ python -m timeit \
> -s "import re" \
> -s "escaped = 'RCS|Aut|Hea|Sou|Dat|RCS|Id|Rev'" \
> "re.purge()" \
> "kw = re.compile(r'\$(%s)\$' % escaped)" \
> "kwexp = re.compile(r'\$(%s): [^$\n\r]*? \$' % escaped)"
1000 loops, best of 3: 1.93 msec per loop
With the cache purged, compiling the two expressions takes about 1.93
msec per loop instead of a few usec. These numbers are so much higher
that they might justify the change.
I'm not sure whether we should rely on the re cache or should always
pre-compile everywhere, but unnecessary compilation is unfortunate. A
new general util function or pattern could perhaps be nice.
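As a rough illustration of what such a helper could look like, here is
a minimal sketch. The name lazyregex and its regex property are
hypothetical and not an existing Mercurial util function; the keyword
list is just an example. The pattern is compiled on first access and
then kept, so code that never needs the regex never pays for it and the
result does not depend on the re module's internal cache:

```python
import re


class lazyregex(object):
    """Hypothetical helper: compile a pattern on first use, then reuse it."""

    def __init__(self, pattern, flags=0):
        self._pattern = pattern
        self._flags = flags
        self._compiled = None  # nothing is compiled at construction time

    @property
    def regex(self):
        # Compile only when the regex is actually needed, and keep the
        # compiled object so later accesses skip re.compile() entirely.
        if self._compiled is None:
            self._compiled = re.compile(self._pattern, self._flags)
        return self._compiled


# Example keyword list, assumed for illustration only.
escaped = 'RCSfile|Author|Header|Source|Date|Id|Revision'
kw = lazyregex(r'\$(%s)\$' % escaped)
kwexp = lazyregex(r'\$(%s): [^$\n\r]*? \$' % escaped)

# Only kw is compiled here; kwexp stays uncompiled until it is used.
match = kw.regex.search('version $Id$ here')
print(match.group(1))
```

Whether something like this is better than compiling eagerly depends on
how often the expressions end up unused.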
/Mads