[patch] syntax:plain for .hgignore

Mon Sep 10 17:24:57 CDT 2007

On Mon, 10 Sep 2007, Jonathan S. Shapiro wrote:

> On Mon, 2007-09-10 at 23:34 +0200, Guido Ostkamp wrote:
>
>> We don't want to sit around waiting several seconds for an expensive 
>> regular expression engine run that is invoked unnecessarily.
>
> Until we know whether the expense is the execution or the compilation, 
> we don't know which problem to solve, and there is a real possibility 
> that this will all need to get ripped out to support include/exclude 
> later -- at which time it will be perceived as a regression.

You requested some numbers, so here they are:

.hgignore has ~620 lines. Only a small number (~25) are real glob patterns; 
the rest could be handled as plain text search.

Number of changesets        : 54969
Number of files in .hg      : 12381
Number of files working copy: 19813
Size of .hg          (du -s): 504752
Size of working copy (du -s): 1102980

With original Mercurial (no 'syntax: plain' support, Python 2.5.x) we see 
the following:

Called on an empty repository:

$ time hg sta

real    0m0.905s
user    0m0.883s
sys     0m0.023s

Called on a filled repository:

$ time hg sta > /dev/null

real    0m3.601s
user    0m3.385s
sys     0m0.216s

With a patched Mercurial with 'syntax: plain' support (Python 2.4.x) we 
see:

Called on an empty repository:

$ time hg sta > /dev/null

real    0m0.131s
user    0m0.110s
sys     0m0.021s

Called on a filled repository:

$ time hg sta > /dev/null

real    0m1.308s
user    0m1.098s
sys     0m0.209s
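
For anyone who wants to check whether the expense is in compiling or in 
executing the regular expression, a rough sketch along the following lines 
might help. It is not Mercurial's matcher code: the pattern lists, the path 
names and the helpers build_patterns, build_paths, time_regex and time_plain 
are all made up for illustration, it assumes that every entry is folded 
into one big alternation, and it is written for a current Python rather 
than the 2.4/2.5 used above.

```python
import re
import time


def build_patterns(n_plain=595, n_glob=25):
    # Hypothetical stand-ins for the ~620 .hgignore entries described above.
    plain = ["build/output/file%d.o" % i for i in range(n_plain)]
    globs = ["tmp%d/.*\\.log" % i for i in range(n_glob)]
    return plain, globs


def build_paths(plain, n_glob, n):
    # Synthetic working-copy paths: one third plain hits, one third
    # pattern hits, one third not ignored at all.
    paths = []
    for i in range(n):
        if i % 3 == 0:
            paths.append(plain[i % len(plain)])
        elif i % 3 == 1:
            paths.append("tmp%d/run.log" % (i % n_glob))
        else:
            paths.append("src/module%d.c" % i)
    return paths


def time_regex(plain, globs, paths):
    # Everything goes into a single alternation (an assumption about how
    # a combined ignore matcher could be built).
    re.purge()  # drop the re module's cache so compilation is really timed
    parts = ["(?:%s)" % re.escape(p) for p in plain]
    parts += ["(?:%s)" % g for g in globs]
    t0 = time.perf_counter()
    combined = re.compile("|".join(parts))
    t1 = time.perf_counter()
    hits = sum(1 for p in paths if combined.fullmatch(p))
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, hits


def time_plain(plain, globs, paths):
    # Plain entries become a set lookup; only the real patterns are compiled.
    re.purge()
    t0 = time.perf_counter()
    plain_set = set(plain)
    glob_re = re.compile("|".join("(?:%s)" % g for g in globs))
    t1 = time.perf_counter()
    hits = sum(1 for p in paths if p in plain_set or glob_re.fullmatch(p))
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, hits


if __name__ == "__main__":
    plain, globs = build_patterns()
    paths = build_paths(plain, len(globs), 19813)
    for name, fn in (("regex only", time_regex), ("plain + regex", time_plain)):
        setup, match, hits = fn(plain, globs, paths)
        print("%-14s setup %.4fs  match %.4fs  hits %d" % (name, setup, match, hits))
```

The two functions report setup time and matching time separately, so the 
output should at least hint at which of the two dominates on a given 
machine. The absolute numbers will not be comparable to the hg timings 
above, since no file system access is involved.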

>> And we don't want to waste valuable CPU power which is needed by other 
>> users - we are working on servers here, not on personal desktop 
>> systems.
>
> I don't mean to sound obnoxious here, but if CPU is an issue, you need a 
> better server. Have you actually looked at the CPU utilization on a 
> modern server? It usually isn't very high. On most servers, the disk is 
> the real bottleneck.

I would love to work with the best hardware money can buy, but in the real 
world, one has to live with whatever the company's financial officers 
grant us.

> I begin to suspect that we may not be building the RE correctly. In 
> the abstract, the RE processing should be completely overwhelmed by the 
> seek and stat overheads here.

Based on the above, I don't think so. Please let us know your opinion.

Regards

Guido