[patch] syntax:plain for .hgignore

Tue Sep 11 05:23:49 CDT 2007

> On Tue, 2007-09-11 at 00:24 +0200, Guido Ostkamp wrote:
> Unfortunately, these numbers don't tell us anything because they are
> running against different python implementations (and possibly different
> RE implementations). Is there any chance to measure this running on the
> same version of python?

I've rerun the measurements, but the results don't differ much (I ran
each command several times). All results below are with Python 2.5.1:

Original mercurial:

Full repo

$ time hg sta > /dev/null

real    0m3.608s
user    0m3.399s
sys     0m0.198s

Empty repo

$ time hg sta > /dev/null

real    0m0.915s
user    0m0.881s
sys     0m0.034s

Mercurial with 'syntax:plain' patch:

Full repo

$ time hg sta > /dev/null

real    0m1.415s
user    0m1.202s
sys     0m0.213s

Empty repo

$ time hg sta > /dev/null

real    0m0.145s
user    0m0.126s
sys     0m0.019s
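
To put the numbers above side by side, here is a small sketch that
computes the speedup from the "real" times. The values are copied from
the runs above; the helper name speedup() is only for illustration.

```python
# Wall-clock ("real") times in seconds, copied from the runs above.
timings = {
    "full repo": {"original": 3.608, "patched": 1.415},
    "empty repo": {"original": 0.915, "patched": 0.145},
}


def speedup(original, patched):
    """Return how many times faster the patched run is."""
    return original / patched


for name in sorted(timings):
    t = timings[name]
    print("%s: %.2fx" % (name, speedup(t["original"], t["patched"])))
```

That works out to roughly 2.5x for the full repo and roughly 6.3x for
the empty repo.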

> Without seeing your .hgignore file, I can only speculate, but here is my
> suspicion about it: (1) the .hgignore entries have many long common
> prefixes, such as:
>
>    foo/bar/baz/bletch/file1.c
>    foo/bar/baz/bletch/file2.c

For the 'syntax: plain' patterns, we have the following distribution
of entry lengths (all entries are full pathnames):

strlen   #count
21        1
22        5
23       10
24        7
25       15
26       12
27       23
28       15
29       17
30       22
31       14
32       23
33       15
34       20
35       83
36       57
37       57
38       50
39       41
40       14
41       13
42        9
43       12
44        6
45        4
46        1
47        2
48        1
50        3
51        3
52        3
55        2
57        1
58        1
59        4
61        7
63        4
64        1
66        1
67        2
72        4
73        1
74        1
75        4
76        1
81        1
84        1
87        1
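
A distribution like the one above can be produced with a few lines of
Python. This is only a sketch: the function name length_histogram() is
made up, and it assumes that blank lines, '#' comments and 'syntax:'
lines are not counted as entries. The sample entries reuse the
placeholder paths quoted earlier, not my real .hgignore.

```python
from collections import Counter


def length_histogram(lines):
    """Return sorted (strlen, count) pairs for the ignore entries.

    Assumption: blank lines, '#' comments and 'syntax:' lines are
    skipped; everything else is treated as one plain entry.
    """
    counts = Counter()
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("#") or entry.startswith("syntax:"):
            continue
        counts[len(entry)] += 1
    return sorted(counts.items())


# Hypothetical sample input, not the real file.
sample = [
    "syntax: plain",
    "# generated files",
    "foo/bar/baz/bletch/file1.c",
    "foo/bar/baz/bletch/file2.c",
]

for strlen, count in length_histogram(sample):
    print("%-8d %d" % (strlen, count))
```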

> Ultimately, however, you still have not addressed my main concern with
> this, which is that we are going to have to undo this when
> include/exclude is done. Even if the benefit here is real, it does not
> seem worthwhile if we are just going to have to throw it out in the next
> release. I would rather try to find a way to make the RE mechanism
> faster here.

I don't understand why include/exclude should be an issue here.
If you introduce that, you will have to process the patterns
separately, one after the other, because order matters, correct?

In this case you will no longer have one large check but a lot
of small checks which you need to go through.

Why would you not want to support a faster algorithm for such a
check?
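
To illustrate the kind of difference I mean, here is a rough sketch.
It is not the code from the patch. It assumes that a plain entry must
match the full pathname exactly, and the two function names are made
up for the example. The first matcher builds one large regular
expression out of all entries; the second one does a simple set lookup.

```python
import re


def make_regex_matcher(paths):
    """One large alternation of escaped literal paths, fully anchored."""
    alternation = "|".join(re.escape(p) for p in paths)
    combined = re.compile("(?:%s)\\Z" % alternation)
    return lambda path: combined.match(path) is not None


def make_plain_matcher(paths):
    """Exact lookup in a hash set; no regular expression involved."""
    plain = frozenset(paths)
    return lambda path: path in plain


# Hypothetical entries, reusing the placeholder paths quoted above.
entries = [
    "foo/bar/baz/bletch/file1.c",
    "foo/bar/baz/bletch/file2.c",
]

by_regex = make_regex_matcher(entries)
by_set = make_plain_matcher(entries)

for candidate in ["foo/bar/baz/bletch/file1.c", "foo/bar/baz/bletch/file3.c"]:
    # Both matchers should give the same answer for literal entries.
    print("%s %s %s" % (candidate, by_regex(candidate), by_set(candidate)))
```

Both give the same answers for literal entries, but the set lookup
does not depend on how the RE engine handles a long alternation.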

Regards

Guido