[patch] syntax:plain for .hgignore

Johannes Hofmann Johannes.Hofmann at gmx.de
Mon Sep 10 15:58:59 CDT 2007


On Mon, Sep 10, 2007 at 04:01:43PM -0400, Jonathan S. Shapiro wrote:
> On Mon, 2007-09-10 at 21:32 +0200, Johannes Hofmann wrote:
> > However I agree that the performance optimization should better be
> > done behind the scenes without adding new syntax options for
> > .hgignore. Does anyone know an easy/robust way to check whether a
> > string contains special regexp syntax or not?
> 
> Depends on the prevailing regexp syntax. For glob syntax, the special
> characters are:
> 
>   *, ?, [, ], \
> 
> Depending on how anchored globs were handled you may also need to check
> for ^ and $. Rules:
> 
>   1. If none of these characters appear, it is just a string.
>   2. If any of these characters appear preceded by a backslash, it
>      is just a string.
> 
> For regexp, you can look up the magic characters, but it's the same
> idea.

Hm, checking for these characters sounds quite ugly and error-prone,
especially the handling of escaped characters.
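
For illustration, here is a minimal Python sketch of the two quoted rules
for glob syntax. It is not Mercurial code. The names GLOB_SPECIAL,
literal_or_none and split_patterns are hypothetical. The sketch assumes
that ^ and $ are not special, and that a trailing backslash makes a
pattern "not plain":

```python
# Characters that make a glob pattern "not just a string" (besides backslash).
GLOB_SPECIAL = set("*?[]")


def literal_or_none(pattern):
    """Return the plain string a glob pattern stands for, or None if the
    pattern contains an unescaped special character (hypothetical helper)."""
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\":
            # A backslash escapes the next character. A trailing backslash
            # is ambiguous, so it is treated as "not plain" here.
            if i + 1 == len(pattern):
                return None
            out.append(pattern[i + 1])
            i += 2
        elif c in GLOB_SPECIAL:
            return None
        else:
            out.append(c)
            i += 1
    return "".join(out)


def split_patterns(patterns):
    """Split patterns into a set of plain strings and a list of real globs."""
    literals, globs = set(), []
    for p in patterns:
        lit = literal_or_none(p)
        if lit is None:
            globs.append(p)
        else:
            literals.add(lit)
    return literals, globs
```

The plain strings could then be checked with a set lookup, and only the
remaining globs would need to be compiled. The escape handling is the
fiddly part, as noted above.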

> 
> However, I am concerned about something. ThomasAH and I have been
> discussing include/exclude mechanisms. This requires that the entries be
> processed in order, and I think if that is done the whole thing must be
> compiled to a regexp because it is no longer just a union of patterns.
> 
> So two questions:
> 
> 1. Is the performance gain so compelling that it justifies the added
>    complexity?
> 
> 2. Is it really faster? If the RE is built correctly it really shouldn't
>    be that much faster.

My patch speeds up "hg status" from 2.4s to about 1s in our special
case. So it's quite noticeable.

> 
> 3. Is it worth it at all if we will need to remove it later?
> 
> I suspect that the cost you are seeing lies in *compiling* the RE rather
> than executing the RE. If this is the case, there may be a better
> solution. Which cost are you actually concerned about?

It may well be the compilation time of the huge regular expression that
is built from the 500-line .hgignore.
A quick test with an empty repository that just contains a 500-line
.hgignore file shows:

hofmann@blob:/tmp/hgignoretest >time hg st
? .hgignore

real    0m1.030s
user    0m0.984s
sys     0m0.023s

But I will check again on our real world example tomorrow.
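
One way to separate compile cost from match cost would be a small
stand-alone timing script. The sketch below is only an approximation. It
does not use Mercurial's own pattern translation, the 500 patterns are
made-up plain file names, and make_patterns and time_compile_and_match
are hypothetical names:

```python
import re
import time


def make_patterns(n=500):
    # Hypothetical stand-in for a 500-line .hgignore of plain file names.
    return ["dir%d/file%d.o" % (i, i) for i in range(n)]


def time_compile_and_match(patterns, path="src/main.c", repeats=1000):
    """Return (seconds to compile, average seconds per match)."""
    # Join all entries into one alternation, escaping each as a literal.
    big = "|".join("(?:%s)" % re.escape(p) for p in patterns)
    re.purge()  # clear the re module's cache so compilation is really timed
    t0 = time.perf_counter()
    matcher = re.compile(big)
    t1 = time.perf_counter()
    for _ in range(repeats):
        matcher.match(path)
    t2 = time.perf_counter()
    return t1 - t0, (t2 - t1) / repeats


if __name__ == "__main__":
    compile_s, match_s = time_compile_and_match(make_patterns())
    print("compile: %.6fs  per match: %.9fs" % (compile_s, match_s))
```

If the compile figure dominates, the cost is paid once per "hg st" run
regardless of how many files are checked.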

Would it be possible to cache the compiled regular expression somewhere?
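
A caveat for a cache based on pickle: in CPython, pickling a compiled
pattern stores only the pattern source and flags, and unpickling
compiles it again. So writing the pickle to disk would not by itself
save the compile time in a new process. A small sketch to check this
(the pattern is a made-up example):

```python
import pickle
import re

compiled = re.compile(r"(?:foo\.o)|(?:bar\.o)")
blob = pickle.dumps(compiled)

# The pickle contains the pattern text, which is recompiled on load.
contains_source = b"foo" in blob

restored = pickle.loads(blob)
same_pattern = restored.pattern == compiled.pattern
still_matches = restored.match("foo.o") is not None
```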

 Johannes


