[patch] syntax:plain for .hgignore
Johannes Hofmann
Johannes.Hofmann at gmx.de
Mon Sep 10 15:58:59 CDT 2007
On Mon, Sep 10, 2007 at 04:01:43PM -0400, Jonathan S. Shapiro wrote:
> On Mon, 2007-09-10 at 21:32 +0200, Johannes Hofmann wrote:
> > However, I agree that the performance optimization would be better
> > done behind the scenes without adding new syntax options for
> > .hgignore. Does anyone know an easy/robust way to check whether a
> > string contains special regexp syntax or not?
>
> Depends on the prevailing regexp syntax. For glob syntax, the special
> characters are:
>
> *, ?, [, ], \
>
> Depending on how anchored globs are handled, you may also need to check
> for ^ and $. Rules:
>
> 1. If none of these characters appear, it is just a string.
> 2. If every occurrence of these characters is preceded by a backslash,
> it is just a string.
>
> For regexp, you can look up the magic characters, but it's the same
> idea.
Hm, checking for these characters sounds quite ugly and error-prone,
especially the handling of escaped characters.
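The two rules above amount to a single left-to-right scan in which a backslash consumes the next character, which keeps the escape handling fairly contained. A minimal Python sketch, not taken from Mercurial: the function name `glob_literal` is hypothetical, the special-character set is only the one listed in the quoted message (`^`, `$`, or braces would need adding if the glob dialect in use treats them specially), and a trailing lone backslash is conservatively treated as non-literal.

```python
from typing import Optional

# Glob metacharacters as listed in the message above; the backslash is
# handled separately as the escape character. This set is an assumption
# about the glob dialect, not Mercurial's definitive list.
GLOB_SPECIAL = frozenset("*?[]")


def glob_literal(pattern: str) -> Optional[str]:
    """Return the plain string a glob matches, or None if it has wildcards.

    Hypothetical helper: rule 1 (no special characters) and rule 2 (every
    special character escaped) both yield a literal string.
    """
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\":
            if i + 1 == len(pattern):
                return None  # dangling escape: not clearly a plain string
            out.append(pattern[i + 1])  # escaped character is taken literally
            i += 2
        elif c in GLOB_SPECIAL:
            return None  # unescaped wildcard: a real pattern
        else:
            out.append(c)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(glob_literal("build/output.log"))
    print(glob_literal("*.o"))
    print(glob_literal(r"notes\[draft\].txt"))
```

Run as a script, this prints `build/output.log`, `None`, and `notes[draft].txt`.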
>
> However, I am concerned about something. ThomasAH and I have been
> discussing include/exclude mechanisms. This requires that the entries be
> processed in order, and I think if that is done the whole thing must be
> compiled to a regexp because it is no longer just a union of patterns.
>
> So three questions:
>
> 1. Is the performance gain so compelling that it justifies the added
> complexity?
>
> 2. Is it really faster? If the RE is built correctly it really shouldn't
> be that much faster.
My patch speeds up "hg status" from 2.4s to about 1s in our particular
case, so it's quite noticeable.
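One way to do the optimization "behind the scenes" is to keep entries known to be plain strings out of the big regular expression entirely: they go into a set, and only the remaining patterns are compiled. This is a hypothetical sketch, not the code in the patch. It assumes a plain entry names an exact path relative to the repository root (and, when it names a directory, everything beneath it), and that regexp entries keep `re.search` semantics (match anywhere unless anchored). The class name `IgnoreMatcher` is made up for illustration.

```python
import re
from typing import Iterable


class IgnoreMatcher:
    """Hypothetical ignore matcher: plain entries in a set, the rest in one regex."""

    def __init__(self, plain: Iterable[str], regexps: Iterable[str]) -> None:
        # Plain entries need no compilation; set membership is O(1) per lookup.
        self.plain = frozenset(plain)
        regexps = list(regexps)
        # Remaining patterns are joined into a single alternation and compiled once.
        self.rx = (
            re.compile("|".join("(?:%s)" % r for r in regexps)) if regexps else None
        )

    def __call__(self, path: str) -> bool:
        # Check the path itself and each leading directory against the plain set,
        # so a plain entry naming a directory also covers its contents.
        parts = path.split("/")
        for i in range(1, len(parts) + 1):
            if "/".join(parts[:i]) in self.plain:
                return True
        # Fall back to the compiled regex, searching anywhere in the path.
        return self.rx is not None and self.rx.search(path) is not None
```

The split is only sound while the ignore file is a plain union of patterns. With ordered include/exclude rules of the kind Jonathan describes, a later exclude could override an earlier plain entry, so a set hit could no longer return `True` immediately.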
>
> 3. Is it worth it at all if we will need to remove it later?
>
> I suspect that the cost you are seeing lies in *compiling* the RE rather
> than executing the RE. If this is the case, there may be a better
> solution. Which cost are you actually concerned about?
It may well be the compilation time of the huge regular expression that
is built from the 500-line .hgignore.
A quick test with an empty repository that just contains a 500-line
.hgignore file shows:
hofmann@blob:/tmp/hgignoretest >time hg st
? .hgignore
real 0m1.030s
user 0m0.984s
sys 0m0.023s
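To separate compile cost from matching cost, as Jonathan asks, the two phases can be timed independently. This sketch is a stand-in built on assumptions, not Mercurial's code: it generates 500 synthetic literal entries, joins them into one alternation (roughly the shape of a union regex; hg's actual construction may differ), clears the `re` module's in-process cache with `re.purge()` so the compile is really measured, and then searches a batch of made-up paths. The function names and paths are hypothetical.

```python
import re
import time


def synthetic_patterns(n: int) -> list:
    # Hypothetical stand-ins for .hgignore entries: n distinct escaped literals.
    return [re.escape("generated/dir%d/output%d.dat" % (i, i)) for i in range(n)]


def time_phases(n_patterns: int = 500, n_paths: int = 1000):
    """Return (compile_seconds, match_seconds, hits) for one union regex."""
    combined = "|".join("(?:%s)" % p for p in synthetic_patterns(n_patterns))
    re.purge()  # empty re's cache so re.compile really compiles
    t0 = time.perf_counter()
    rx = re.compile(combined)
    t1 = time.perf_counter()
    paths = ["src/module%d.c" % i for i in range(n_paths)]
    hits = sum(1 for p in paths if rx.search(p))
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, hits


if __name__ == "__main__":
    compile_s, match_s, hits = time_phases()
    print("compile: %.4fs  match: %.4fs  hits: %d" % (compile_s, match_s, hits))
```

If compile time dominates, the fix lies in avoiding or caching the compile rather than in faster matching.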
But I will check again on our real-world example tomorrow.
Would it be possible to cache the compiled regular expression somewhere?
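On caching: in current CPython, a compiled pattern pickles as nothing more than its source string and flags, so storing a pickled regex on disk would not avoid the compile, because unpickling calls the compiler again. The `re` module's own cache lives only inside one process, so each fresh `hg st` run starts without it. The snippet below shows the first point through the standard `copyreg` registration; the pattern itself is an arbitrary example.

```python
import copyreg
import pickle
import re

rx = re.compile(r"(?:\.o$)|(?:~$)")  # arbitrary example pattern

# re registers a reducer for compiled patterns with copyreg; it returns a
# callable plus the arguments it will be called with on unpickling.
reducer, args = copyreg.dispatch_table[re.Pattern](rx)
print(args)  # only the pattern source and flags are stored

# A pickle round-trip therefore rebuilds the pattern by compiling it again.
restored = pickle.loads(pickle.dumps(rx))
print(restored.pattern == rx.pattern, restored.flags == rx.flags)
```

So a useful cache would have to store something other than a pickled `re` object, or fewer patterns would have to reach the compiler in the first place.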
Johannes
More information about the Mercurial-devel mailing list