[PATCH 5 of 6 V2] match: avoid translating glob to matcher multiple times for large sets

Fri Nov 23 21:13:16 EST 2018

On Fri, 23 Nov 2018 15:51:58 -0800, Martin von Zweigbergk wrote:
> On Fri, Nov 23, 2018 at 9:20 AM Boris FELD <boris.feld at octobus.net> wrote:
> > So I feel like it is fine to just rely on the size limit.
> > >> Perhaps it's been fixed since 2.7.4. The regexp code width is extended
> > >> from 16bit to 32bit (or Py_UCS4) integer. That should be large enough to
> > >> handle practical patterns.
> > >>
> > >> https://bugs.python.org/issue1160
> >
> > Thanks for digging this out. It looks like we may be able to drop this
> > limit altogether. However, I would like to make it a change distinct
> > from this series.
> >
> > The current code is very problematic for some people (to the point where
> > the majority of `hg status` time is spent in that function). I would
> > like to get fast code for the same semantics first. Then look into
> > changing the semantics.
> >
> 
> Is your concern that you might regress in performance of something by
> changing how large the groups are? Or that it would be more work?
> 
> I tried creating a regex for *every* pattern and that actually seemed
> faster (to my surprise), both when creating the matcher and when evaluating
> it. I tried it on the mozilla-unified repo both with 1k files and with 10k
> files in the hgignores. I used the following patch on top of your series.

Wow. If we don't need to combine patterns into one, numbered groups should
just work.
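
To illustrate what I mean, here is a minimal sketch in plain Python. It is
not the matcher code from this series; build_combined and build_separate
are hypothetical names, and the two patterns are made up for the example.
When patterns are joined into a single alternation, group numbers are shared
across the whole regex, so a backreference in a later pattern points at the
wrong group. With one compiled regex per pattern, each keeps its own
numbering.

```python
import re


def build_combined(patterns):
    # Hypothetical helper: join every pattern into one big alternation.
    # Group numbers are counted across the whole combined regex.
    return re.compile('(?:%s)' % '|'.join(patterns)).match


def build_separate(patterns):
    # Hypothetical helper: compile each pattern on its own, so each
    # pattern's groups are numbered starting from 1.
    matchers = [re.compile(p).match for p in patterns]
    return lambda s: any(m(s) is not None for m in matchers)


# Made-up patterns that each use a backreference to their first group.
patterns = [r'(a)\1$', r'(b)\1$']

combined = build_combined(patterns)
separate = build_separate(patterns)

# In the combined regex, (b) becomes group 2, so its \1 refers to (a),
# which did not take part in the match, and 'bb' is rejected.
print(combined('bb') is not None)
# With separate regexes, \1 in the second pattern refers to (b).
print(separate('bb'))
```

The trade-off is one regex call per pattern at match time instead of a
single call, which is the cost the measurements quoted above suggest may
not matter in practice.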