[PATCH] highlight: do not use guess_lexer functions. they use too much CPU time for certain inputs

Brendan Cully brendan at kublai.com
Thu Jun 12 02:13:35 CDT 2008


I finally pushed the 1K cap.
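
For anyone reading along, here is a minimal sketch of what capping the
guesser input could look like. It is not the pushed changeset: the helper
name guess_lexer_capped and its cap and guesser parameters are made up for
illustration, and it assumes "1K" means the first 1024 characters of the
file. Only guess_lexer_for_filename is an actual Pygments function.

```python
def guess_lexer_capped(filename, text, cap=1024, guesser=None):
    """Guess a lexer from at most `cap` leading characters of `text`.

    Hypothetical helper for illustration; `cap` and `guesser` are
    assumptions of this sketch, not names from the highlight extension.
    """
    if guesser is None:
        # Imported lazily so the helper can be exercised without Pygments.
        from pygments.lexers import guess_lexer_for_filename as guesser
    # Slicing never raises, so inputs shorter than the cap pass through whole.
    return guesser(filename, text[:cap])
```

With Pygments installed, guess_lexer_capped("Collection.i18n.php", text)
hands only the first 1024 characters to the guesser, which keeps the input
inside the range where Ralf's timings below are still flat.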

On Thursday, 03 April 2008 at 16:49, Brendan Cully wrote:
> yikes. How about we cap it at 1K? That'll probably work 95% of the
> time, no?
> 
> On Wednesday, 02 April 2008 at 23:53, Ralf Schmitt wrote:
> > 
> > 
> > On Wed, Apr 2, 2008 at 10:45 PM, Matt Mackall <mpm at selenic.com> wrote:
> > 
> > 
> >     On Wed, 2008-04-02 at 21:59 +0200, Ralf Schmitt wrote:
> >     > # HG changeset patch
> >     > # User ralf at brainbot.com
> >     > # Date 1207165818 -7200
> >     > # Node ID 50015149baa0dbf1b7066f0356b65f492ed78450
> >     > # Parent  101526031d06d184559ae797687e50661b96156e
> >     > highlight: do not use guess_lexer functions. they use too much CPU time
> >     > for certain inputs.
> > 
> >     Does certain input mean big inputs? Can we send some truncated source to
> >     the guesser instead?
> > 
> >  
> > I reported this some time ago:
> > http://selenic.com/pipermail/mercurial/2008-March/018029.html
> > The file where this happened for me is a php file with around 2000 lines
> > (140k).
> > 
> > I wrote a short script to measure the time it takes to run
> > guess_lexer_for_filename on truncated input:
> > from pygments.lexers import guess_lexer_for_filename
> > 
> > text=open("Collection.i18n.php").read()
> > 
> > import time
> > size=512
> > while 1:
> >     stime=time.time()
> >     for run in range(10):
> >         guess_lexer_for_filename("collection.i18n.php", text[:size],
> >                                  encoding="utf-8")
> >     print (time.time()-stime)/10, size
> >    
> >     size+=512
> > 
> > 
> > It prints the following values (first column is the time needed in seconds,
> > second column is the size in bytes):
> > 
> > 0.00721120834351 512
> > 0.00744049549103 1024
> > 0.0433429002762 1536
> > 0.161764788628 2048
> > 0.34955329895 2560
> > 0.627179193497 3072
> > 0.958257818222 3584
> > 1.46866378784 4096
> > 2.11897850037 4608
> > 2.94355890751 5120
> > 3.93533871174 5632
> > 5.09589328766 6144
> > 
> > This is on a 2.4 GHz CPU.
> > 
> > Regards,
> > - Ralf
> > 
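
A rough way to read the timings above is to fit a power law to them. The
sketch below does a least-squares fit of log(time) against log(size) on the
posted numbers from 2048 bytes upward; the names SAMPLES and loglog_slope
are made up for this example. On these figures the exponent comes out close
to 3, while the 512 and 1024 byte rows are nearly identical. This describes
one file on one machine, not Pygments in general.

```python
import math

# (size in bytes, seconds per call), copied from the table above.
SAMPLES = [
    (512, 0.00721120834351),
    (1024, 0.00744049549103),
    (1536, 0.0433429002762),
    (2048, 0.161764788628),
    (2560, 0.34955329895),
    (3072, 0.627179193497),
    (3584, 0.958257818222),
    (4096, 1.46866378784),
    (4608, 2.11897850037),
    (5120, 2.94355890751),
    (5632, 3.93533871174),
    (6144, 5.09589328766),
]


def loglog_slope(points):
    """Least-squares slope of log(time) against log(size)."""
    xs = [math.log(size) for size, _ in points]
    ys = [math.log(seconds) for _, seconds in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Fit only the rows where the time is clearly growing.
tail = [p for p in SAMPLES if p[0] >= 2048]
exponent = loglog_slope(tail)
print(round(exponent, 2))
```

If that trend held, the full 140k file would take far longer than any of
the measured sizes, which is why a small fixed cap is attractive.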
> 
> > _______________________________________________
> > Mercurial mailing list
> > Mercurial at selenic.com
> > http://selenic.com/mailman/listinfo/mercurial

