[PATCH 3 of 3] highlight: add option to prevent content-only based fallback

Augie Fackler raf at durin42.com
Thu Oct 15 08:02:44 CDT 2015


On Thu, Oct 15, 2015 at 02:29:22PM +0800, Anton Shestakov wrote:
> 15.10.2015, 09:25, "Gregory Szorc" <gregory.szorc at gmail.com>:
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc at gmail.com>
> > # Date 1444872136 25200
> > # Wed Oct 14 18:22:16 2015 -0700
> > # Node ID a55c6e623cb63e6ac2e4f074aff8b767ab8fc50e
> > # Parent bf9868e78cdfa8acb4a9a035bc21d49260043f5c
> > highlight: add option to prevent content-only based fallback
>
> LGTM.

queued, thanks

>
> > When Mozilla enabled Pygments on hg.mozilla.org, we got a lot of weirdly
> > colorized files. Upon further investigation, the highlight extension
> > first attempts a filename+content based match, then falls back to a
> > purely content-driven detection mode in Pygments. Sounds good in theory.
> >
> > Unfortunately, Pygments' content-driven detection establishes no minimum
> > threshold for returning a lexer. Furthermore, the detection code for
> > a number of languages is very liberal. For example, ActionScript 3 will
> > return a confidence of 0.3 (out of 1.0) if the first 1k of the file
> > we pass in matches the regex "\w+\s*:\s*\w"! Python matches on
> > "import ". It's no coincidence that a number of our extension-less files
> > were getting highlighted improperly.
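The two-stage lookup and the missing threshold described above can be modeled roughly as follows. This is a minimal, self-contained sketch, not the highlight extension's or Pygments' code. The lexer table, the two heuristics (loosely modeled on the ActionScript 3 and Python examples in the message), and the `filename_only` flag (a hypothetical stand-in for the patch's new option) are all assumptions made for illustration.

```python
import fnmatch
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical lexer table: name -> (filename patterns, content heuristic).
# Heuristics return a confidence in [0.0, 1.0] and are loosely modeled on the
# examples in the message; they are NOT Pygments' actual analyse_text() code.
LEXERS: Dict[str, Tuple[List[str], Callable[[str], float]]] = {
    "ActionScript 3": (
        ["*.as"],
        lambda text: 0.3 if re.search(r"\w+\s*:\s*\w", text) else 0.0,
    ),
    "Python": (
        ["*.py"],
        lambda text: 1.0 if "import " in text else 0.0,
    ),
}


def guess_by_filename(path: str) -> Optional[str]:
    """First stage: match on the file name alone."""
    for name, (patterns, _) in LEXERS.items():
        if any(fnmatch.fnmatch(path, pattern) for pattern in patterns):
            return name
    return None


def guess_by_content(text: str) -> Optional[str]:
    """Fallback stage: highest score wins, and any score above zero is accepted."""
    best_name, best_score = None, 0.0
    for name, (_, analyse) in LEXERS.items():
        score = analyse(text)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


def pick_lexer(path: str, text: str, filename_only: bool = False) -> Optional[str]:
    """Return a lexer name, or None to leave the file unhighlighted."""
    name = guess_by_filename(path)
    if name is not None or filename_only:
        return name
    # Only the first 1k of the file is inspected, as described in the message.
    return guess_by_content(text[:1024])
```

With the fallback enabled, an extension-less `README` starting with `Note: see below` is classified as ActionScript 3, and one containing `Please import the data first.` as Python; passing `filename_only=True` makes both return `None`, so the file is shown without highlighting.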
>
> It's a shame that Pygments doesn't allow configuring a minimum confidence level in guess_lexer, which could (again, in theory) be a better option than disabling content-only guessing altogether. But yeah, PythonLexer.analyse_text() does return 100% confidence if there's an 'import ' anywhere in the first 1000 bytes. Wow.
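The minimum-confidence alternative mentioned above could look roughly like this. It is a hypothetical, pure-Python sketch: `guess_lexer` has no such parameter, and the heuristics and the `min_confidence` argument are assumptions modeled on the examples in this thread, not Pygments code.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical content heuristics returning a confidence in [0.0, 1.0],
# modeled on the ActionScript 3 and Python examples in this thread.
HEURISTICS: Dict[str, Callable[[str], float]] = {
    "ActionScript 3": lambda text: 0.3 if re.search(r"\w+\s*:\s*\w", text) else 0.0,
    "Python": lambda text: 1.0 if "import " in text[:1000] else 0.0,
}


def guess_with_threshold(text: str, min_confidence: float) -> Optional[str]:
    """Return the best-scoring lexer name, or None if no score reaches min_confidence."""
    scores = {name: analyse(text) for name, analyse in HEURISTICS.items()}
    best_name = max(scores, key=scores.get)
    best_score = scores[best_name]
    # A zero score never counts as a match, even with min_confidence == 0.
    if best_score > 0.0 and best_score >= min_confidence:
        return best_name
    return None
```

A threshold of 0.5 rejects the 0.3 ActionScript 3 guess for `Note: see below`, but prose containing `import ` still scores 1.0 and is accepted as Python, so a threshold alone would not have caught that case.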
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at selenic.com
> https://selenic.com/mailman/listinfo/mercurial-devel

