[issue1814] bdiff.c: 4 is too low for popular-line threshold

Jason Orendorff mercurial-bugs at selenic.com
Wed Aug 26 17:31:34 UTC 2009


New submission from Jason Orendorff <jorendorff at mozilla.com>:

In mercurial/bdiff.c:
>	/* compute popularity threshold */
>	t = (bn >= 4000) ? bn / 1000 : bn + 1;

The lower the threshold, the stronger the popularity hack's
influence. So at 3999 lines, the hack is disabled; and at 4000 lines,
the hack is enabled at maximum strength (t=4).

No source file in mercurial/crew is over 4000 lines. But there are, oh,
a few such files in Mozilla.  I can testify that this hack causes hg to
generate some correct but eyebrow-raising patches.

I think the hack should phase in gradually. The threshold should be high
for small files where we don't need it so much.  Like this:

        t = (bn < 31000) ? 1000000 / bn : bn / 1000;

That would leave the popularity hack disabled for small files, then
gradually phase it in:

    bn <   1000   --   t > bn    (popularity hack is completely disabled)
    bn ==  1000   --   t = 1000  (still effectively disabled)
    bn ==  2000   --   t =  500  (only hits unusual files)
    bn == 10000   --   t =  100  (only hits especially common lines)
    bn == 31000   --   t =   31  (hack is at maximum power)
    bn == 32000   --   t =   32  (hack could backfire, ease off)
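
To make the comparison concrete, here is a minimal standalone sketch of the two formulas. It is not a patch against bdiff.c: the function names `threshold_current` and `threshold_proposed` are made up for illustration, and the `bn == 0` guard is an assumption added here so the sketch never divides by zero (the one-line formula above has no such guard).

```c
#include <assert.h>
#include <stdio.h>

/* Threshold as quoted from mercurial/bdiff.c in this report. */
int threshold_current(int bn)
{
	return (bn >= 4000) ? bn / 1000 : bn + 1;
}

/*
 * Threshold proposed in this report. The bn == 0 case is an
 * assumption of this sketch: it falls back to bn + 1 so that
 * 1000000 / bn is never evaluated with a zero divisor.
 */
int threshold_proposed(int bn)
{
	if (bn == 0)
		return 1;
	return (bn < 31000) ? 1000000 / bn : bn / 1000;
}

/* Check the proposed formula against the rows of the table above. */
void check_report_table(void)
{
	assert(threshold_proposed(1000) == 1000);
	assert(threshold_proposed(2000) == 500);
	assert(threshold_proposed(10000) == 100);
	assert(threshold_proposed(31000) == 31);
	assert(threshold_proposed(32000) == 32);
}

/* Print both thresholds side by side for a few line counts. */
void print_thresholds(void)
{
	static const int sizes[] = { 999, 1000, 2000, 3999, 4000, 10000, 31000, 32000 };
	size_t i;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
		printf("bn = %5d   current t = %5d   proposed t = %5d\n",
		       sizes[i], threshold_current(sizes[i]),
		       threshold_proposed(sizes[i]));
}
```

Calling `check_report_table()` and then `print_thresholds()` from a small `main` shows the cliff in the current formula between 3999 and 4000 lines, next to the gradual decline of the proposed one.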

If I *completely* disable the popularity hack by changing that line to
`t = bn + 1;`, hg becomes 20% slower on a large (~10 sec) qrefresh, but
the diffs really are better for human consumption.

----------
messages: 10425
nosy: jorendorff
priority: bug
status: unread
title: bdiff.c: 4 is too low for popular-line threshold

____________________________________________________
Mercurial issue tracker <mercurial-bugs at selenic.com>
<http://mercurial.selenic.com/bts/issue1814>
____________________________________________________


