[PATCH 7 of 7 v2] bdiff: give slight preference to removing trailing lines

Jun Wu quark at fb.com
Fri Nov 25 10:07:24 EST 2016


Excerpts from Mads Kiilerich's message of 2016-11-25 15:58:28 +0100:
> On 11/24/2016 06:52 PM, Jun Wu wrote:
> > Excerpts from Augie Fackler's message of 2016-11-17 12:42:26 -0500:
> >> My own cursory perfbdiff runs suggest this is a perf wash (using
> >> `perfbdiff -m 3041e4d59df2` in the mozilla repo). Queued. Thanks!
> > I'd mention this series changes the behavior of the diff output. The
> > difference was caught by fastannotate test.
> >
> > See the below table (old: e1d6aa0e4c3a, new: 8836f13e3c5b):
> >
> >     a | b | old | new
> >    --------------------
> >     a | a |  a  | -a
> >     a | z | +z  |  a
> >     a | a |  a  | +z
> >       |   | -a  |  a
> >    --------------------
> >     a | a |     a
> >     a | a |     a
> >     a |   |    -a
> >
> > I think we would always prefer putting deletions at the end, to be consistent.
> 
> 
> I agree that this end result would be nice.
> 
> (My patches "bdiff: early pruning of common suffix before doing 
> expensive computations" and "bdiff: early pruning of common prefix 
> before doing expensive computations" happens to give the obvious "good" 
> result in this case ... but is no general solution to the problem.)

If you have no easy solution for this. I'd suggest ignore it for now and
probably postpone other ideas about bdiff. I also have some "directional"
ideas on diff algrotihm. I'll try to start the discussion in a few weeks.

> Stating the obvious context:
> The algorithms and tweaks for making "good" diffs are just heuristics 
> and can never be perfect. A counter example doesn't necessarily prove 
> anything. But also, it might be worth it to add a tweak...
> 
> An idea, described informally: While the current recursive algorithm 
> works "best" by selecting middle-most longest common substrings, I guess 
> it could make sense to have a post processing step that shifts common 
> substrings to the first occurrence in the previous diff chunk. Also, if 
> a previous or later non-common range is empty (because it is an 
> add/remove), the matching range can be "shifted", perhaps with 
> preference of not starting with an "empty" line but prefering the lowest 
> amount of leading spaces.
> 
> /Mads


More information about the Mercurial-devel mailing list