[PATCH 7 of 7 v2] bdiff: give slight preference to removing trailing lines

Mads Kiilerich mads at kiilerich.com
Fri Nov 25 09:58:28 EST 2016


On 11/24/2016 06:52 PM, Jun Wu wrote:
> Excerpts from Augie Fackler's message of 2016-11-17 12:42:26 -0500:
>> My own cursory perfbdiff runs suggest this is a perf wash (using
>> `perfbdiff -m 3041e4d59df2` in the mozilla repo). Queued. Thanks!
> I'd mention this series changes the behavior of the diff output. The
> difference was caught by fastannotate test.
>
> See the below table (old: e1d6aa0e4c3a, new: 8836f13e3c5b):
>
>     a | b | old | new
>    --------------------
>     a | a |  a  | -a
>     a | z | +z  |  a
>     a | a |  a  | +z
>       |   | -a  |  a
>    --------------------
>     a | a |     a
>     a | a |     a
>     a |   |    -a
>
> I think we would always prefer putting deletions at the end, to be consistent.


I agree that this end result would be nice.

(My patches "bdiff: early pruning of common suffix before doing 
expensive computations" and "bdiff: early pruning of common prefix 
before doing expensive computations" happens to give the obvious "good" 
result in this case ... but is no general solution to the problem.)

Stating the obvious context:
The algorithms and tweaks for making "good" diffs are just heuristics 
and can never be perfect. A counter example doesn't necessarily prove 
anything. But also, it might be worth it to add a tweak...

An idea, described informally: While the current recursive algorithm 
works "best" by selecting middle-most longest common substrings, I guess 
it could make sense to have a post processing step that shifts common 
substrings to the first occurrence in the previous diff chunk. Also, if 
a previous or later non-common range is empty (because it is an 
add/remove), the matching range can be "shifted", perhaps with 
preference of not starting with an "empty" line but prefering the lowest 
amount of leading spaces.

/Mads



More information about the Mercurial-devel mailing list