[PATCH 5 of 5 rfc] bdiff: make sure we append to repeated lines instead of inserting into range

Mads Kiilerich mads at kiilerich.com
Tue Nov 15 16:06:29 EST 2016


On 11/12/2016 03:58 PM, Jun Wu wrote:
> Excerpts from Mads Kiilerich's message of 2016-11-03 22:34:15 +0100:
>> # HG changeset patch
>> # User Mads Kiilerich <madski at unity3d.com>
>> # Date 1478208837 -3600
>> #      Thu Nov 03 22:33:57 2016 +0100
>> # Node ID be83c5f4ec8931cb7e771a80a3ef5e2042a005c1
>> # Parent  3e0216b2a0995cb21946bc13fb21391013332c57
>> bdiff: make sure we append to repeated lines instead of inserting into range
>>
>> This will mitigate the symptoms that tests exposed in the previous changeset.
>>
>> Arguably, we need similar handling for longer sequences of repeated lines...
> I'm concerned about this. This patch looks like a workaround for the
> simplest case. While you are aware it could have issues with "longer
> sequences of repeated lines". And that's not that uncommon. There are
> real-world JSON files like:
>
>    [
>      ...
>      {
>        "a": "b",
>      },
>      {
>        "a": "b",
>      },
>      {
>        "a": "b",
>      },
>      ...
>    ]
>
> Actually, those kinds of JSON files are part of the motivation of mpm's
> f1ca249696ed. This series would surely affect them, while I'm not sure the
> change is in a good way or not yet.
>
> Note that after f1ca249696ed, there are still unsolved issues with some
> large JSON files - we have seen (from user report) diff output being
> "wrong", where human could generate much smaller diff easily.
>
> If you are not in a hurry, I'd like to learn this area deeper before
> commenting confidently. That may take me 1 to 2 weeks.

The test case I added was made to cover the same problem as your JSON 
example. I think v2 of my bdiff patch series addresses the concerns you 
mention in a reasonable way.

Sure, I would appreciate to get more thorough review and feedback and 
test cases. I page in and out and have no hurry ... except that Greg is 
making rapid progress in the same code ;-)

/Mads


More information about the Mercurial-devel mailing list