[PATCH 5 of 5 rfc] bdiff: make sure we append to repeated lines instead of inserting into range

Jun Wu quark at fb.com
Sat Nov 12 09:58:01 EST 2016


Excerpts from Mads Kiilerich's message of 2016-11-03 22:34:15 +0100:
> # HG changeset patch
> # User Mads Kiilerich <madski at unity3d.com>
> # Date 1478208837 -3600
> #      Thu Nov 03 22:33:57 2016 +0100
> # Node ID be83c5f4ec8931cb7e771a80a3ef5e2042a005c1
> # Parent  3e0216b2a0995cb21946bc13fb21391013332c57
> bdiff: make sure we append to repeated lines instead of inserting into range
> 
> This will mitigate the symptoms that tests exposed in the previous changeset.
> 
> Arguably, we need similar handling for longer sequences of repeated lines...

I'm concerned about this. This patch looks like a workaround for the
simplest case. While you are aware it could have issues with "longer
sequences of repeated lines". And that's not that uncommon. There are
real-world JSON files like:

  [
    ...
    {
      "a": "b",
    },
    {
      "a": "b",
    },
    {
      "a": "b",
    },
    ...
  ]

Actually, those kinds of JSON files are part of the motivation of mpm's
f1ca249696ed. This series would surely affect them, while I'm not sure the
change is in a good way or not yet.

Note that after f1ca249696ed, there are still unsolved issues with some
large JSON files - we have seen (from user report) diff output being
"wrong", where human could generate much smaller diff easily.

If you are not in a hurry, I'd like to learn this area deeper before
commenting confidently. That may take me 1 to 2 weeks.

> But also, we already have examples of how the heuristics handle other cases in
> a similar way.
> 
> diff --git a/mercurial/bdiff.c b/mercurial/bdiff.c
> --- a/mercurial/bdiff.c
> +++ b/mercurial/bdiff.c
> @@ -187,7 +187,7 @@ static int longest_match(struct bdiff_li
>                  } else if (i == mi && findbetterb) {
>                      /* better j in first upper half */
>                      mj = j;
> -                    if (j <= bhalf)
> +                    if (j <= bhalf && !(j > 0 && k == 1 && b[j - 1].e == b[j].e))
>                          findbetterb = 0;
>                  }
>              }
> diff --git a/tests/test-annotate.t b/tests/test-annotate.t
> --- a/tests/test-annotate.t
> +++ b/tests/test-annotate.t
> @@ -91,8 +91,8 @@ annotate (JSON)
>  annotate -n b
>  
>    $ hg annotate -n b
> +  0: a
>    1: a
> -  0: a
>    1: a
>    3: b4
>    3: b5
> @@ -111,8 +111,8 @@ annotate --no-follow b
>  annotate -nl b
>  
>    $ hg annotate -nl b
> -  1:1: a
>    0:1: a
> +  1:2: a
>    1:3: a
>    3:4: b4
>    3:5: b5
> @@ -121,8 +121,8 @@ annotate -nl b
>  annotate -nf b
>  
>    $ hg annotate -nf b
> +  0 a: a
>    1 a: a
> -  0 a: a
>    1 a: a
>    3 b: b4
>    3 b: b5
> @@ -131,8 +131,8 @@ annotate -nf b
>  annotate -nlf b
>  
>    $ hg annotate -nlf b
> -  1 a:1: a
>    0 a:1: a
> +  1 a:2: a
>    1 a:3: a
>    3 b:4: b4
>    3 b:5: b5
> @@ -156,8 +156,8 @@ annotate -nlf b
>  annotate after merge
>  
>    $ hg annotate -nf b
> +  0 a: a
>    1 a: a
> -  0 a: a
>    1 a: a
>    3 b: b4
>    4 b: c
> @@ -166,8 +166,8 @@ annotate after merge
>  annotate after merge with -l
>  
>    $ hg annotate -nlf b
> -  1 a:1: a
>    0 a:1: a
> +  1 a:2: a
>    1 a:3: a
>    3 b:4: b4
>    4 b:5: c
> @@ -198,7 +198,7 @@ annotate after merge with -l
>  annotate after rename merge
>  
>    $ hg annotate -nf b
> -  1 a: a
> +  0 a: a
>    6 b: z
>    1 a: a
>    3 b: b4
> @@ -209,7 +209,7 @@ annotate after rename merge
>  annotate after rename merge with -l
>  
>    $ hg annotate -nlf b
> -  1 a:1: a
> +  0 a:1: a
>    6 b:2: z
>    1 a:3: a
>    3 b:4: b4
> @@ -226,7 +226,7 @@ Issue2807: alignment of line numbers wit
>    $ echo more >> b
>    $ hg ci -mmore -d '7 0'
>    $ hg annotate -nlf b
> -   1 a: 1: a
> +   0 a: 1: a
>     6 b: 2: z
>     1 a: 3: a
>     3 b: 4: b4
> @@ -240,15 +240,15 @@ Issue2807: alignment of line numbers wit
>  linkrev vs rev
>  
>    $ hg annotate -r tip -n a
> +  0: a
>    1: a
> -  0: a
>    1: a
>  
>  linkrev vs rev with -l
>  
>    $ hg annotate -r tip -nl a
> -  1:1: a
>    0:1: a
> +  1:2: a
>    1:3: a
>  
>  Issue589: "undelete" sequence leads to crash
> diff --git a/tests/test-bhalf.t b/tests/test-bhalf.t
> --- a/tests/test-bhalf.t
> +++ b/tests/test-bhalf.t
> @@ -105,8 +105,8 @@ Explore some bdiff implementation edge c
>    --- a/x
>    +++ b/x
>    @@ -1,1 +1,3 @@
> +   a
>    +a
> -   a
>    +a
>    diff --git a/y b/y
>    --- a/y
> diff --git a/tests/test-commit-amend.t b/tests/test-commit-amend.t
> --- a/tests/test-commit-amend.t
> +++ b/tests/test-commit-amend.t
> @@ -47,8 +47,8 @@ Amending changeset with changes in worki
>    --- a/a    Thu Jan 01 00:00:00 1970 +0000
>    +++ b/a    Thu Jan 01 00:00:00 1970 +0000
>    @@ -1,1 +1,3 @@
> +   a
>    +a
> -   a
>    +a
>    $ hg log
>    changeset:   1:43f1ba15f28a
> @@ -122,13 +122,13 @@ No changes, just a different message:
>    uncompressed size of bundle content:
>         254 (changelog)
>         163 (manifests)
> -       141  a
> +       129  a
>    saved backup bundle to $TESTTMP/.hg/strip-backup/74609c7f506e-1bfde511-amend-backup.hg (glob)
>    1 changesets found
>    uncompressed size of bundle content:
>         250 (changelog)
>         163 (manifests)
> -       141  a
> +       129  a
>    adding branch
>    adding changesets
>    adding manifests
> @@ -140,8 +140,8 @@ No changes, just a different message:
>    --- a/a    Thu Jan 01 00:00:00 1970 +0000
>    +++ b/a    Thu Jan 01 00:00:00 1970 +0000
>    @@ -1,1 +1,3 @@
> +   a
>    +a
> -   a
>    +a
>    $ hg log
>    changeset:   1:1cd866679df8
> @@ -266,13 +266,13 @@ then, test editing custom commit message
>    uncompressed size of bundle content:
>         249 (changelog)
>         163 (manifests)
> -       143  a
> +       131  a
>    saved backup bundle to $TESTTMP/.hg/strip-backup/5f357c7560ab-e7c84ade-amend-backup.hg (glob)
>    1 changesets found
>    uncompressed size of bundle content:
>         257 (changelog)
>         163 (manifests)
> -       143  a
> +       131  a
>    adding branch
>    adding changesets
>    adding manifests
> @@ -309,13 +309,13 @@ Same, but with changes in working dir (d
>    uncompressed size of bundle content:
>         464 (changelog)
>         322 (manifests)
> -       261  a
> +       249  a
>    saved backup bundle to $TESTTMP/.hg/strip-backup/7ab3bf440b54-8e3b5088-amend-backup.hg (glob)
>    1 changesets found
>    uncompressed size of bundle content:
>         257 (changelog)
>         163 (manifests)
> -       145  a
> +       133  a
>    adding branch
>    adding changesets
>    adding manifests


More information about the Mercurial-devel mailing list