[PATCH 5 of 5 rfc] bdiff: make sure we append to repeated lines instead of inserting into range
Jun Wu
quark at fb.com
Sat Nov 12 09:58:01 EST 2016
Excerpts from Mads Kiilerich's message of 2016-11-03 22:34:15 +0100:
> # HG changeset patch
> # User Mads Kiilerich <madski at unity3d.com>
> # Date 1478208837 -3600
> # Thu Nov 03 22:33:57 2016 +0100
> # Node ID be83c5f4ec8931cb7e771a80a3ef5e2042a005c1
> # Parent 3e0216b2a0995cb21946bc13fb21391013332c57
> bdiff: make sure we append to repeated lines instead of inserting into range
>
> This will mitigate the symptoms that tests exposed in the previous changeset.
>
> Arguably, we need similar handling for longer sequences of repeated lines...
I'm concerned about this. This patch looks like a workaround for the
simplest case. While you are aware it could have issues with "longer
sequences of repeated lines". And that's not that uncommon. There are
real-world JSON files like:
[
...
{
"a": "b",
},
{
"a": "b",
},
{
"a": "b",
},
...
]
Actually, those kinds of JSON files are part of the motivation of mpm's
f1ca249696ed. This series would surely affect them, while I'm not sure the
change is in a good way or not yet.
Note that after f1ca249696ed, there are still unsolved issues with some
large JSON files - we have seen (from user report) diff output being
"wrong", where human could generate much smaller diff easily.
If you are not in a hurry, I'd like to learn this area deeper before
commenting confidently. That may take me 1 to 2 weeks.
> But also, we already have examples of how the heuristics handle other cases in
> a similar way.
>
> diff --git a/mercurial/bdiff.c b/mercurial/bdiff.c
> --- a/mercurial/bdiff.c
> +++ b/mercurial/bdiff.c
> @@ -187,7 +187,7 @@ static int longest_match(struct bdiff_li
> } else if (i == mi && findbetterb) {
> /* better j in first upper half */
> mj = j;
> - if (j <= bhalf)
> + if (j <= bhalf && !(j > 0 && k == 1 && b[j - 1].e == b[j].e))
> findbetterb = 0;
> }
> }
> diff --git a/tests/test-annotate.t b/tests/test-annotate.t
> --- a/tests/test-annotate.t
> +++ b/tests/test-annotate.t
> @@ -91,8 +91,8 @@ annotate (JSON)
> annotate -n b
>
> $ hg annotate -n b
> + 0: a
> 1: a
> - 0: a
> 1: a
> 3: b4
> 3: b5
> @@ -111,8 +111,8 @@ annotate --no-follow b
> annotate -nl b
>
> $ hg annotate -nl b
> - 1:1: a
> 0:1: a
> + 1:2: a
> 1:3: a
> 3:4: b4
> 3:5: b5
> @@ -121,8 +121,8 @@ annotate -nl b
> annotate -nf b
>
> $ hg annotate -nf b
> + 0 a: a
> 1 a: a
> - 0 a: a
> 1 a: a
> 3 b: b4
> 3 b: b5
> @@ -131,8 +131,8 @@ annotate -nf b
> annotate -nlf b
>
> $ hg annotate -nlf b
> - 1 a:1: a
> 0 a:1: a
> + 1 a:2: a
> 1 a:3: a
> 3 b:4: b4
> 3 b:5: b5
> @@ -156,8 +156,8 @@ annotate -nlf b
> annotate after merge
>
> $ hg annotate -nf b
> + 0 a: a
> 1 a: a
> - 0 a: a
> 1 a: a
> 3 b: b4
> 4 b: c
> @@ -166,8 +166,8 @@ annotate after merge
> annotate after merge with -l
>
> $ hg annotate -nlf b
> - 1 a:1: a
> 0 a:1: a
> + 1 a:2: a
> 1 a:3: a
> 3 b:4: b4
> 4 b:5: c
> @@ -198,7 +198,7 @@ annotate after merge with -l
> annotate after rename merge
>
> $ hg annotate -nf b
> - 1 a: a
> + 0 a: a
> 6 b: z
> 1 a: a
> 3 b: b4
> @@ -209,7 +209,7 @@ annotate after rename merge
> annotate after rename merge with -l
>
> $ hg annotate -nlf b
> - 1 a:1: a
> + 0 a:1: a
> 6 b:2: z
> 1 a:3: a
> 3 b:4: b4
> @@ -226,7 +226,7 @@ Issue2807: alignment of line numbers wit
> $ echo more >> b
> $ hg ci -mmore -d '7 0'
> $ hg annotate -nlf b
> - 1 a: 1: a
> + 0 a: 1: a
> 6 b: 2: z
> 1 a: 3: a
> 3 b: 4: b4
> @@ -240,15 +240,15 @@ Issue2807: alignment of line numbers wit
> linkrev vs rev
>
> $ hg annotate -r tip -n a
> + 0: a
> 1: a
> - 0: a
> 1: a
>
> linkrev vs rev with -l
>
> $ hg annotate -r tip -nl a
> - 1:1: a
> 0:1: a
> + 1:2: a
> 1:3: a
>
> Issue589: "undelete" sequence leads to crash
> diff --git a/tests/test-bhalf.t b/tests/test-bhalf.t
> --- a/tests/test-bhalf.t
> +++ b/tests/test-bhalf.t
> @@ -105,8 +105,8 @@ Explore some bdiff implementation edge c
> --- a/x
> +++ b/x
> @@ -1,1 +1,3 @@
> + a
> +a
> - a
> +a
> diff --git a/y b/y
> --- a/y
> diff --git a/tests/test-commit-amend.t b/tests/test-commit-amend.t
> --- a/tests/test-commit-amend.t
> +++ b/tests/test-commit-amend.t
> @@ -47,8 +47,8 @@ Amending changeset with changes in worki
> --- a/a Thu Jan 01 00:00:00 1970 +0000
> +++ b/a Thu Jan 01 00:00:00 1970 +0000
> @@ -1,1 +1,3 @@
> + a
> +a
> - a
> +a
> $ hg log
> changeset: 1:43f1ba15f28a
> @@ -122,13 +122,13 @@ No changes, just a different message:
> uncompressed size of bundle content:
> 254 (changelog)
> 163 (manifests)
> - 141 a
> + 129 a
> saved backup bundle to $TESTTMP/.hg/strip-backup/74609c7f506e-1bfde511-amend-backup.hg (glob)
> 1 changesets found
> uncompressed size of bundle content:
> 250 (changelog)
> 163 (manifests)
> - 141 a
> + 129 a
> adding branch
> adding changesets
> adding manifests
> @@ -140,8 +140,8 @@ No changes, just a different message:
> --- a/a Thu Jan 01 00:00:00 1970 +0000
> +++ b/a Thu Jan 01 00:00:00 1970 +0000
> @@ -1,1 +1,3 @@
> + a
> +a
> - a
> +a
> $ hg log
> changeset: 1:1cd866679df8
> @@ -266,13 +266,13 @@ then, test editing custom commit message
> uncompressed size of bundle content:
> 249 (changelog)
> 163 (manifests)
> - 143 a
> + 131 a
> saved backup bundle to $TESTTMP/.hg/strip-backup/5f357c7560ab-e7c84ade-amend-backup.hg (glob)
> 1 changesets found
> uncompressed size of bundle content:
> 257 (changelog)
> 163 (manifests)
> - 143 a
> + 131 a
> adding branch
> adding changesets
> adding manifests
> @@ -309,13 +309,13 @@ Same, but with changes in working dir (d
> uncompressed size of bundle content:
> 464 (changelog)
> 322 (manifests)
> - 261 a
> + 249 a
> saved backup bundle to $TESTTMP/.hg/strip-backup/7ab3bf440b54-8e3b5088-amend-backup.hg (glob)
> 1 changesets found
> uncompressed size of bundle content:
> 257 (changelog)
> 163 (manifests)
> - 145 a
> + 133 a
> adding branch
> adding changesets
> adding manifests
More information about the Mercurial-devel
mailing list