[PATCH V2] checkcopies: don't lose origin of file during merge (issue4748)

Matt Mackall mpm at selenic.com
Thu Jul 16 19:20:08 CDT 2015

On Thu, 2015-07-16 at 10:33 +0200, Jeremy Parente wrote:
> # HG changeset patch
> # User Jeremy Parente <jeremy.parente at oneaccess-net.com>
> # Date 1437035066 -7200
> #      Thu Jul 16 10:24:26 2015 +0200
> # Branch stable
> # Node ID abd4cab8a1bac17d149ec44c36e9f556670c14b1
> # Parent  540cd0ddac49c1125b2e013aa2ff18ecbd4dd954
> checkcopies: don't lose origin of file during merge (issue4748)

Ok, I've spent most of today thinking about this and I've decided I'm
going to have to reject it. It's a lovely patch and you did in fact find
the right place to make the change, and the test changes look good too,
but I'm afraid it bumps up against deeper theoretical concerns.

Let's imagine we've got a file named a that gets renamed to b, and then
later a merge+commit happens. The DAG of that file's history today looks
like this:


With your patch, it looks like:


(FYI, you can see this with debugindex and debugrename)

..which gives us a superfluous second revision of b that's unchanged,
except that it's been renamed from a.. which we already knew and
recorded. It also says a is both a parent and grandparent of b, which is
false (and generally bad form, even for computers). 
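
To make the "both a parent and grandparent" point concrete, here is a minimal sketch in Python. It is a toy model, not Mercurial's filelog format: the dictionaries, the `name@rev` labels, and both helper functions are hypothetical, and each file revision simply lists the revisions it was derived from (ordinary parents and rename sources alike).

```python
def ancestor_depths(graph, node):
    """Map every ancestor of `node` to the set of path lengths reaching it."""
    depths = {}
    stack = [(parent, 1) for parent in graph[node]]
    while stack:
        current, depth = stack.pop()
        depths.setdefault(current, set()).add(depth)
        stack.extend((parent, depth + 1) for parent in graph[current])
    return depths


def redundant_origins(graph, node):
    """Ancestors that are a direct parent AND reachable by a longer path."""
    depths = ancestor_depths(graph, node)
    return sorted(a for a, ds in depths.items() if 1 in ds and len(ds) > 1)


# Toy version of today's history: a is renamed to b, the merge records nothing.
today = {"a@0": [], "b@0": ["a@0"]}

# Toy version of the patched history: the merge adds a second, unchanged
# revision of b that again names a as its rename source.
patched = {"a@0": [], "b@0": ["a@0"], "b@1": ["b@0", "a@0"]}

print(redundant_origins(today, "b@0"))    # []
print(redundant_origins(patched, "b@1"))  # ['a@0']
```

In the second graph a@0 shows up at distance 1 and at distance 2 from b@1, which is the redundancy described above.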

And the resulting extra DAG entry is actually quite undesirable, because
it now looks like "a change" and will fool later merges into thinking
something interesting happened, leading to bad merge decisions.
Also, it's going to generate tons of redundant file nodes on branchy

The first, simpler graph more accurately reflects the history. In one
branch we did a rename.. and in the other nothing happened, so nothing
was recorded. This is distinct from the other case you mentioned, which
looks like this:

 \    / <- this edge is not technically a rename or copy[1]

..where the new node is not redundant and a is not both parent and
grandparent of b'. So it's perfectly kosher.

Now you may be thinking "but the diff.." Yes, the diff is unhelpful, but
that's just another instance of the classic Diffs Of Merges Are
Basically Meaningless Because It's The Wrong Tool For The Job problem:


However, in this particular instance, we could make diff slightly
smarter.. by giving it less information. If you try this in Git, it'll
work, but only because Git never actually stores any sort of rename
metadata in history. So it literally guesses where renames are every
time by comparing file contents without any reference to their actual
history. In the future, we could supplement our diff (and merge)
algorithms with this sort of heuristic.. when real rename data isn't
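
As a rough illustration of guessing renames purely from file contents, here is a minimal Python sketch. It is not Git's or Mercurial's actual algorithm: the function name `guess_renames`, the 0.5 threshold, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def guess_renames(removed, added, threshold=0.5):
    """Pair each added file with the most similar removed file.

    `removed` and `added` map file names to file contents. No recorded
    history is consulted; only the contents are compared.
    """
    guesses = {}
    for new_name, new_text in added.items():
        best_name, best_score = None, threshold
        for old_name, old_text in removed.items():
            score = SequenceMatcher(None, old_text, new_text).ratio()
            if score >= best_score:
                best_name, best_score = old_name, score
        if best_name is not None:
            guesses[new_name] = best_name
    return guesses


removed = {"a": "line one\nline two\nline three\n"}
added = {
    "b": "line one\nline two\nline three\n",  # same contents as a
    "c": "0123456789",                        # shares no characters with a
}
print(guess_renames(removed, added))  # {'b': 'a'}
```

Because nothing but contents is compared, a heuristic like this can pair unrelated files that happen to look alike, and can miss a rename whose contents changed heavily.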

[1] we call it a 'ypoc', because it looks like a time-reversed copy. And
diff has no concept of ypocs because it's unfit for merges.

Mathematics is the supreme nostalgia of our time.
