Bug 4462 - generaldelta over bundle1 can produce invalid filelog linkrev order
Summary: generaldelta over bundle1 can produce invalid filelog linkrev order
Status: RESOLVED FIXED
Alias: None
Product: Mercurial
Classification: Unclassified
Component: Mercurial (show other bugs)
Version: 3.2.1
Hardware: PC Mac OS
: normal bug
Assignee: Bugzilla
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-11-20 18:01 UTC by Durham Goode
Modified: 2015-01-22 15:04 UTC (History)
5 users (show)

See Also:
Python Version: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Durham Goode 2014-11-20 18:01 UTC
When pulling from a generaldelta server using the bundle1 wire protocol, the linearizer can produce bundle1 changegroups such that a filelog linkrev points to a commit that is not the first to introduce that file version. This means if a strip is performed that deletes the later commit, but not the earlier one, the history becomes corrupt.

i.e.

With commits:

0<-1<---3<-4
  \       /
   --2<---

where 2 and 3 add the same version of a file, a filelog can be produced with linkrevs: 0<-3  instead of 0<-2.  So stripping commit 3 means commit 2 doesn't have access to the data anymore.


I have a repro in one of our big repositories, but I don't have a simple repro to share.
Comment 1 Pierre-Yves David 2014-11-20 18:29 UTC
adding sune and benoit
Comment 2 Gregory Szorc 2014-11-20 18:39 UTC
How is the corruption detected? Will Mercurial abort or is this the silent kind?

I ask because Mozilla just converted Try to generaldelta and I'm trying to assess our exposure to this. I *think* that as long as we don't strip it is ok?
Comment 3 Durham Goode 2014-11-20 18:42 UTC
As long as you don't do a strip, and as long as no one else who pulls from you does a strip, you'll be ok.  If they do do a strip, they'll be unable to checkout commits that depend on that file version, and future pulls will fail if they download any file revs that depend on the rev that got deleted.
Comment 4 Benoit Boissinot 2014-11-20 18:43 UTC
Can you clarify which parts happen on server vs. client in your scenario?
Comment 5 Durham Goode 2014-11-20 18:46 UTC
The server and client are both general delta.  But I don't have bundle 2 enabled.  When I do a pull to the client, the server produces a bad bundle1 changegroup.  The client then applies it, which is fine as long as no one does a strip.
Comment 6 Gregory Szorc 2014-11-20 18:47 UTC
So if server is gd and client is classic, then all is OK?
Comment 7 Durham Goode 2014-11-20 18:50 UTC
No, the client is still screwed up if it isn't general delta.  The actual bundle sent across the wire is bad, so the client settings don't matter.  I was just saying it happened to be general delta in my test.
Comment 8 Durham Goode 2014-11-20 19:20 UTC
Ok, I'm testing a fix.  There were two problems:

1) cg1packer.generate needs to always take the slow path when reorder is true (so it can calculate the appropriate linkrevs as it goes).

2) lookupmf currently thinks that the first manifest to introduce a file version is also the earliest commit to introduce that file version, which is not true.  If the changelog is reordered one way, and the manifest is reordered another way, the first manifest to introduce a file version may not have a linkrev to the first commit that introduced that file rev (a different manifest may have done it first).  I'm just going to add some logic to lookupmf to make sure it uses the `lowest` linkrev seen for each file.
Comment 9 HG Bot 2014-11-22 18:17 UTC
Fixed by http://selenic.com/repo/hg/rev/cc0ff93d0c0c
Durham Goode <durham@fb.com>
changegroup: fix file linkrevs during reorders (issue4462)

Previously, if reorder was true during the creation of a changegroup bundle,
it was possible that the manifest and filelogs would be reordered such that the
resulting bundle filelog had a linkrev that pointed to a commit that was not
the earliest instance of the filelog revision. For example:

With commits:

0<-1<---3<-4
  \       /
   --2<---

if 2 and 3 added the same version of a file, if the manifests of 2 and 3 have
their order reversed, but the changelog did not, it could produce a filelog with
linkrevs 0<-3 instead of 0<-2, which meant if commit 3 was stripped, it would
delete that file data from the repository and commit 2 would be corrupt (as
would any future pulls that tried to build upon that version of the file).

The fix is to make the linkrev fixup smarter. Previously it considered the first
manifest that added a file to be the first commit that added that file, which is
not true. Now, for every file revision we add to the bundle we make sure we
attach it to the earliest applicable linkrev.

(please test the fix)
Comment 10 Matt Mackall 2015-01-22 15:04 UTC
Bulk testing -> fixed