[Bug 4152] New: Pulling a corrupted bundle isn't detected by the client at pull time

Thu Jan 23 18:27:24 CST 2014

http://bz.selenic.com/show_bug.cgi?id=4152

          Priority: normal
            Bug ID: 4152
                CC: mercurial-devel at selenic.com
          Assignee: bugzilla at selenic.com
           Summary: Pulling a corrupted bundle isn't detected by the
                    client at pull time
          Severity: bug
    Classification: Unclassified
                OS: All
          Reporter: gregory.szorc at gmail.com
          Hardware: PC
            Status: UNCONFIRMED
           Version: 2.8.2
         Component: Mercurial
           Product: Mercurial

https://hg.mozilla.org/ has its Mercurial repos hosted over NFS. Yes, we know
this is bad and we're actively working to resolve it. The issue that I'll
report in this bug almost certainly has its roots in out-of-order I/O
processing by NFS or some other weird NFS foo. But I contend there may be a
client bug lingering here. Let me explain.

We've observed that clients (even modern ones on 2.8.2) that pull at just the right
time may receive an "incomplete" bundle from the server. This manifests as an
IntegrityError on a subsequent pull. Usually, applying the filelog data from
the subsequent pull results in an "unknown parent."

I suspect what's happening is that the original/corrupted pull thinks it has
obtained all filelog data when in reality it hasn't. The next time the client
goes to pull, it compares heads, figures out which filelog revs are missing
based on the missing manifests, and bundles them. When the filelog revs come
down, Mercurial tries to apply them, realizes a parent (assumed to be present)
is missing, and aborts.
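
To make the suspected failure mode concrete, here is a toy sketch in Python.
It is not Mercurial's actual code: apply_revs, MissingParentError, and the
single-parent model are all made up for illustration, and real filelogs store
deltas and have two parents. It only shows how a truncated first pull can go
unnoticed until a later pull references the missing revision.

```python
class MissingParentError(Exception):
    """Raised when an incoming revision references a parent we do not have."""


NULLID = "null"  # stand-in for the null revision (hypothetical value)


def apply_revs(store, incoming):
    """Add (node, parent) pairs to store, a set of already known nodes.

    Toy model: each revision has a single parent and no content.
    """
    for node, parent in incoming:
        if parent != NULLID and parent not in store:
            raise MissingParentError(
                "unknown parent %s for %s" % (parent, node))
        store.add(node)
    return store


# First pull: the server meant to send a, b, c, but the bundle was cut short.
# Nothing complains, because every revision that did arrive has its parent.
store = apply_revs(set(), [("a", NULLID), ("b", "a")])

# Second pull: the server assumes the client already has c and sends only d.
try:
    apply_revs(store, [("d", "c")])
except MissingParentError as exc:
    error = str(exc)
```

In this model the first pull succeeds silently and the error only surfaces on
the second pull, which matches what we are seeing.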

While I concede the server shouldn't be sending incomplete bundles due to an
unsupported hosting configuration, I also have to ask: why isn't the client
validating manifest/filelog entries at unbundle time? It seems to me the client
could be a little bit more robust here: it seems to be blindly accepting
corrupted data from the server. Shouldn't the client be performing some
validation at unbundle time to catch this? What if a bug accidentally creeps
into Mercurial causing servers to send corrupted bundles to unknowing clients?
Don't we want clients to detect this so pulling from a "bad" server doesn't
cascade the problem?
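
As a rough sketch of the kind of check I have in mind (again in Python, and
again hypothetical: find_missing_filenodes and the plain dict/set data model
are my own invention, not an existing Mercurial API), the client could
cross-check the file revisions referenced by the incoming manifests against
the filelog revisions it actually has once the bundle is applied:

```python
def find_missing_filenodes(manifests, filelogs):
    """Return sorted (path, filenode) pairs referenced but not present.

    manifests: iterable of dicts mapping path -> filenode
    filelogs:  dict mapping path -> set of filenodes available locally
    """
    missing = set()
    for manifest in manifests:
        for path, filenode in manifest.items():
            if filenode not in filelogs.get(path, set()):
                missing.add((path, filenode))
    return sorted(missing)


# Two new manifests arrived; the second references README revision r2.
new_manifests = [
    {"README": "r1", "setup.py": "s1"},
    {"README": "r2", "setup.py": "s1"},
]

# The filelog data for r2 never made it into the bundle.
received_filelogs = {"README": {"r1"}, "setup.py": {"s1"}}

missing = find_missing_filenodes(new_manifests, received_filelogs)
```

A non-empty result would be grounds for aborting and rolling back the pull
instead of committing the transaction. I have not measured what a check like
this would cost on a large pull.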

Why stop there? Why is the server / bundling code path issuing a corrupted
bundle? Shouldn't missing data be detected at bundle time?

WONTFIX if you want (because of the unsupported server configuration). But I'd
really like an explanation of why corruption isn't detected earlier.
