[PATCH 00 of 15 RFC] verify: partial verification to detect repository corruption immediately

FUJIWARA Katsunori foozy at lares.dti.ne.jp
Tue Oct 9 01:13:55 CDT 2012


At Mon, 08 Oct 2012 14:52:52 -0500,
Matt Mackall wrote:
> 
> On Thu, 2012-10-04 at 01:38 +0900, FUJIWARA Katsunori wrote:
> > Sometimes, I assist Mercurial users with recovery from repository
> > corruption.
> > 
> > In many such cases, filelogs are completely corrupted: the header part
> > doesn't have a valid value, and causes "unknown format".
> > 
> > In addition, many users can't notice such corruption
> > immediately: IMHO, the dirstate cache, by reducing the need to access
> > filelogs, seems to hide this corruption from users.
> >
> > This "silent corruption" may cause loss of file contents.
> >
> > For example, if corruption happens just after adding a new file on one
> > branch, updating to another branch removes the newly added file from
> > the working directory, and its contents are lost.
> 
> I'm pretty skeptical about the value of this.
> 
> First, note that data corruption bugs in Mercurial itself have been
> extremely rare, so I would expect basically nothing in that category to
> be caught here. The other causes (user error, hardware error, OS error)
> typically occur while Mercurial isn't running or outside of Mercurial's
> view. Examples:
> 
> - user/tool recursively deletes/fails-to-copy a file in .hg/
> - hardware error causes data to land on the wrong sector.. inside our
> repo
> - write to disk is corrupted by a memory error when the write cache is
> flushed.. 10 seconds after Mercurial exits
> 
> 'Gross' errors of this sort (damaged headers or index) will be noticed
> no sooner than the next time the file is visited (ie the current
> status quo).
> 
> Worse, I'm afraid that what IS caught here is quite likely to be
> misattributed by a large fraction of users as CAUSED by Mercurial
> because it is detected "during the commit".

Thank you for your comments, Matt.

In (almost) all of the cases in which I assisted users with recovery, they
were working on Windows. So, I think that (1) network sharing
(SMB/CIFS) and/or (2) intrusive anti-virus software may cause the corruption.

If the former causes the corruption, caching in the kernel/filesystem layer
would hide it, as you explained above: I also think that
my approach can't detect corruption in this case.

If the latter, I think that explicitly re-reading revlogs from the
filesystem may detect file corruption caused by anti-virus software just
after the file descriptors are closed.
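
As a minimal sketch of such a re-read check (not the code from the patch
series): it assumes that a revlog index file (*.i) starts with a 4-byte
big-endian word whose low 16 bits hold the format number, and that only
formats 0 and 1 are valid. The name header_problem is hypothetical.

```python
import struct

# Assumption: the only revlog formats known at the time of this thread.
KNOWN_FORMATS = (0, 1)


def header_problem(path):
    """Re-read the first bytes of a revlog index from disk.

    Returns a short description of the problem, or None if the
    header looks plausible.
    """
    with open(path, "rb") as fp:
        head = fp.read(4)
    if not head:
        # Assumption: an empty index file is treated as valid.
        return None
    if len(head) < 4:
        return "truncated header"
    (word,) = struct.unpack(">I", head)
    fmt = word & 0xFFFF  # low 16 bits: format number
    if fmt not in KNOWN_FORMATS:
        return "unknown format %d" % fmt
    return None
```

This only catches 'gross' damage to the header. It says nothing about the
rest of the file, and it can still be satisfied from a cache rather than
from the disk itself.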

Of course, anti-virus software may damage files some time later, and my
approach can't work in such a case.

What about (a) checking revlog entries not for the current transaction but
for the PREVIOUS one via "precommit"/"prechangegroup" hooks, and (b)
emphasizing in the error message that things other than Mercurial cause
the corruption? :-)
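
A rough sketch of how (a) and (b) might look as an in-process Python hook.
Several things here are assumptions: the function names are hypothetical,
the index files are assumed to live under "store" inside the .hg directory
(repo.path), and the header layout is assumed to be a 4-byte big-endian
word whose low 16 bits hold format 0 or 1. As a simplification, it scans
every index file instead of only those written by the previous
transaction.

```python
import os
import struct


def damaged_indexes(store):
    """Yield paths of *.i files under store whose header looks invalid."""
    for dirpath, _dirs, files in os.walk(store):
        for name in files:
            if not name.endswith(".i"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fp:
                head = fp.read(4)
            if not head:
                continue  # assumption: an empty index is fine
            if len(head) < 4:
                yield path
            elif struct.unpack(">I", head)[0] & 0xFFFF not in (0, 1):
                yield path


def precommit_check(ui, repo, **kwargs):
    """Hook sketch: a true return value asks Mercurial to abort."""
    bad = sorted(damaged_indexes(os.path.join(repo.path, "store")))
    for path in bad:
        ui.warn("%s: damaged revlog header\n" % path)
    if bad:
        # (b): say clearly that the damage predates this command.
        ui.warn("this damage was found BEFORE the current operation; "
                "it was probably caused by something other than "
                "Mercurial (anti-virus, network share, disk)\n")
    return bool(bad)
```

Such a function would be registered under [hooks] with the
"python:/path/to/file.py:precommit_check" form. Whether ui.warn expects
str or bytes depends on the Mercurial and Python versions in use.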

OK, I withdraw my proposal.

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy at lares.dti.ne.jp

