[PATCH 00 of 15 RFC] verify: partial verification to detect repository corruption immediately

Matt Mackall mpm at selenic.com
Fri Oct 12 14:41:31 CDT 2012


On Tue, 2012-10-09 at 15:13 +0900, FUJIWARA Katsunori wrote:
> At Mon, 08 Oct 2012 14:52:52 -0500,
> Matt Mackall wrote:
> > 
> > On Thu, 2012-10-04 at 01:38 +0900, FUJIWARA Katsunori wrote:
> > > Sometimes, I assist Mercurial users with recovery from repository
> > > corruption.
> > > 
> > > In many such cases, filelogs are completely corrupted: the header
> > > doesn't hold a valid value, which causes an "unknown format" error.
> > > 
> > > In addition, many users don't notice such corruption immediately:
> > > IMHO, the dirstate cache, by reducing the need to access filelogs,
> > > seems to hide this corruption from users.
> > >
> > > This "silent corruption" may cause loss of file contents.
> > >
> > > For example, if corruption happens just after adding a new file on
> > > one branch, updating to another branch removes the newly added file
> > > from the working directory, and its contents are lost.
> > 
> > I'm pretty skeptical about the value of this.
> > 
> > First, note that data corruption bugs in Mercurial itself have been
> > extremely rare, so I would expect basically nothing in that category to
> > be caught here. The other causes (user error, hardware error, OS error)
> > typically occur while Mercurial isn't running or outside of Mercurial's
> > view. Examples:
> > 
> > - user/tool recursively deletes/fails-to-copy a file in .hg/
> > - hardware error causes data to land on the wrong sector.. inside our
> >   repo
> > - write to disk is corrupted by a memory error when the write cache is
> >   flushed.. 10 seconds after Mercurial exits
> > 
> > 'Gross' errors of this sort (damaged headers or index) will be
> > noticed no sooner than the next time the file is visited (ie the current
> > status quo).
> > 
> > Worse, I'm afraid that what IS caught here is quite likely to be
> > misattributed by a large fraction of users as CAUSED by Mercurial
> > because it is detected "during the commit".
> 
> Thank you for your comments, Matt.
> 
> In (almost) all cases in which I assisted users with recovery, they
> were working in a Windows environment. So, I think that (1) network
> sharing (SMB/CIFS) and/or (2) rude anti-virus software may cause
> corruption.
> 
> If the former causes corruption, caching in the kernel/filesystem layer
> should hide such corruption, as you explained above: I also think that
> my approach can't detect corruption in this case.
> 
> If the latter, I think that explicitly re-reading revlogs from the
> filesystem may detect file corruption by anti-virus software just after
> the file descriptors are closed.

Perhaps. Though I think the bulk of our antivirus issues manifest as
aborts already, rather than silent corruption.
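
For concreteness, the kind of "re-read after close" check being discussed
could be sketched roughly as below. This is not Mercurial code and not the
patch series itself: write_and_reverify is a hypothetical helper, and SHA-1
is used only as a convenient checksum. Note that the re-read will normally
be served from the OS cache, so it says nothing about what eventually lands
on disk.

```python
import hashlib
import os


def write_and_reverify(path, data):
    """Append data to path, close it, then reopen and compare checksums.

    Hypothetical helper for illustration only. It can catch a file that
    was altered or truncated between close and reopen, but not errors
    that happen below the OS cache (bad sector, late write-back failure).
    """
    expected = hashlib.sha1(data).hexdigest()

    # Remember where the new data starts so only the tail is re-read.
    offset = os.path.getsize(path) if os.path.exists(path) else 0

    with open(path, "ab") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

    # The descriptor is closed here; reopen and read the same byte range.
    with open(path, "rb") as f:
        f.seek(offset)
        actual = hashlib.sha1(f.read(len(data))).hexdigest()

    return actual == expected
```

A caller would treat a False return as "something changed this file right
after we wrote it", which is the anti-virus scenario above; it would not
help with the hardware and write-cache examples earlier in the thread.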

-- 
Mathematics is the supreme nostalgia of our time.



