[PATCH 00 of 15 RFC] verify: partial verification to detect repository corruption immediately

Matt Mackall mpm at selenic.com
Mon Oct 8 14:52:52 CDT 2012


On Thu, 2012-10-04 at 01:38 +0900, FUJIWARA Katsunori wrote:
> Sometimes, I assist Mercurial users with recovery from repository
> corruption.
> 
> In many of such cases, filelogs are completely corrupted: header part
> doesn't have valid value, and causes "unknown format".
> 
> In addition to it, many users can't notice such corruption
> immediately: IMHO, dirstate cache reducing need of accessing to
> filelog seems to hide this corruption from users.
>
> This "silent corruption" may cause loss of file contents.
>
> For example, if corruption happens just after adding new file on one
> branch, updating to another branch removes newly added file from
> working directory and causes loss of it.

I'm pretty skeptical about the value of this.

First, note that data corruption bugs in Mercurial itself have been
extremely rare, so I would expect basically nothing in that category to
be caught here. The other causes (user error, hardware error, OS error)
typically occur while Mercurial isn't running or outside of Mercurial's
view. Examples:

- user/tool recursively deletes/fails-to-copy a file in .hg/
- hardware error causes data to land on the wrong sector.. inside our
repo
- write to disk is corrupted by a memory error when the write cache is
flushed.. 10 seconds after Mercurial exits

'Gross' errors of this sort (damaged headers or index) will be noticed
no sooner than the next time the file is visited (i.e. the current
status quo).
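
To make "noticed when the file is visited" concrete, here is a rough
sketch of the kind of header check that yields the "unknown format"
message mentioned above. It is not Mercurial's actual code: the function
name check_revlog_header is hypothetical, and the constants are
assumptions modeled on the revlog index layout (a big-endian 32-bit word
with the format version in the low 16 bits and flags in the high 16).

```python
import struct

# Assumed values, modeled on the revlog index layout; not imported from Mercurial.
REVLOGV0 = 0
REVLOGNG = 1
KNOWN_NG_FLAGS = (1 << 16) | (1 << 17)  # assumed: inline data, generaldelta


def check_revlog_header(first_bytes):
    """Return None if the header looks sane, else a short description.

    This only runs when something actually opens the index file, which
    is why gross damage goes unnoticed until the file is next visited.
    """
    if len(first_bytes) == 0:
        return None  # an empty index is treated as a valid, empty revlog
    if len(first_bytes) < 4:
        return "truncated header"
    (word,) = struct.unpack(">I", first_bytes[:4])
    flags = word & ~0xFFFF   # high 16 bits
    fmt = word & 0xFFFF      # low 16 bits
    if fmt == REVLOGV0:
        if flags:
            return "unknown flags %#x for format v0" % (flags >> 16)
    elif fmt == REVLOGNG:
        if flags & ~KNOWN_NG_FLAGS:
            return "unknown flags %#x for revlogng" % (flags >> 16)
    else:
        return "unknown format %d" % fmt
    return None
```

Note that a header overwritten with zero bytes passes this check, since
it is indistinguishable from a v0 index. A check like this catches only
damage that happens to produce an invalid version or flag word.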

Worse, I'm afraid that what IS caught here is quite likely to be
misattributed by a large fraction of users as CAUSED by Mercurial
because it is detected "during the commit".

-- 
Mathematics is the supreme nostalgia of our time.



