[PATCH 4 of 6] verify: invoke the file prefetch hook

Yuya Nishihara yuya at tcha.org
Mon Apr 16 10:11:33 EDT 2018


On Mon, 16 Apr 2018 09:59:49 -0400, Matt Harbison wrote:
> 
> > On Apr 16, 2018, at 9:39 AM, Yuya Nishihara <yuya at tcha.org> wrote:
> > 
> >> On Mon, 16 Apr 2018 09:15:48 -0400, Matt Harbison wrote:
> >> 
> >>>> On Apr 16, 2018, at 8:34 AM, Yuya Nishihara <yuya at tcha.org> wrote:
> >>>> 
> >>>> On Mon, 16 Apr 2018 08:25:07 -0400, Matt Harbison wrote:
> >>>> 
> >>>>>> On Apr 16, 2018, at 7:35 AM, Yuya Nishihara <yuya at tcha.org> wrote:
> >>>>>> 
> >>>>>> On Sun, 15 Apr 2018 02:44:08 -0400, Matt Harbison wrote:
> >>>>>> # HG changeset patch
> >>>>>> # User Matt Harbison <matt_harbison at yahoo.com>
> >>>>>> # Date 1523752111 14400
> >>>>>> #      Sat Apr 14 20:28:31 2018 -0400
> >>>>>> # Node ID 3b0c3d4939b56ca038dbbba17da424699a6b339d
> >>>>>> # Parent  691a7d3f2df80908e52a170897a4492a528c3533
> >>>>>> verify: invoke the file prefetch hook
> >>>>>> 
> >>>>>> It's unfortunate that verify wants to download everything.  Maybe we can create
> >>>>>> a custom transfer handler for LFS.  But it shouldn't be painful in the meantime,
> >>>>>> and it's likely that blobs will be served up from more than just hgweb.
> >>>>>> 
> >>>>>> diff --git a/mercurial/verify.py b/mercurial/verify.py
> >>>>>> --- a/mercurial/verify.py
> >>>>>> +++ b/mercurial/verify.py
> >>>>>> @@ -25,6 +25,7 @@ from . import (
> >>>>>> 
> >>>>>> def verify(repo):
> >>>>>>   with repo.lock():
> >>>>>> +        scmutil.fileprefetchhooks(repo, repo.set('all()'))
> >>>>>>       return verifier(repo).verify()
> >>>>> 
> >>>>> I don't think "hg verify" should go that abstraction level because the
> >>>>> repository might be inconsistent state.
> >>>> 
> >>>> Do you mean using a revset, or prefetching at all?  Grabbing one file at a time will likely be painfully slow.
> >>>> 
> >>>> If the revset bit is the bad part, what is the alternative with a corrupt repo?
> >>> 
> >>> I think we should avoid any write operation on damaged repo. Is there any way
> >>> to disable fetching at all?
> >> 
> >> Maybe we could wrap hg.verify() to stuff a flag is svfs?  But I think then the revlog flag processor will flag each revision as corrupt anyway, hiding the real error.
> >> 
> >> Avoiding writes on a corrupt repo makes sense, but in this case I think it is safe.  The storage is external, so nothing gets more corrupt, and the blobs are checksummed on their own before adding them to the store.
> > 
> > That's true for lfs, but not all prefetch-able storage would behave as
> > such. Another concern is we're doing prefetch *before* verifying revlogs.
> > If a revlog is damaged, the prefetch would fail and no useful error would
> > be reported.
> > 
> > Perhaps we'll need "hg debugprefetch && hg verify" ?
> 
> But that would also prefetch before verifying the revlog?

Yes. The point is that "verify" never crashes because of prefetch. If we've
already downloaded all lfs data, revlog corruption can be verified by running
"hg verify."

"hg verify" has to be robust for data corruption.


More information about the Mercurial-devel mailing list