[PATCH] largefiles: makes verify to work on local content by default (issue4242) (BC)

Mads Kiilerich mads at kiilerich.com
Tue Apr 12 17:07:49 EDT 2016


On 03/02/2016 02:05 PM, liscju wrote:
> # HG changeset patch
> # User liscju <piotr.listkiewicz at gmail.com>
> # Date 1456917749 -3600
> #      Wed Mar 02 12:22:29 2016 +0100
> # Node ID 184b0386fad4aff1ec64f0076c74c13e2cf5d036
> # Parent  c7f89ad87baef87f00c507545dfd4cc824bc3131
> largefiles: makes verify to work on local content by default (issue4242) (BC)

Sorry for the late response. Here comes something.


Conceptually, there can be many different variations of verification of 
largefile repos:

Largefiles to check:
* referenced in specific revisions (currently only available as the
default, which checks the current revision)
* referenced in repo (--lfa)
* all largefiles in store (cd .hg/largefiles && sha1sum -c <(ls 
?????????*| while read f; do echo $f $f; done))
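
For comparison, the store-wide check in the last bullet could be sketched in
Python roughly as follows. This is not Mercurial code: verify_store is a
made-up helper name, and it assumes (like the shell one-liner) that every
file in the store is named after the SHA-1 of its content. Unlike the glob
above, it looks at every regular file in the directory.

```python
import hashlib
import os


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_store(store_dir):
    """Return the names of files whose content hash differs from their name.

    Assumption: every regular file in store_dir is named after its SHA-1.
    """
    corrupt = []
    for name in sorted(os.listdir(store_dir)):
        path = os.path.join(store_dir, name)
        if os.path.isfile(path) and sha1_of_file(path) != name:
            corrupt.append(name)
    return corrupt
```

Something like verify_store(".hg/largefiles") would then list the corrupt
entries, and an empty list means everything in the store hashes correctly.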

Locations to check:
* check repo store
* check "user cache"
* check default remote server (currently apparently inevitable)
* first of these (should probably be default)

Kind of check:
* check existence (default)
* check actual hash (--lfc, but not possible remotely without downloading)
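
To make the "locations" and "kind of check" dimensions a bit more concrete,
here is a rough Python sketch of the "first of these" policy combined with
the two kinds of check. It is only an illustration, not Mercurial code:
check_largefile, the list of locations and the status strings are invented
for this example, and it only handles local directories (a remote location
could only answer the existence question without downloading).

```python
import hashlib
import os


def check_largefile(sha1_hex, locations, check_hash=False):
    """Check one largefile in the first location that has it.

    locations: ordered directories, e.g. [repo store, user cache].
    Returns (location, status) where status is "ok" or "corrupt",
    or (None, "missing") if no location has the file.
    """
    for location in locations:
        path = os.path.join(location, sha1_hex)
        if not os.path.isfile(path):
            continue  # not here, try the next location
        if not check_hash:
            return location, "ok"  # existence check only
        digest = hashlib.sha1()
        with open(path, "rb") as fh:
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        status = "ok" if digest.hexdigest() == sha1_hex else "corrupt"
        return location, status
    return None, "missing"
```

With check_hash=False this corresponds to the default existence check, and
with check_hash=True to what --lfc does for local content.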

I don't know which combinations we want to support. The current options 
might be a bit arbitrary. If we change/add new options, we should make 
sure they are less arbitrary or represent common use cases.

I don't think we should support remote (wireproto) hash checking. That 
would make it too easy to cause a CPU load DoS. It is more manageable if 
"attackers" have to use bandwidth to hash remote largefiles. It is ok 
that everybody is responsible for their own repositories. Remote 
repositories should be checked by running verify locally on the remote 
server.


For the patch, I would prefer to have a first "add test coverage" / 
"demonstrate problem" patch - that would make it easier to spot what the 
problem is and discuss the problem and fixes.


It seems to me like there currently are two bugs (in design or 
implementation):

1. 'hg verify --large' currently invokes remote statlfile for all files,
even if they are already available locally.

2. 'hg verify --large' currently sends one remote statlfile command per
file. It should use batching.

If these two were fixed, I think it would be less of a problem (or 
perhaps no problem) that it falls back to using remote. It would exactly 
mirror what happens in actual use.
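
A sketch of what fixing both points could look like, in Python. None of
these names are Mercurial APIs: has_local and stat_remote_batch are
hypothetical callables standing in for the local lookup and for one batched
remote statlfile request, and batch_size is an arbitrary value.

```python
def find_missing(hashes, has_local, stat_remote_batch, batch_size=100):
    """Return the hashes that are available neither locally nor remotely.

    has_local(h) -> bool: hypothetical local lookup (store or user cache).
    stat_remote_batch(list_of_hashes) -> dict mapping hash -> bool:
        hypothetical stand-in for one batched remote request.
    """
    # Point 1: only ask the remote about files that are not available locally.
    pending = [h for h in hashes if not has_local(h)]

    missing = []
    # Point 2: one request per batch instead of one request per file.
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        found = stat_remote_batch(batch)
        missing.extend(h for h in batch if not found.get(h, False))
    return missing
```

In this sketch an unreachable remote can be modelled by a stat_remote_batch
that returns an empty dict, in which case every file that is not available
locally ends up reported as missing.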

It seems like it currently handles lack of remote server availability by
simply reporting the largefiles it needs as missing. I think that is fine.


/Mads

