[PATCH 2 of 2 remotefilelog-ext getfile-batching] fileserverclient: add config knob to control batch size

Augie Fackler raf at durin42.com
Tue Aug 18 15:07:26 CDT 2015


# HG changeset patch
# User Augie Fackler <augie at google.com>
# Date 1439925241 14400
#      Tue Aug 18 15:14:01 2015 -0400
# Node ID 802538ce2a9be544828417ff0a86aba847a63307
# Parent  1183fcaf20272ca5dbcc0c308e9e11b1c7c97093
fileserverclient: add config knob to control batch size

Previously we sent every missing file to the server in one enormous
batch, which led to prolonged periods of no progress output for the
user. Now we send requests in smaller chunks (100 files by default, or
10 if the server lacks the 'batch' capability), which gives the user
some idea that things are working.
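
For readers who want the chunking idea in isolation, here is a minimal
sketch outside of remotefilelog. The names iter_chunks, fetch_in_batches,
fetch_batch, on_result and tick are hypothetical stand-ins, not part of
this patch, and the guard against a non-positive batch size is an
addition of the sketch rather than something the patch does.

```python
def iter_chunks(items, batchsize):
    """Yield successive slices of items, each at most batchsize long."""
    if batchsize < 1:
        # Guard added for this sketch only: slicing by 0 would never
        # shrink the list, so the loop below would not terminate.
        raise ValueError('batchsize must be positive')
    while items:
        chunk, items = items[:batchsize], items[batchsize:]
        yield chunk


def fetch_in_batches(fetch_batch, on_result, tick, missed, batchsize):
    """Fetch missed keys one chunk at a time, ticking progress per key."""
    for chunk in iter_chunks(missed, batchsize):
        results = fetch_batch(chunk)  # one round trip per chunk
        for key in chunk:
            on_result(key, results[key])
            tick()


# Usage with a fake fetcher that records the size of each request.
request_sizes = []
received = []
ticks = []


def fake_fetch(chunk):
    request_sizes.append(len(chunk))
    return dict((key, key.upper()) for key in chunk)


missed_keys = ['k%d' % i for i in range(250)]
fetch_in_batches(fake_fetch, lambda key, value: received.append(key),
                 lambda: ticks.append(1), missed_keys, 100)
```

With 250 keys and a batch size of 100, the fake fetcher sees three
requests, and progress ticks arrive between them instead of only after
one large request has completed.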

Includes a trivial test. It doesn't really verify that the batching
logic is used as described, but it at least prevents the boneheaded
error I had in an earlier (unmailed) version of this patch, which
forgot to use configint() when loading the config setting.
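
A rough illustration of why the integer conversion matters, assuming
the failure showed up when the batch size was used as a slice bound
(the message above does not say how the error surfaced). FakeUI is a
hypothetical stand-in written for this sketch and is not Mercurial's ui
class; it only mimics the idea that a raw config lookup returns a
string while an integer lookup converts it.

```python
class FakeUI(object):
    """Hypothetical config reader for illustration only."""

    def __init__(self, values):
        self._values = values

    def config(self, section, name, default=None):
        # Raw lookup: values set by the user come back as strings.
        return self._values.get((section, name), default)

    def configint(self, section, name, default=None):
        # Integer lookup: convert the user-supplied string.
        value = self.config(section, name)
        if value is None:
            return default
        return int(value)


ui = FakeUI({('remotefilelog', 'batchsize'): '1000'})
missed = ['a', 'b', 'c']

# A raw string cannot be used as a slice bound.
raw = ui.config('remotefilelog', 'batchsize', 100)
try:
    missed[:raw]
    sliced_with_str = True
except TypeError:
    sliced_with_str = False

# The integer lookup yields a usable batch size.
size = ui.configint('remotefilelog', 'batchsize', 100)
chunk = missed[:size]
```

Under this assumption the integer default would mask the problem
whenever the option is unset, so it would only appear once a user
configured a value. That would be consistent with the test passing
remotefilelog.batchsize explicitly.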

diff --git a/remotefilelog/fileserverclient.py b/remotefilelog/fileserverclient.py
--- a/remotefilelog/fileserverclient.py
+++ b/remotefilelog/fileserverclient.py
@@ -100,18 +100,21 @@ class cacheconnection(object):
 
         return result
 
-def _getfilesbatch(remote, receivemissing, progresstick, missed, idmap):
-    b = remote.batch()
-    futures = {}
-    for m in missed:
-        file_ = idmap[m]
-        node = m[-40:]
-        futures[m] = b.getfile(file_, node)
-    b.submit()
-    for m in missed:
-        v = futures[m].value
-        receivemissing(io.BytesIO('%d\n%s' % (len(v), v)), m)
-        progresstick()
+def _getfilesbatch(
+        remote, receivemissing, progresstick, missed, idmap, batchsize):
+    while missed:
+        chunk, missed = missed[:batchsize], missed[batchsize:]
+        b = remote.batch()
+        futures = {}
+        for m in chunk:
+            file_ = idmap[m]
+            node = m[-40:]
+            futures[m] = b.getfile(file_, node)
+        b.submit()
+        for m in chunk:
+            v = futures[m].value
+            receivemissing(io.BytesIO('%d\n%s' % (len(v), v)), m)
+            progresstick()
 
 def _getfiles(
     remote, receivemissing, fallbackpath, progresstick, missed, idmap):
@@ -234,8 +237,12 @@ class fileserverclient(object):
                         _getfiles(remote, self.receivemissing, fallbackpath,
                                   progresstick, missed, idmap)
                     elif remote.capable("getfile"):
+                        batchdefault = 100 if remote.capable('batch') else 10
+                        batchsize = self.ui.configint(
+                            'remotefilelog', 'batchsize', batchdefault)
                         _getfilesbatch(
-                            remote, self.receivemissing, progresstick, missed, idmap)
+                            remote, self.receivemissing, progresstick, missed,
+                            idmap, batchsize)
                     else:
                         raise util.Abort("configured remotefilelog server"
                                          " does not support remotefilelog")
diff --git a/tests/test-http.t b/tests/test-http.t
--- a/tests/test-http.t
+++ b/tests/test-http.t
@@ -20,6 +20,12 @@ Build a query string for later use:
   $ hgcloneshallow http://localhost:$HGPORT/ shallow -q
   1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob)
 
+Clear filenode cache so we can test fetching with a modified batch size
+  $ rm -r $TESTTMP/hgcache
+Now do a fetch with a large batch size so we're sure it works
+  $ hgcloneshallow http://localhost:$HGPORT/ shallow-large-batch \
+  >    --config remotefilelog.batchsize=1000 -q
+  1 files fetched over 1 fetches - (1 misses, 0.00% hit ratio) over *s (glob)
 
 The 'remotefilelog' capability should *not* be exported over http(s),
 as the getfile method it offers doesn't work with http.

