D2884: wireproto: experimental command to emit file data
indygreg (Gregory Szorc)
phabricator at mercurial-scm.org
Fri Mar 16 23:07:29 UTC 2018
indygreg created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.
REVISION SUMMARY
Partial clones will require new wire protocol functionality to
retrieve repository data. The remotefilelog extension - which
implements various aspects of partial clone - adds a handful of
wire protocol commands:
getflogheads
   Obtain heads of a filelog
getfile
   Obtain data for an individual file revision
getfiles
   Batch version of getfile
getpackv1
   Obtain a "pack file" containing index and data on multiple
   files
(among others)
Recently, the wire protocol has gained support for "obtain repository
data" in the form of overloading the "getbundle" wire protocol
command. This is arguably OK in the context of "all data is attached
to bundles" and "bundles are a self-contained representation of
complete repository data." But partial clone invalidates these
assumptions because in a partial clone world, we can no longer assume
things like "the client has all the base revisions."
In a partial clone world, we'll need wire protocol commands that allow
clients to obtain specific pieces of data with vastly different
access patterns. For example, a client may want to obtain "index"
data but keep the fulltext data on the server. Or vice-versa. Or a
client may wish to fetch all revisions of a specific file but only
the latest revision of another. These access patterns will be
difficult to shoehorn into single, powerful commands (like
"getbundle"). Even if we could, doing that isn't wise from a server
implementation perspective because it makes implementing scalable
servers hard. We want server-side commands to be small and simple
so alternate server implementations can come into existence more
easily.
This is one reason why the frame-based wire protocol I'm implementing
supports command pipelining and out-of-order responses. This
property will enable clients performing complex operations to send
command streams containing dozens or even hundreds of small command
requests to servers.
Anyway, this commit implements an experimental wire protocol command
for "get files data." Essentially, you give it a changeset revision
you are interested in and it spits back all the files and their data
in that revision, as fulltexts.
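As a rough illustration of what a client receives, the following Python
sketch decodes the record stream using the 31-byte header layout
(`<20sHQB`) documented in the wireprotocol.txt change of this patch. The
names `iterrecords`, `packrecord` and `encodeflags` are hypothetical and
not part of Mercurial. The sketch also assumes the
`application/mercurial-0.2` compression prefix has already been stripped
and the payload decompressed (trivial for the `none` engine used in the
tests).

```python
import struct
from collections import namedtuple

# Per-record header: 20-byte file node, uint16 path length, uint64 data
# length, uint8 flags; little-endian, no padding, 31 bytes in total.
HEADER = struct.Struct('<20sHQB')

FLAG_EXEC = 0x01     # file is executable
FLAG_SYMLINK = 0x02  # data is the symlink target

FileRecord = namedtuple('FileRecord', ['node', 'path', 'flags', 'data'])


def encodeflags(manifestflags):
    """Map manifest flags (b'x', b'l') to the record's flag bits."""
    bits = 0
    if b'x' in manifestflags:
        bits |= FLAG_EXEC
    if b'l' in manifestflags:
        bits |= FLAG_SYMLINK
    return bits


def packrecord(node, path, manifestflags, data):
    """Encode one record the same way the server-side emitdata() does."""
    header = HEADER.pack(node, len(path), len(data), encodeflags(manifestflags))
    return header + path + data


def _readexactly(fh, n):
    """Read up to n bytes, looping over short reads; fewer only at EOF."""
    chunks = []
    while n:
        chunk = fh.read(n)
        if not chunk:
            break
        chunks.append(chunk)
        n -= len(chunk)
    return b''.join(chunks)


def iterrecords(fh):
    """Yield FileRecord tuples from an already-decompressed response stream."""
    while True:
        header = _readexactly(fh, HEADER.size)
        if not header:
            return  # end of stream falls cleanly between records
        if len(header) != HEADER.size:
            raise ValueError('truncated record header')
        node, pathlen, datalen, flags = HEADER.unpack(header)
        path = _readexactly(fh, pathlen)
        data = _readexactly(fh, datalen)
        if len(path) != pathlen or len(data) != datalen:
            raise ValueError('truncated record body for %r' % path)
        yield FileRecord(node, path, flags, data)
```

For example, `list(iterrecords(io.BytesIO(payload)))` on the single-file
response in the tests yields one record for `foo` with flags `0`.
`packrecord()` mirrors the flag mapping of `emitdata()` (`x` becomes
`0x01`, `l` becomes `0x02`), which makes round-trip testing easy.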
This command is just one way a server could emit data for files.
A variation of this command that accepts specific file paths and nodes
whose data is to be retrieved would also be useful. And I imagine we'll
eventually implement that. It would also be useful to emit index
data. Or have each file blob be individually compressed. (Right now
compression is performed on the whole stream because that's how the
wire protocol currently works - but I have plans to evolve the frame-based
protocol to do new and novel things here.)
I'm not even sure this variation of the wire protocol command is a
good one to have! One reason I want to start with this command is
that it seems like a useful primitive. For example, with this
command, one could build a client that is able to realize a working
directory from a single wire protocol request: you can literally
stream the response to this command and turn the data into files on
the filesystem with minimal stream processing!
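A minimal sketch of that last step, assuming records have already been
decoded into (path, flags, data) triples and a POSIX filesystem.
`writerecord` is a hypothetical helper, not Mercurial code. It applies the
executable (`0x01`) and symlink (`0x02`) bits described in the protocol
documentation and refuses paths that would escape the destination
directory, since a client should not blindly trust server-supplied paths.

```python
import os

FLAG_EXEC = 0x01     # record flag: file is executable
FLAG_SYMLINK = 0x02  # record flag: data is the symlink target


def writerecord(root, path, flags, data):
    """Create one file under root from a decoded record; return its path.

    Hypothetical client helper, POSIX-only: paths are raw bytes and
    symlinks are created with os.symlink().
    """
    parts = path.split(b'/')
    # Reject absolute paths, empty components and '..' so a hostile
    # server cannot write outside root.
    if path.startswith(b'/') or b'' in parts or b'..' in parts:
        raise ValueError('unsafe path: %r' % path)
    dest = os.path.join(os.fsencode(root), *parts)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    if flags & FLAG_SYMLINK:
        os.symlink(data, dest)  # data holds the link target
        return dest
    with open(dest, 'wb') as fh:
        fh.write(data)
    if flags & FLAG_EXEC:
        # Add execute permission for user, group and other.
        os.chmod(dest, os.stat(dest).st_mode | 0o111)
    return dest
```

Because each record carries its own path, flags and length, a client can
write files one record at a time without buffering the whole response.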
As implemented, this command is effectively a benchmark of revlog
reading and/or compression. On the mozilla-unified repository when
operating on revision c488b8d0e074efb490ebca32db68eb77871bfd2f (a
recent revision of mozilla-central, the head of Firefox development),
my i7-6700K yields the following:
- no compression: 1478MB; ~94s wall; ~56s CPU
- zstd level 3: 343MB; ~97s wall; ~57s CPU
- zlib level 6: 367MB; ~116s wall; ~74s CPU
For comparison, `hg bundle --base null -r c488b8d0e0 -t zstd-v2`
(which approximates what `hg clone -r` would be doing on the server)
yields:
1397MB; ~624s wall; ~225s CPU
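The relative costs can be derived directly from the figures above. This
small Python calculation (the function name `compare` is illustrative
only) treats the `hg bundle` run as the baseline. All inputs are the
approximate values quoted, so the results are approximate too.

```python
# Measurements quoted above: (size in MB, wall seconds, CPU seconds).
REVFILESDATA = {
    'none': (1478, 94, 56),
    'zstd level 3': (343, 97, 57),
    'zlib level 6': (367, 116, 74),
}
BUNDLE_ZSTD_V2 = (1397, 624, 225)


def compare(measurements, baseline):
    """Return {name: (size ratio, wall speedup, CPU speedup)}."""
    rawsize = measurements['none'][0]
    out = {}
    for name, (size, wall, cpu) in measurements.items():
        out[name] = (
            rawsize / size,      # uncompressed size / this size
            baseline[1] / wall,  # bundle wall time / revfilesdata wall time
            baseline[2] / cpu,   # bundle CPU time / revfilesdata CPU time
        )
    return out


for name, (ratio, wall, cpu) in compare(REVFILESDATA, BUNDLE_ZSTD_V2).items():
    print('%-13s size ratio %.1f, wall speedup %.1f, CPU speedup %.1f'
          % (name, ratio, wall, cpu))
```

With these inputs, zstd level 3 is about 4.3x smaller than the
uncompressed stream, and revfilesdata needs roughly 6.4x less wall time
and 3.9x less CPU time than the bundle.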
Of course, these are vastly different operations. But this does
demonstrate that if your use case of version control is "check out
revision X" and you were previously relying on `hg clone` [without
stream clone bundles] to do that, this wire protocol command
is overall much more efficient on servers. It's worth noting that
the use case of version control for many automated systems *is*
"check out revision X." So I think providing a clone mode that can
realize a working copy as fast as possible is a worthwhile feature
to have!
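A client that wants to "check out revision X" this way has to issue the
same HTTP request the tests in this patch send through
`hg debugwireproto`. The sketch below builds that request with Python's
standard `urllib.request` and splits the `application/mercurial-0.2`
response body into a compression engine name and payload, assuming (as
the test output suggests) that the body starts with a one-byte length
followed by the engine name. `buildrequest` and `splitcompression` are
hypothetical helpers. The sketch sends only `x-hgproto-1: 0.2`, as the
tests do, whereas real Mercurial clients advertise more in that header,
and it assumes the server has `experimental.server.revfilesdata` enabled.

```python
import re
import urllib.parse
import urllib.request

CMDNAME = 'exp-revfilesdata-001'
_HEXNODE = re.compile(r'[0-9a-f]{40}')


def buildrequest(baseurl, node):
    """Build the HTTP GET that the tests in this patch send by hand."""
    # The server only checks for 40 bytes; requiring lowercase hex here
    # is a stricter, client-side choice.
    if not _HEXNODE.fullmatch(node):
        raise ValueError('node must be a 40-character hex hash')
    query = urllib.parse.urlencode({'cmd': CMDNAME})
    url = '%s?%s' % (baseurl.rstrip('/') + '/', query)
    return urllib.request.Request(url, headers={
        'x-hgarg-1': 'node=%s' % node,  # the command's single argument
        'x-hgproto-1': '0.2',           # request the mercurial-0.2 media type
    })


def splitcompression(body):
    """Split an application/mercurial-0.2 body into (engine, payload)."""
    if not body:
        raise ValueError('empty response body')
    namelen = body[0]  # first byte: length of the engine name
    engine = body[1:1 + namelen]
    if len(engine) != namelen:
        raise ValueError('truncated compression engine name')
    return engine, body[1 + namelen:]
```

Against the test server configuration (`compressionengines = none`),
passing the request to `urllib.request.urlopen()` and the body to
`splitcompression()` should yield `(b'none', payload)`, where the payload
is the raw record stream.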
REPOSITORY
rHG Mercurial
REVISION DETAIL
https://phab.mercurial-scm.org/D2884
AFFECTED FILES
mercurial/configitems.py
mercurial/help/internals/wireprotocol.txt
mercurial/wireproto.py
tests/test-wireproto-revsfiledata.t
CHANGE DETAILS
diff --git a/tests/test-wireproto-revsfiledata.t b/tests/test-wireproto-revsfiledata.t
new file mode 100644
--- /dev/null
+++ b/tests/test-wireproto-revsfiledata.t
@@ -0,0 +1,244 @@
+ $ CMDNAME=exp-revfilesdata-001
+
+ $ cat >> $HGRCPATH << EOF
+ > [server]
+ > compressionengines = none
+ > EOF
+
+ $ hg init server
+ $ cd server
+ $ echo 'foo revision 0' > foo
+ $ hg -q commit -A -m initial
+ $ echo 'foo revision 1' > foo
+ $ echo 'bar 0' > bar
+ $ hg -q commit -A -m second
+ $ chmod +x foo
+ $ hg commit -m third
+
+revfilesdata requires a config option
+
+ $ hg serve -p $HGPORT -d --pid-file hg.pid
+ $ cat hg.pid > $DAEMON_PIDS
+
+ $ hg --verbose debugwireproto --peer raw http://$LOCALIP:$HGPORT << EOF
+ > httprequest GET ?cmd=$CMDNAME
+ > user-agent: test
+ > x-hgarg-1: node=irrelevant
+ > x-hgproto-1: 0.2
+ > EOF
+ using raw connection to peer
+ s> GET /?cmd=exp-revfilesdata-001 HTTP/1.1\r\n
+ s> Accept-Encoding: identity\r\n
+ s> user-agent: test\r\n
+ s> x-hgarg-1: node=irrelevant\r\n
+ s> x-hgproto-1: 0.2\r\n
+ s> host: $LOCALIP:$HGPORT\r\n (glob)
+ s> \r\n
+ s> makefile('rb', None)
+ s> HTTP/1.1 200 Script output follows\r\n
+ s> Server: testing stub value\r\n
+ s> Date: $HTTP_DATE$\r\n
+ s> Content-Type: application/hg-error\r\n
+ s> Content-Length: 49\r\n
+ s> \r\n
+ s> revfilesdata wire protocol command is not enabled
+
+ $ cat >> $HGRCPATH << EOF
+ > [experimental]
+ > server.revfilesdata = true
+ > EOF
+
+ $ killdaemons.py
+ $ hg serve -p $HGPORT -d --pid-file hg.pid
+ $ cat hg.pid > $DAEMON_PIDS
+
+Node must be a full hash
+
+ $ hg --verbose debugwireproto --peer raw http://$LOCALIP:$HGPORT << EOF
+ > httprequest GET ?cmd=$CMDNAME
+ > user-agent: test
+ > x-hgarg-1: node=tip
+ > x-hgproto-1: 0.2
+ > EOF
+ using raw connection to peer
+ s> GET /?cmd=exp-revfilesdata-001 HTTP/1.1\r\n
+ s> Accept-Encoding: identity\r\n
+ s> user-agent: test\r\n
+ s> x-hgarg-1: node=tip\r\n
+ s> x-hgproto-1: 0.2\r\n
+ s> host: $LOCALIP:$HGPORT\r\n (glob)
+ s> \r\n
+ s> makefile('rb', None)
+ s> HTTP/1.1 200 Script output follows\r\n
+ s> Server: testing stub value\r\n
+ s> Date: $HTTP_DATE$\r\n
+ s> Content-Type: application/hg-error\r\n
+ s> Content-Length: 31\r\n
+ s> \r\n
+ s> nodes argument must be 40 bytes
+
+And it must be a known hash
+
+ $ hg --verbose debugwireproto --peer raw http://$LOCALIP:$HGPORT << EOF
+ > httprequest GET ?cmd=$CMDNAME
+ > user-agent: test
+ > x-hgarg-1: node=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+ > x-hgproto-1: 0.2
+ > EOF
+ using raw connection to peer
+ s> GET /?cmd=exp-revfilesdata-001 HTTP/1.1\r\n
+ s> Accept-Encoding: identity\r\n
+ s> user-agent: test\r\n
+ s> x-hgarg-1: node=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\r\n
+ s> x-hgproto-1: 0.2\r\n
+ s> host: $LOCALIP:$HGPORT\r\n (glob)
+ s> \r\n
+ s> makefile('rb', None)
+ s> HTTP/1.1 200 Script output follows\r\n
+ s> Server: testing stub value\r\n
+ s> Date: $HTTP_DATE$\r\n
+ s> Content-Type: application/hg-error\r\n
+ s> Content-Length: 54\r\n
+ s> \r\n
+ s> unknown node: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+
+Request for revision with single file
+
+ $ hg --verbose debugwireproto --peer raw http://$LOCALIP:$HGPORT << EOF
+ > httprequest GET ?cmd=$CMDNAME
+ > user-agent: test
+ > x-hgarg-1: node=a64d23ad96a87844da3723df73c209a1c5507999
+ > x-hgproto-1: 0.2
+ > EOF
+ using raw connection to peer
+ s> GET /?cmd=exp-revfilesdata-001 HTTP/1.1\r\n
+ s> Accept-Encoding: identity\r\n
+ s> user-agent: test\r\n
+ s> x-hgarg-1: node=a64d23ad96a87844da3723df73c209a1c5507999\r\n
+ s> x-hgproto-1: 0.2\r\n
+ s> host: $LOCALIP:$HGPORT\r\n (glob)
+ s> \r\n
+ s> makefile('rb', None)
+ s> HTTP/1.1 200 Script output follows\r\n
+ s> Server: testing stub value\r\n
+ s> Date: $HTTP_DATE$\r\n
+ s> Content-Type: application/mercurial-0.2\r\n
+ s> Transfer-Encoding: chunked\r\n
+ s> \r\n
+ s> 1\r\n
+ s> \x04
+ s> \r\n
+ s> 4\r\n
+ s> none
+ s> \r\n
+ s> 1f\r\n
+ s> F\x92\xc6\xd5/y\x90\xcce\x0c\xea\x80\xd0\xca\xe1\xde6\xb5wX\x03\x00\x0f\x00\x00\x00\x00\x00\x00\x00\x00
+ s> \r\n
+ s> 3\r\n
+ s> foo
+ s> \r\n
+ s> f\r\n
+ s> foo revision 0\n
+ s> \r\n
+ s> 0\r\n
+ s> \r\n
+
+Revision with multiple files
+
+ $ hg --verbose debugwireproto --peer raw http://$LOCALIP:$HGPORT << EOF
+ > httprequest GET ?cmd=$CMDNAME
+ > user-agent: test
+ > x-hgarg-1: node=bc56cef01319bf181be2886f8a3aefea9a33bfdb
+ > x-hgproto-1: 0.2
+ > EOF
+ using raw connection to peer
+ s> GET /?cmd=exp-revfilesdata-001 HTTP/1.1\r\n
+ s> Accept-Encoding: identity\r\n
+ s> user-agent: test\r\n
+ s> x-hgarg-1: node=bc56cef01319bf181be2886f8a3aefea9a33bfdb\r\n
+ s> x-hgproto-1: 0.2\r\n
+ s> host: $LOCALIP:$HGPORT\r\n (glob)
+ s> \r\n
+ s> makefile('rb', None)
+ s> HTTP/1.1 200 Script output follows\r\n
+ s> Server: testing stub value\r\n
+ s> Date: $HTTP_DATE$\r\n
+ s> Content-Type: application/mercurial-0.2\r\n
+ s> Transfer-Encoding: chunked\r\n
+ s> \r\n
+ s> 1\r\n
+ s> \x04
+ s> \r\n
+ s> 4\r\n
+ s> none
+ s> \r\n
+ s> 1f\r\n
+ s> \xdb&\xb9\xed\xe1\xcc\xd5]\xdact\xb01\x14h\xda\xe3\xc2\xe2\xd9\x03\x00\x06\x00\x00\x00\x00\x00\x00\x00\x00
+ s> \r\n
+ s> 3\r\n
+ s> bar
+ s> \r\n
+ s> 6\r\n
+ s> bar 0\n
+ s> \r\n
+ s> 1f\r\n
+ s> $\x95\x1c\xb3\x8e(\xc6>\xf8\x0cx\\\x88G\xbd\xd3[\x08\x13c\x03\x00\x0f\x00\x00\x00\x00\x00\x00\x00\x00
+ s> \r\n
+ s> 3\r\n
+ s> foo
+ s> \r\n
+ s> f\r\n
+ s> foo revision 1\n
+ s> \r\n
+ s> 0\r\n
+ s> \r\n
+
+And with the executable bit set
+
+ $ hg --verbose debugwireproto --peer raw http://$LOCALIP:$HGPORT << EOF
+ > httprequest GET ?cmd=$CMDNAME
+ > user-agent: test
+ > x-hgarg-1: node=328fdcd53a5d2f0dd58397e1f1ed73d5913332fe
+ > x-hgproto-1: 0.2
+ > EOF
+ using raw connection to peer
+ s> GET /?cmd=exp-revfilesdata-001 HTTP/1.1\r\n
+ s> Accept-Encoding: identity\r\n
+ s> user-agent: test\r\n
+ s> x-hgarg-1: node=328fdcd53a5d2f0dd58397e1f1ed73d5913332fe\r\n
+ s> x-hgproto-1: 0.2\r\n
+ s> host: $LOCALIP:$HGPORT\r\n (glob)
+ s> \r\n
+ s> makefile('rb', None)
+ s> HTTP/1.1 200 Script output follows\r\n
+ s> Server: testing stub value\r\n
+ s> Date: $HTTP_DATE$\r\n
+ s> Content-Type: application/mercurial-0.2\r\n
+ s> Transfer-Encoding: chunked\r\n
+ s> \r\n
+ s> 1\r\n
+ s> \x04
+ s> \r\n
+ s> 4\r\n
+ s> none
+ s> \r\n
+ s> 1f\r\n
+ s> \xdb&\xb9\xed\xe1\xcc\xd5]\xdact\xb01\x14h\xda\xe3\xc2\xe2\xd9\x03\x00\x06\x00\x00\x00\x00\x00\x00\x00\x00
+ s> \r\n
+ s> 3\r\n
+ s> bar
+ s> \r\n
+ s> 6\r\n
+ s> bar 0\n
+ s> \r\n
+ s> 1f\r\n
+ s> $\x95\x1c\xb3\x8e(\xc6>\xf8\x0cx\\\x88G\xbd\xd3[\x08\x13c\x03\x00\x0f\x00\x00\x00\x00\x00\x00\x00\x01
+ s> \r\n
+ s> 3\r\n
+ s> foo
+ s> \r\n
+ s> f\r\n
+ s> foo revision 1\n
+ s> \r\n
+ s> 0\r\n
+ s> \r\n
diff --git a/mercurial/wireproto.py b/mercurial/wireproto.py
--- a/mercurial/wireproto.py
+++ b/mercurial/wireproto.py
@@ -9,6 +9,7 @@
import hashlib
import os
+import struct
import tempfile
from .i18n import _
@@ -1132,3 +1133,59 @@
bundler.newpart('error:pushraced',
[('message', util.forcebytestr(exc))])
return streamres_legacy(gen=bundler.getchunks())
+
+@wireprotocommand('exp-revfilesdata-001', 'node',
+                  permission='pull')
+def revfilesdata(repo, proto, node):
+    """Obtain file data for a particular revision.
+
+    Given a node, emit metadata about files in that revision and their data.
+
+    TODO support receiving a narrow spec, integrating with a matcher.
+    TODO only expose to transport version 2
+    """
+    if not repo.ui.configbool('experimental', 'server.revfilesdata'):
+        return wireprototypes.ooberror(_('revfilesdata wire protocol command '
+                                         'is not enabled'))
+
+    if len(node) != 40:
+        return wireprototypes.ooberror(_('nodes argument must be 40 bytes'))
+
+    try:
+        ctx = repo[bin(node)]
+    except error.RepoLookupError:
+        return wireprototypes.ooberror(_('unknown node: %s') % node)
+
+    pathflags = {}
+
+    def makeentries():
+        for (path, node, flags) in ctx.manifest().iterentries():
+            pathflags[path] = flags
+            yield path, node
+
+    results = repo.filesstore.resolvefilesdata(makeentries())
+
+    # Output consists of structs followed by raw data.
+    s = struct.Struct(r'<20sHQB')
+
+    def emitdata():
+        for result, path, node, data in results:
+            flags = pathflags[path]
+            del pathflags[path]
+
+            if result == 'ok':
+                rawflag = 0
+                if b'x' in flags:
+                    rawflag |= 1
+                if b'l' in flags:
+                    rawflag |= 2
+
+                yield s.pack(node, len(path), len(data), rawflag)
+                yield path
+                yield data
+
+            else:
+                raise error.ProgrammingError('do not yet handle %s results' %
+                                             result)
+
+    return wireprototypes.streamres(emitdata())
diff --git a/mercurial/help/internals/wireprotocol.txt b/mercurial/help/internals/wireprotocol.txt
--- a/mercurial/help/internals/wireprotocol.txt
+++ b/mercurial/help/internals/wireprotocol.txt
@@ -1247,6 +1247,37 @@
The return type is a ``string``.
+exp-revfilesdata-001
+--------------------
+
+**(Experimental and subject to behavior changes)**
+
+This command obtains the fulltext data of every file in a specific
+revision.
+
+The ``node`` argument defines the revision whose file data is to
+be retrieved.
+
+The response is a stream consisting of a series of file data records.
+Each record begins with a 31-byte struct. The struct contains:
+
+* 20-byte file node.
+* 16-bit unsigned little-endian integer defining the size of the file
+ name.
+* 64-bit unsigned little-endian integer defining the size of the file
+ data.
+* 1 byte containing file flags.
+
+The file flags byte has the ``0x01`` bit set if the file is executable.
+The ``0x02`` bit is set if the file is a symlink. If a symlink, the raw
+file data refers to the target of the symlink.
+
+Following that struct is the raw filename of the file. This is a raw
+byte string and has no encoding (Mercurial stores filenames as binary
+byte sequences). Following the filename is the raw file data.
+Following the raw file data is the next file record struct, or end of
+stream.
+
getbundle
---------
diff --git a/mercurial/configitems.py b/mercurial/configitems.py
--- a/mercurial/configitems.py
+++ b/mercurial/configitems.py
@@ -574,6 +574,9 @@
coreconfigitem('experimental', 'update.atomic-file',
default=False,
)
+coreconfigitem('experimental', 'server.revfilesdata',
+ default=False,
+)
coreconfigitem('experimental', 'sshpeer.advertise-v2',
default=False,
)
To: indygreg, #hg-reviewers
Cc: mercurial-devel