Wire protocol futures

Gregory Szorc gregory.szorc at gmail.com
Fri Aug 31 22:47:34 UTC 2018


This is a long post and I apologize in advance for that. I've been spending
a lot of company-sponsored time on the wire protocol and storage this year
in order to get partial clones in a place where Mozilla can start using
them heavily. I realized that I haven't done a good job articulating the
overall vision for that work and I wanted to write up a semi-comprehensive
brain dump of things that are on my mind and changes that I plan to send
out for review in the next few weeks. Despite the length of this post, it
isn't a comprehensive brain dump: I'm excluding details about storage
refactorings for example. This post focuses mostly on the wire protocol.

Back in the 4.6 cycle, I started work on a ground-up rewrite of the wire
protocol. The overarching goal behind this work was implementing partial
clone in an as-optimal-as-possible manner. (Partial clone is the ability
for clients to have a subset of files and/or a subset of history, where
history applies to files, manifests, and even changesets).

I looked at existing implementations of partial clone (namely remotefilelog
and narrow extensions) and saw what I perceived to be sub-optimal decisions
on account of limitations in the existing wire protocol. I also saw
limitations in the existing wire protocol that made building out scalable
and future-proof servers difficult. Here is a partial list of problems with
the existing wire protocol and command set (in no particular order):

* The HTTP and SSH transports vary drastically and require substantially
different client and server code paths to handle.
* The SSH transport has no outer "framing" protocol and this makes it
nearly impossible to have nice things (like compression of arbitrary
payloads).
* Lack of binary safety in command arguments (requires transferring e.g.
hex nodes instead of binary, which adds overhead).
* Usage of bespoke encoding formats for data. We needed to roll our own
encoders and decoders for every data structure. This adds complexity and
limits future changes since you can't easily add fields without breaking
backwards compatibility. (See bundle2.)
* Server-side command processing limited to a single thread (tolerable for
Python due to the GIL, not so much for custom servers that could implement
multithreading more easily).
* Lack of side-channels for representing progress events on server.
* Server-emitted messages are translated on the server (using the server's
locale) before being sent to the client - a poor experience for non-English
speakers.
* Clone bundles and pull bundles are added features that clients need to
know about rather than being supported by the command protocol itself (more
on this later).
* Not possible to batch issue any set of commands (some commands could be
batched, others couldn't).
* Not obvious how to transition away from SHA-1.
* Not obvious how to transition an existing repo from flat manifests to
tree manifests.
* Auth / access control outside of Mercurial difficult to implement.
* Clients could not issue thousands of commands efficiently.
* Difficult to implement server-side caching.
* linkrev/linknode adjustment is performed on the server and not very
extensible.
* Data access model geared heavily towards full clones. i.e. no low-level
APIs for accessing specific data, like just a single revision of a single
file or just the index shape of the changelog.
* General lack of typing and strong specification of semantics, data
formats, etc.

I postulate that many of the design decisions around the current set of
wire protocol commands stem from limitations in the existing wire protocol
transport format. And that transport format and general repository
cloning/pull strategy has changed little since ~2006.

For example, because we don't have a unified transport format, things like
compression are inconsistent between SSH and HTTP and need to be dealt with
in arcane ways. Performance and code maintainability suffers.

For example, because we can't issue thousands of commands efficiently, we
build monolithic commands (like "getbundle") that transfer many pieces of
data. And a side-effect of monolithic commands is increased server-side
complexity. And that makes implementing alternate servers more difficult.
And it undermines caching potential. And it makes it more difficult to
implement things like resumable clone.

I wanted to build wire protocol transport and command layers that would
give us the flexibility to start from first principles and implement data
exchange on our own terms, using the knowledge that we've accrued in the
10+ years of the project and the cumulative decades of version control
experience that various contributors have accrued. This means designing a
wire protocol transport and command layer that facilitates server scaling,
fast data access, and future changes (both from extensions and core
changes). In my mind, this translates to the following set of requirements:

* Keeping commands simple. This will make the server simple and make it
easier to implement alternate servers.
* Making commands deterministic and idempotent (to facilitate aggressive
caching).
* Supporting parallel serving and consumption with minimal overhead (to
enable clients/servers not restricted by the GIL to go as fast as possible).
* Extensible compression formats and ability to have fine-grained control
over compression.
* Providing granular access to data to facilitate multiple clone / checkout
modes (e.g. a `svn co` style model for CI where the "clone" contains files
for a single revision and not much more).
* Support for out-of-band response serving built into the protocol itself
(basically clonebundles but for any command).
* And more.

Later in this post, I'll go into details of what I've built so far and what
is yet to come. But first, some history.

In Mercurial today, the "getbundle" wire protocol command is used to
transfer most repository data from server to client.

Originally, repository data was transferred from server to client as a
changegroup. A changegroup is a data structure containing segments of data
corresponding to the changelog, manifestlog, and filelogs. This data contains
"index" data (describing the DAG shape and linknodes) and "revision" data
(describing fulltexts, usually as deltas). Essentially, a changegroup
encapsulates revlog data. Initially, bundle files were essentially
changegroups.

In Mercurial 1.6, the "listkeys" command was added. This command was used
to transfer data not in revlogs/changegroups, such as bookmarks and phases.

There were problems with this approach. Notably:

* Server state could mutate between command requests, causing clients to
have inconsistent or invalid data for bookmarks, phases, or anything else
not transferred by changegroup data.
* `hg bundle` didn't record all data necessary to express repository state.

Mercurial 3.4 introduced bundle2 to solve these problems and more. Bundle2
is a generic container format and therefore allows extensible storage via
part names. When new data types are introduced, we invent a new bundle2
part for them. The payload of each bundle2 part is defined by that part,
i.e. we need to invent encoders and decoders for each part.

At the wire protocol level, bundle2 shoehorned itself into the "getbundle"
wire protocol command. If the client passed certain arguments into the
command, the server would emit a bundle2 bundle instead of changegroup data.

Over time, bundle2 kept growing. The wire protocol exchange and
capabilities negotiation kept getting more complicated. (And that is
arguably OK: that's the nature of an ever-changing system with backwards
and future compatibility constraints.)

At this time, all meaningful repository data can be transferred from server
to client via "getbundle" with a bundle2 payload. From an end-user
perspective, things are great because all data is retrieved atomically and
standalone bundle files can hold all repository data.

But on a technical level, things are not so great.

In terms of data retrieval, there is effectively a single, monolithic
server-side command: "getbundle." It's a "god RPC." And on the push side,
the "unbundle" command is in a very similar boat as "getbundle." And
limitations in the existing wire protocol transports makes it more
difficult than it should be to introduce new commands.

Various parties want to implement partial clone in Mercurial. The
remotefilelog (RFL) and narrow extensions have both done this to some
degree. But they did so building on top of the existing wire protocol
transports. And in the case of narrow, it is built on top of the existing
command set - namely "getbundle" and bundle2 (RFL introduces new wire
protocol commands for transferring just file data).

Let's talk about these in more detail.

On the server, narrow burrows itself into the bowels of "getbundle" and
bundle/changegroup generation. It introduces command arguments to allow
clients to specify what files they are interested in, which nodes have been
retrieved, etc. The server then takes all of this into account and adjusts
the set of returned data accordingly. And there is a lot of code and
complexity involved. And a lot of it is on the server. This makes servers
more difficult to implement and harder to scale.

Remotefilelog takes a different approach. RFL introduces new wire protocol
commands for retrieving just file data. There is a command for retrieving
the fulltext of just a single file. There is a command for bulk retrieval
of file data (you essentially give it an iterable of paths and nodes and it
spits out a changegroup-like data structure containing "index" and
"revision" data for all of them). And RFL changes how clone/pull works.
Instead of a single call to "getbundle" to retrieve all of the data, it
requests just the changeset and manifest data first then follows up with
calls to the RFL commands for file data retrieval.

When I think about ways to implement partial clone, one theme that keeps
worrying me is scalability. We already have problems scaling Mercurial
servers. Clone and pull bundles are terrific solutions (as I wrote at [1],
clone bundles are offloading ~97% of bytes served from hg.mozilla.org).
But, these solutions work best with full clones, when the set of retrieved
data is known ahead of time and can be pre-generated. Partial clones
invalidate this world: it is no longer possible ahead of time to know
exactly what data will be requested. And even if you did, for high velocity
(commit rate) repos, the set of data being retrieved will be highly
dynamic, making pre-generated bundle files prohibitively difficult to
implement.

This means that partial clone necessitates more traditional caching. (e.g.
transparent caching of any wire protocol command response backed by an LRU
store). But because "getbundle" is a monolithic, complicated, and
ever-evolving command, I have my doubts that caching of this command is
feasible. Yes, it is certainly doable, but at high
implementation/maintenance expense and high chance of introducing caching
bugs. In the existing "getbundle" world, your best bet to caching is
probably caching of the data that is inserted into the generated payload
(e.g. caching of revision fulltexts and deltas). Unfortunately, this means
the Mercurial server is still incurring a lot of load to assemble data and
send it out over the wire (this includes compression). Even though partial
clones could reduce server load dramatically by transferring less data, at
certain scales this reduced load is still highly problematic. So, I think it
is imperative to consider server scaling when
talking about partial clone and I think wholesale caching of entire command
responses is necessary in order to achieve it.

With this mindset, I started exploring a data retrieval command set
starting from first principles.

At its core, Mercurial is a content-indexed store. It isn't as generic as
Git (where every object is inserted into the same namespace). But it is
close. Mercurial segments content-indexed data by changesets,
manifest-trees, and files. (And if I had my way we would store metadata
like bookmarks and phases in a similar manner and then have a "repolog"
pointing to the list of head revisions and content-indexed bookmarks,
phases, etc. so we could view the state of a repo at any past point in
time.)

Instead of a monolithic "getbundle" command that retrieved data for all of
these things (plus metadata associated with changesets), what if we took a
remotefilelog approach and provided APIs for accessing individual pieces of
data? What if we had a command for accessing changeset data, a command for
accessing manifest data, and a command for accessing file data? E.g. what
if we had commands that accepted a list of explicit nodes, or lists of base
and head nodes, and returned data about the corresponding revisions? How
would that change things?
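
To make that concrete, here is a rough sketch - with made-up argument names,
not the real wire encoding - of what requests to such granular commands
might look like:

# A rough sketch of requests to granular data retrieval commands. The
# argument names ("nodes", "noderange", etc.) are illustrative assumptions,
# not the final wire encoding.

# Ask for specific changeset revisions by node.
explicit_request = {
    "command": "changesetdata",
    "args": {"nodes": [b"\xaa" * 20, b"\xbb" * 20]},
}

# Or describe a range via known bases and desired heads and let the server
# compute the revisions in between.
range_request = {
    "command": "changesetdata",
    "args": {"noderange": {"basenodes": [b"\x11" * 20],
                           "headnodes": [b"\x22" * 20]}},
}

# File data is addressed by (path, node).
file_request = {
    "command": "filedata",
    "args": {"path": b"src/main.c", "nodes": [b"\xcc" * 20]},
}

for req in (explicit_request, range_request, file_request):
    print(req["command"], sorted(req["args"]))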

For starters, having such a set of commands is substantially more flexible
than where we are today with "getbundle." By giving clients granular access
to data, you empower clients to devise new ways of consuming that data. For
example, one could build a Subversion-like checkout feature (only fetch
data for a specific revision of the repository) without any new features on
the server! Given a changeset hash, you could fetch that changeset revision
using a "get changeset data" command, find its manifest revision, fetch
that manifest using a "get manifest data" command, then fetch corresponding
file revisions using a "get file data" command. Other tools may also wish
to leverage such APIs. For example, git-cinnabar (a Git extension that
allows Git to push and pull against Mercurial repositories by speaking the
Mercurial wire protocol) could have direct access to data (instead of going
through "getbundle"/bundle2) and this would make it easier to import
Mercurial data. (And probably more robust too because of issues like [2].)
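
Here is a minimal, runnable sketch of that checkout walk. The peer methods
are hypothetical stand-ins for the proposed commands, backed by in-memory
dicts so the example is self-contained:

# Sketch of a Subversion-style "checkout one revision" walk built purely
# from granular data retrieval commands. FakePeer's methods stand in for
# "changesetdata", "manifestdata", and "filedata".

class FakePeer:
    def __init__(self, changesets, manifests, files):
        self._changesets = changesets   # node -> {"manifest": node}
        self._manifests = manifests     # node -> {path: filenode}
        self._files = files             # (path, filenode) -> fulltext

    def get_changeset(self, node):      # "changesetdata" analogue
        return self._changesets[node]

    def get_manifest(self, node):       # "manifestdata" analogue
        return self._manifests[node]

    def get_file(self, path, filenode): # "filedata" analogue
        return self._files[(path, filenode)]

def checkout(peer, cset_node):
    """Materialize the files of a single changeset; no history needed."""
    cset = peer.get_changeset(cset_node)
    manifest = peer.get_manifest(cset["manifest"])
    return {path: peer.get_file(path, fnode) for path, fnode in manifest.items()}

peer = FakePeer(
    changesets={b"c1": {"manifest": b"m1"}},
    manifests={b"m1": {b"a.txt": b"f1", b"b.txt": b"f2"}},
    files={(b"a.txt", b"f1"): b"hello\n", (b"b.txt", b"f2"): b"world\n"},
)
print(checkout(peer, b"c1"))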

Another benefit would be a simpler server. Having granular and well-defined
commands for accessing repository data would make it drastically simpler to
implement a server, including custom servers (like Mononoke). You wouldn't
need to implement the full spectrum of bundle2 and all its semantics via
"getbundle." You would essentially have a pile of data retrieval commands.
And, it would probably be relatively easy to plug non-revlog storage into
the server at that point.

And if done correctly, simple data retrieval commands with well-defined
semantics would lend themselves to aggressive caching. For example, a "get
revision data for file P at revisions [X, Y, Z]" can be cached almost
effortlessly, since file revision state is immutable (modulo censoring). It
would be possible to build pass-through caching of the entire command
response. This would eliminate a ton of server load and make servers vastly
easier to scale.
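
As a sketch of why this works: a deterministic, idempotent request can be
reduced to a stable cache key. The canonical encoding below is illustrative;
a real server would hash the actual wire encoding of the request:

# Sketch of deriving a cache key from a command request. Identical requests
# always map to the same key, so whole responses can be cached.

import hashlib

def cache_key(command, args):
    canonical = repr((command, sorted(args.items()))).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

# "get revision data for file P at revisions [X, Y, Z]"
key = cache_key("filedata", {
    "path": b"browser/base/content/browser.js",
    "nodes": [b"\x01" * 20, b"\x02" * 20, b"\x03" * 20],
    "fields": ["parents", "revision"],
})
print(key)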

If we limit ourselves to simple data retrieval commands on the server, it
changes the "architecture" of clone/pull substantially. In the "getbundle"
world, the server is doing a lot of work. The first thing it does is figure
out what changeset revisions to send. Then it finds new manifests
associated with those changesets. Then it finds new file revisions
associated with those manifests. It accumulates all this state and streams
all that data. If all you have is simple data retrieval commands, most of
this work shifts to the client. This does have its advantages.

Again, an advantage is that complexity is moved from server to client. This
keeps servers simple and easier to implement, debug, and scale.

One component that shifts to clients is link nodes. Link nodes (or linkrevs
since they are stored as an integer in revlogs) are pointers to the first
changeset that introduced a revision. They allow you to go from e.g. an
arbitrary file node to a changeset very efficiently. Because we index each
file separately and because each file revision has a pointer to a
changeset, we can look at the history of an individual file and map that
history back to changesets without having to scan all changesets or open
manifests. Link nodes have their own problems in the presence of hidden
changesets. But in the context of the wire protocol today, the server is
computing link nodes as part of emitting revision data. This model kind of
falls apart in a partial clone world because the server doesn't know what
changesets the client has. The client is in the best position to determine
what changeset a file revision should be linked to. Anyway, if you are only
using simple data retrieval commands, this problem of file node mapping
(and the corresponding problem of adjustment that arises when hidden
changesets are in play) can be fully shifted to the client: the client can
keep track of which changeset/manifest introduced a file revision as part
of its file node discovery process and set the link node accordingly.
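
A minimal sketch of that client-side bookkeeping - plain dicts standing in
for real storage - might look like this:

# Sketch of client-side linknode assignment: walk new changesets and their
# manifests oldest first and record the first changeset that introduced each
# (path, filenode) pair.

def compute_linknodes(changesets, manifests):
    """changesets: list of (cset_node, manifest_node), oldest first.
    manifests: manifest_node -> {path: filenode}.
    Returns {(path, filenode): cset_node}."""
    linknodes = {}
    for cset_node, manifest_node in changesets:
        for path, filenode in manifests[manifest_node].items():
            linknodes.setdefault((path, filenode), cset_node)
    return linknodes

manifests = {
    b"m1": {b"a.txt": b"f1"},
    b"m2": {b"a.txt": b"f1", b"b.txt": b"f2"},  # a.txt unchanged in m2
}
links = compute_linknodes([(b"c1", b"m1"), (b"c2", b"m2")], manifests)
assert links[(b"a.txt", b"f1")] == b"c1"
assert links[(b"b.txt", b"f2")] == b"c2"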

Another component that largely shifts to clients is "narrow" logic. In the
narrow extension today, the client tells "getbundle" what file patterns are
relevant and what nodes it already has and the server has to do a lot of
work around determining what revisions to send. If all you have is
primitive data retrieval APIs, you would probably add a "path filter"
argument to the "get changeset data" command, retrieve the relevant
changesets, then incrementally retrieve manifest and file revisions until
you have all the data you need. This drastically reduces the server-side
complexity and cost of narrow.
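
As a rough illustration (the "fetch_*" callables and the prefix-based path
filter below are stand-ins, not the real API), the client-side loop could
look something like:

# Sketch of client-driven narrow logic: apply a path filter to fetched
# manifests and only request file data for matching paths.

def narrow_pull(fetch_manifest, fetch_filedata, manifest_nodes, includeprefix):
    wanted = []
    for mnode in manifest_nodes:
        for path, filenode in fetch_manifest(mnode).items():
            if path.startswith(includeprefix):
                wanted.append((path, filenode))
    # One (batched) request per file revision the client actually wants.
    return {key: fetch_filedata(*key) for key in wanted}

manifests = {b"m1": {b"dom/foo.c": b"f1", b"layout/bar.c": b"f2"}}
files = {(b"dom/foo.c", b"f1"): b"int x;\n", (b"layout/bar.c", b"f2"): b"int y;\n"}

got = narrow_pull(manifests.__getitem__,
                  lambda path, node: files[(path, node)],
                  [b"m1"],
                  b"dom/")
assert list(got) == [(b"dom/foo.c", b"f1")]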

Another problem that seemingly becomes simpler is large file storage. I
argue that largefiles and LFS today are effectively hacks to facilitate
non-partial clones despite the presence of large files. We store and
transfer flagged large files specially. But if your method of accessing
file data is through a dedicated "get file data" command, when you squint
hard enough you realize that this is logically very similar to "all files
are using largefiles/LFS." This leads to questions like "if we have a
dedicated 'get file data' API, why do we need a special store / endpoint
for large files?" And if we communicate the sizes of files before file data
is retrieved or don't transfer revision data over a size threshold unless
the client asks, this puts clients in the driver's seat about whether to
fetch large file revisions. We could implement all the benefits of
largefiles / LFS without it having to be a feature that repositories and
servers opt in to! i.e. clients could dynamically apply special storage
settings on large file revisions as they see fit.
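
For illustration, assuming the server exposes a size field in file metadata
(and with the 10 MB threshold below being an arbitrary client policy), the
client-side decision could be as simple as:

# Sketch of client-driven large file handling: with sizes known up front,
# the client decides per revision whether to fetch the fulltext now or
# defer it and fetch on demand.

LARGE_FILE_THRESHOLD = 10 * 1024 * 1024  # 10 MB, client policy

def plan_fetches(file_metadata, threshold=LARGE_FILE_THRESHOLD):
    """file_metadata: list of dicts with 'path', 'node', and 'size' keys.
    Returns (fetch_now, fetch_lazily) lists of (path, node) pairs."""
    fetch_now, fetch_lazily = [], []
    for meta in file_metadata:
        target = fetch_lazily if meta["size"] > threshold else fetch_now
        target.append((meta["path"], meta["node"]))
    return fetch_now, fetch_lazily

now, lazy = plan_fetches([
    {"path": b"README", "node": b"f1", "size": 2048},
    {"path": b"build/dataset.bin", "node": b"f2", "size": 4 * 1024 ** 3},
])
assert now == [(b"README", b"f1")] and lazy == [(b"build/dataset.bin", b"f2")]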

But this architectural shift would have its disadvantages.

Assuming you only have primitive data retrieval commands, you are now
issuing a lot more commands. This introduces the potential for receiving
non-atomic state - a regression from "getbundle"/bundle2. It also introduces
more round trips to the server, and issuing thousands of command requests
could add significant overhead for both client and server. Your Mercurial
server could easily be processing >10,000x more commands than before!

If clients must specify the nodes of all requested data, this requires
clients to transfer nodes up to the server. Many network connections have
limited upload bandwidth and such uploads could make data retrieval
extremely slow.

If clients need to scan manifests to find new file revisions so they can be
retrieved explicitly, this will add considerable client-side overhead.
(Today, changegroup generation cheats by using linkrevs to determine what
file revisions to send and this is considerably faster than reading and
walking manifests.)

While there are disadvantages to a completely primitive set of data
retrieval commands, having this set of commands (fetch changeset, manifest,
and file data) offers a host of benefits. If nothing else, merely having
the commands will foster client-side experimentation because pretty much
any data retrieval strategy can be derived from this set of primitives.

So, I will soon be sending patches that implement the new commands:
"changesetdata," "manifestdata," "filedata." These commands allow the
retrieval of data for individual changeset, manifest, and file revisions.
And "data" here is a very loose term. The commands are all designed such
that the client specifies exactly what "fields" to retrieve. Example fields
include "parents" and "revision" to fetch the parent nodes and revision
fulltext, respectively. This allows a client to request just the DAG/index
data or just the revision data. And on "changesetdata," the fields
"bookmarks" and "phases" are also recognized and result in the
corresponding data being attached to relevant changeset revisions. Allowing
the set of retrieved data to be dynamic introduces flexibility in clients.
Clients could e.g. retrieve and store index data for everything while
lazily fetching revision data on demand. We could also do things like
expose new data primitives easily. For example, "changesetdata" could grow
a "filechanges" field that returned a list of manifest mutations/diff in
that changeset. This could allow bypassing the need to transfer and store
manifest revisions explicitly. I believe this design to be similar to and
compatible with Mononoke's concept of "bonsai changesets."
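
To illustrate the "fields" idea (argument names here mirror the ones above,
but the exact encoding is not final), requests might look like:

# Sketch of "fields"-driven requests: only the named fields come back.

index_only = {
    "command": "changesetdata",
    "args": {
        "nodes": [b"\xff" * 20],
        # DAG/index data plus changeset metadata, but no revision fulltexts.
        "fields": {"parents", "bookmarks", "phases"},
    },
}

with_fulltext = {
    "command": "filedata",
    "args": {
        "path": b"layout/reftest.list",
        "nodes": [b"\x0f" * 20],
        # Also fetch the revision fulltext (likely expressed as a delta).
        "fields": {"parents", "revision"},
    },
}

for req in (index_only, with_fulltext):
    print(req["command"], sorted(req["args"]["fields"]))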

I will also be sending patches that implement clone/pull using these new
commands.

While my initial experimentation with a totally overhauled set of commands
for facilitating clone/pull is very promising, it's only a start. The
simple commands as implemented are too simple and there's too much
overhead. Full clones are substantially slower and the client has to do a
lot of work and transfer a lot of data to the server. It is obvious we will
need to supplement these basic commands with either specialized commands or
special query modes. e.g. we likely want a way to request file revision
data for multiple files in a given changeset or manifest rather than having
to request the revision data for each file separately. At the end of the
day, the wire protocol command set will be driven by practical needs, not
by ivory tower architecting. We'll see what shortcuts we need to employ in
the name of performance and we'll implement them.

Let's talk a bit about performance.

In the 4.6 release cycle, I started implementing a new wire protocol
transport format. The overarching goal here was to devise an RPC protocol
that was consistent across transports (namely SSH and HTTP) and had
desirable scaling characteristics. The protocol is far from finished and
will likely change substantially before it is marked as non-experimental.
But it is already delivering on some of its promises with the new data
access commands I described above. For example, instead of issuing N HTTP
requests to invoke the "filedata" command N times, we can send 10,000 file
data requests in a single HTTP request. This drastically cuts down on
overhead. Any command using this wire protocol can be batched. Whereas the
existing wire protocol pushes us towards monolithic commands due to wire
protocol overhead, the new wire protocol allows us to have more, smaller
commands with minimal overhead.
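
As a toy illustration of the batching win (send_http_request below is a
stand-in for the framing protocol, not real code), the number of round trips
is driven by batch size, not by command count:

# Sketch of batching many command requests into few HTTP round trips.

def send_http_request(batched_requests):
    # Stand-in for one HTTP POST carrying many framed command requests.
    return [{"request-id": i, "ok": True} for i, _ in enumerate(batched_requests)]

def fetch_many_file_revisions(wanted, batch_size=10000):
    """wanted: list of (path, node) pairs. Issues ceil(len/batch_size)
    round trips instead of len(wanted) round trips."""
    responses = []
    for start in range(0, len(wanted), batch_size):
        batch = [
            {"command": "filedata", "args": {"path": p, "nodes": [n]}}
            for p, n in wanted[start:start + batch_size]
        ]
        responses.extend(send_http_request(batch))
    return responses

wanted = [(b"path/%d" % i, bytes([i % 256]) * 20) for i in range(25000)]
resps = fetch_many_file_revisions(wanted)
assert len(resps) == 25000  # 25,000 commands, only 3 HTTP round trips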

One of the aces up my sleeve in the new wire protocol is support for
"content redirects" for any command. Essentially, it will be clone/pull
bundles built into the RPC protocol itself. The server will advertise a
list of potential redirect targets. When the client makes a request, it
will tell the server which redirect targets are appropriate. Then in the
course of processing a request, the server can send a response that
redirects the client to another location. For example, client A could make
a request for "all revision data for all files in changeset X." The server
will generate the response data for that request and simultaneously stream
it to both the client and to a blob store, say Amazon S3. A CDN is
configured to access that S3 bucket and the Mercurial server advertises the
CDN as a "redirect target." Client B comes along and makes the same request
for file data, advertising that the CDN is an appropriate "redirect
target." The Mercurial server sees that there is a cached response to this
command in S3 and it tells the client "fetch the response from this
CDN-hosted URL."
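
Sketching the shapes involved (these dicts are illustrative; the real
protocol would express this through server capabilities and framed response
metadata):

# Sketch of the content redirect flow.

# Server capabilities advertise potential redirect targets.
server_capabilities = {
    "redirect-targets": [
        {"name": "cdn", "protocol": "https", "uris": ["https://cdn.example.com/hg/"]},
    ],
}

# The client says which targets it can use when issuing a command.
request = {
    "command": "filedata",
    "args": {"path": b"big/file.bin", "nodes": [b"\xab" * 20]},
    "redirect": {"targets": ["cdn"]},
}

# If the response is already cached in the blob store behind the CDN, the
# server answers with a redirect instead of inline data.
response = {
    "status": "redirect",
    "location": "https://cdn.example.com/hg/responses/cachedresponsekey",
}

def follow(response):
    if response.get("status") == "redirect":
        return ("GET", response["location"])  # client fetches the body from the CDN
    return ("inline", response)

print(follow(response))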

I plan to make aggressive caching and content redirects 1st class citizens
in the new RPC protocol and server implementation. I want it to be possible
to cache the results of commands by adding a one-liner to the Python
decorator declaring the wire protocol command. I want there to be a simple
caching interface so that extensions can implement their own caching
providers. I want server operators to be able to add "CDN acceleration" to
their Mercurial servers by activating an extension and adding <10 lines to
an hgrc file. Put another way, I want to make it as easy as possible to
scale Mercurial servers. I don't want to hear stories about companies
complaining how resource intensive running their Mercurial server is. If
the ideas I have are implemented, I'm pretty certain we'll be able to
deliver on that promise. (And, yes, I'm considering the needs of private
organizations who will want things like access control on their
cache/content store.)
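
To give a flavor of what I mean by a one-liner (names below are hypothetical,
not the actual registration API), caching could hang off the command
decorator, with a pluggable cacher that extensions can replace:

# Sketch of opt-in response caching at command registration time.

import hashlib

class MemoryCacher:
    """Trivial in-process cacher; an extension could swap in e.g. S3."""
    def __init__(self):
        self._store = {}

    def lookup(self, key):
        return self._store.get(key)

    def store(self, key, value):
        self._store[key] = value

def wireprotocommand(name, cachable=False, cacher=MemoryCacher()):
    """Hypothetical command registration decorator with opt-in caching."""
    def register(func):
        def wrapped(**args):
            if not cachable:
                return func(**args)
            key = hashlib.sha256(
                repr((name, sorted(args.items()))).encode("utf-8")).hexdigest()
            hit = cacher.lookup(key)
            if hit is not None:
                return hit
            result = func(**args)
            cacher.store(key, result)
            return result
        return wrapped
    return register

# Opting a command into caching is a single keyword argument.
@wireprotocommand("manifestdata", cachable=True)
def manifestdata(nodes=(), fields=()):
    return {"served": len(nodes), "fields": tuple(fields)}

print(manifestdata(nodes=(b"\x01" * 20,), fields=("parents",)))
print(manifestdata(nodes=(b"\x01" * 20,), fields=("parents",)))  # cache hit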

The new wire protocol and proposed command set represents a massive change.
There is absolutely no backwards compatibility. I believe Kevin said
something like "the wire protocol defines the interchange format of a VCS
and therefore it *is* the VCS: so any new wire protocol is tantamount to
inventing a new VCS." There is truth to that statement. And I fully
recognize that this work could be characterized as inventing a new VCS. It
will be the first new VCS that Mercurial invented since bundle2 :) But,
having spent a lot of time thinking about the wire protocol, it is obvious
to me that the existing wire protocol is a liability to the future of the
project. I postulate that if we had a well-designed wire protocol with
flexible data retrieval commands, partial clone would have shipped years
ago. As it stands, I think we've incurred years of people time devising
partial and somewhat hacky solutions that work around limitations in the
existing wire protocol and command set and the architecture it forces us to
have. I believe a new wire protocol and command set will alleviate most of
these road blocks and allow us to have much nicer things.

Since we are effectively talking about a new VCS at the wire protocol
level, let's talk about other crazy ideas. As Augie likes to say, once we
decide to incur a backwards compatibility break, we can drive a truck
through it.

Let's talk about hashes.

Mercurial uses SHA-1 for content indexing. We know we want to transition
off of SHA-1 eventually due to security weaknesses. One of the areas
affected by that is the wire protocol. Changegroups use a fixed-width 20
byte field to hold node values. That means we need to incur some kind of BC
break in order to not use SHA-1 over the wire protocol. That's either
truncating a longer hashing algorithm output to 20 bytes or expanding the
fixed-width field to accommodate a different hash (likely 32 bytes). Either
way, it requires a BC break because old clients would barf if they saw data
with the new format.

In addition, Mercurial has 2 ways to store manifests: flat and tree.
Unfortunately, any given repository can only use a single manifest type at
a time. If you switch manifest formats, you change the manifest node
referenced in the changeset and that changes the changeset hash.

The traditional way we've thought about this problem is incurring some kind
of flag day. A server/repo operator makes the decision to one day
transition to a new format that hashes differently. Clients start pulling
the new data for all new revisions. Every time we talk about this, we get
uncomfortable because it is a painful transition to inflict.

I think we can do better.

One of the ideas I'm exploring in the new wire protocol is the idea of
"hash namespaces." Essentially, the server's capabilities will advertise
which hash flavors are supported. Example hash flavors could be
"hg-sha1-flat" for flat manifests using SHA-1 and "hg-blake2b-tree" for
tree manifests using blake2b. When a client makes a request, that request
will be associated with a "hash namespace" such that any nodes referenced
by that command are in the requested "hash namespace."

This feature, if implemented, would allow a server/repository to index and
serve data under multiple hashing methodologies simultaneously. For
example, pushes to the repository would be indexed under SHA-1 flat, SHA-1
tree, blake2b flat, and blake2b tree. Assuming the server operator opts
into this feature, new clones would use whatever format is
supported/recommended at that time. Existing clones would continue to
receive SHA-1 flat manifests. New clones would receive blake2b tree
manifests. No forced transition flag day would be required. Server
operators could choose to keep around support for legacy formats for as
long as they deemed necessary. And the "changesetdata" command I'm
proposing could allow querying the hashes for other namespaces, allowing
clients to map between hashes.
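
A sketch of how this could surface in the protocol (the flavor names follow
the examples above; everything else is illustrative):

# Sketch of "hash namespaces": the server advertises supported flavors and
# every request names the namespace its nodes live in.

server_capabilities = {
    "hash-namespaces": ["hg-sha1-flat", "hg-sha1-tree", "hg-blake2b-tree"],
}

def make_request(command, args, namespace):
    if namespace not in server_capabilities["hash-namespaces"]:
        raise ValueError("server cannot serve namespace %s" % namespace)
    return {"command": command, "namespace": namespace, "args": args}

# An existing clone keeps asking for SHA-1 flat manifests...
legacy = make_request("manifestdata", {"nodes": [b"\x01" * 20]}, "hg-sha1-flat")

# ...while a new clone of the same repository uses blake2b tree manifests,
# whose nodes are wider (32 bytes).
modern = make_request("manifestdata", {"nodes": [b"\x02" * 32]}, "hg-blake2b-tree")

print(legacy["namespace"], len(legacy["args"]["nodes"][0]))
print(modern["namespace"], len(modern["args"]["nodes"][0]))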

I think "hash namespaces" are important because they provide future
compatibility against any format changes. We already have an example of a
hash algorithm change (SHA-1) and a data format change (flat versus tree
manifests). But there are other future changes we may not know of. For
example, we may decide to change how files are hashed so copy metadata
isn't part of the hash. Or we may choose to express manifest diffs as part
of the changeset object and do away with manifests as a content-indexed
primitive. These would all necessitate a new "hash namespace" and I think
having the flexibility to experiment with new formats and hashing
techniques will ultimately be good for the long-term health of Mercurial.

There's also a potentially killer feature that could be derived from "hash
namespaces:" Git integration. We know that it is possible to perform
bi-directional conversions between Mercurial and Git. One could envision a
"hash namespace" that stores Git hashes. When a push comes in, we could
compute the Git hashes for its files (blobs), manifests (trees), and
changesets (commits). Using the low-level "changesetdata," "manifestdata,"
and "filedata" commands, you could request revision data by Git hash. Or
you could request the Git hash from a Mercurial hash or vice-versa. From
here, you could build a Git client that speaks the Mercurial wire protocol
to access the Git-indexed data. (I imagine git-cinnabar would do this so it
doesn't have to perform expensive hash conversion and tracking on the
client.) And because Mercurial's wire protocol will have things like
"content redirects" built-in, you will get scaling out-of-the-box. In other
words, we can make the Mercurial server a pseudo-Git server by exposing the
Git-indexed data via Mercurial's wire protocol commands. Of course, if you
have Git hashes for revision data, it should be possible to run the actual
Git wire protocol server. Either of these features would go a long way
towards ending the Mercurial vs Git holy war for server operators: we tell
people to run a Mercurial server that maintains a Git index of the data and
call it a day.

So where are we today and where is this going?

We have the basis of a new wire protocol transport in core Mercurial. It
still needs a lot of love and will undergo several BC breaks before it
ships as non-experimental. But that's fine for an experimental feature. The
editor for the HTTP/2 specification has offered to provide a spec review
when the time comes and I fully intend on taking him up on that before we
promote the protocol to non-experimental.

The client/peer interface is in a pretty good state and we can issue
commands and handle responses for the new protocol over HTTP. It may not do
things optimally under the hood. But it works and is usable enough that we
can start calling into wireproto v2-only commands.

I have a handful of patches queued up to fix a bunch of warts/bugs in the
existing wire protocol version 2 code. I'll start sending those soon.

I also have a handful of patches queued up to implement new wire protocol
commands "changesetdata," "manifestdata," and "filedata." These commands
aren't complete. But they are enough to implement clone/pull without
"getbundle"/bundle2. Regardless of the final set of commands we need in
order to support efficient clones (we may even port "getbundle" to wire
protocol version 2), I'd like to get these primitive commands landed
because all clone/pull strategies should be implementable in terms of them
and they will make very useful arrows in our quiver.

I have designs and some preliminary code for robust caching and content
redirection on the server. I'm pretty confident in stating that it will
work. And I'm committed to making it work, as Mozilla will want to leverage
this feature.

I have ongoing work around formalizing everything related to repository
storage. I want to formalize interfaces for accessing the storage
primitives. The goal here is to make it possible to implement non-revlog
repository storage. There are benefits to both clients and servers for this
work. On servers, I'd like it to be possible to use e.g. generic key-value
stores for storage so we don't rely on local filesystems. On clients, I'd
like to experiment with alternate storage that doesn't require writing so
many files. This will help with clone times, especially on Windows. I think
SQLite is a good place to start. But I'm open to alternatives.
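
To give a flavor of what I mean (the schema is purely illustrative), file
revisions could live in a single SQLite database keyed by path and node
instead of one revlog per file:

# Sketch of non-revlog client file storage backed by SQLite.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE filedata ("
    "  path BLOB NOT NULL,"
    "  node BLOB NOT NULL,"
    "  p1 BLOB,"
    "  p2 BLOB,"
    "  revision BLOB NOT NULL,"
    "  PRIMARY KEY (path, node))"
)

def storefile(path, node, p1, p2, revision):
    # p1/p2 are the parent nodes ("index" data); revision is the fulltext
    # or a delta ("revision" data).
    conn.execute("INSERT OR REPLACE INTO filedata VALUES (?, ?, ?, ?, ?)",
                 (path, node, p1, p2, revision))

def readfile(path, node):
    row = conn.execute(
        "SELECT revision FROM filedata WHERE path = ? AND node = ?",
        (path, node)).fetchone()
    return row[0] if row else None

storefile(b"a.txt", b"\x01" * 20, None, None, b"hello\n")
print(readfile(b"a.txt", b"\x01" * 20))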

My goal is for 4.8 to ship a version of partial clone that we can use on
hg.mozilla.org on our existing infrastructure. This means no substantial
increase in server load. Since we currently offload ~97% of bytes via clone
bundles, I'm guessing this is going to be difficult to impossible without
transparent command caching. And I don't think we can have that with
"getbundle" because that command is too complicated. So I really want to
land the new commands for data access and have a mechanism in core for
doing a partial clone with them. I would also like to land an experimental
client storage backend that doesn't require per-file revlogs for file
storage. All of these things can be experimental and not subject to BC: I'm
willing to deal with that pain on Mozilla's end until things stabilize
upstream. I don't expect any of this work to stabilize before a release or
two into 2019 anyway.

If you want to help, there are tons of ways to do that.

Foremost, if you have feedback about this post, say something! I'm
proposing some radical things. People should question changes that are this
radical! I think I've demonstrated or will demonstrate some significant
value to this work. But just because you can do a thing doesn't mean you
must do a thing.

There is no shortage of work around adding interfaces to storage and
refactoring storage APIs so they aren't revlog specific. There are entire
features like bundlerepo, unionrepo, repair, and repo upgrading that make
heavy assumptions about the existence of revlogs and current file formats.
Auditing the existing interfaces in repository.py and removing things that
don't belong would also be a good use of time. While I've been focused on
the revlog primitives so far, we will also need to add interfaces for
everything that writes to .hg/. e.g. bookmarks, phases, locks, and
transactions. We need to figure out a way to make these things code to an
interface so implementation details of the existing .hg/ storage format
don't bleed out into random callers. The tests/simplestorerepo.py extension
implements things with custom storage and running the tests with that
extension flushes out places in code that make assumptions about how
storage works.

Once the new wire protocol commands and exchange code lands, I'll need help
adding features to support partial clone. There are still some parts I
don't fully grok, such as ellipsis nodes and widening/narrowing. I /think/
my radical shift of pull logic from server to client makes these problems
more tractable. But I don't understand the space as well as others.

If you have a wish list for other features to add to the wire protocol, now
would be the time to say something.

When the time comes, I think it would be rad to experiment with the
multiple hash storage ideas I outlined above. I'd be particularly
interested in multi-storage of flat and tree manifests as well as Git
indexing of revisions. Both features would be very useful for Mozilla.

Whew. That was a long email. And I didn't even say everything I could have
said on the subject. If you made it here, congratulations. Hopefully you
now have a better understanding of the work I'm doing and where I hope this
all leads. If you want to help, ping me here or on IRC (I'm indygreg) and
we'll figure something out.

Gregory

[1]
https://gregoryszorc.com/blog/2018/07/27/benefits-of-clone-offload-on-version-control-hosting/
[2] https://github.com/glandium/git-cinnabar/issues/192