Wire protocol futures

Gregory Szorc gregory.szorc at gmail.com
Thu Sep 13 18:36:00 EDT 2018


On Thu, Sep 13, 2018 at 2:59 PM Augie Fackler <raf at durin42.com> wrote:

> I have one specific question (search for "QUESTION" below) that might
> result in a short-term action item for me, rest is mostly commentary I
> think you could anticipate from me. :)
>
> On Aug 31, 2018, at 18:47, Gregory Szorc <gregory.szorc at gmail.com> wrote:
>
>
> [...]
>
> The new wire protocol and proposed command set represents a massive
> change. There is absolutely no backwards compatibility. I believe Kevin
> said something like "the wire protocol defines the interchange format of a
> VCS and therefore it *is* the VCS: so any new wire protocol is tantamount
> to inventing a new VCS." There is truth to that statement. And I fully
> recognize that this work could be characterized as inventing a new VCS. It
> will be the first new VCS that Mercurial invented since bundle2 :) But,
> having spent a lot of time thinking about the wire protocol, it is obvious
> to me that the existing wire protocol is a liability to the future of the
> project. I postulate that if we had a well-designed wire protocol with
> flexible data retrieval commands, partial clone would have shipped years
> ago. As it stands, I think we've incurred years of people time devising
> partial and somewhat hacky solutions that work around limitations in the
> existing wire protocol and command set and the architecture it forces us to
> have. I believe a new wire protocol and command set will alleviate most of
> these road blocks and allow us to have much nicer things.
>
> Since we are effectively talking about a new VCS at the wire protocol
> level, let's talk about other crazy ideas. As Augie likes to say, once we
> decide to incur a backwards compatibility break, we can drive a truck
> through it.
>
>
> +1
>
> Let's talk about hashes.
>
> Mercurial uses SHA-1 for content indexing. We know we want to transition
> off of SHA-1 eventually due to security weaknesses. One of the areas
> affected by that is the wire protocol. Changegroups use a fixed-width 20
> byte field to hold node values. That means we need to incur some kind of BC
> break in order to stop using SHA-1 over the wire protocol: either
> truncating a longer hash algorithm's output to 20 bytes or expanding the
> fixed-width field to accommodate a different hash (likely 32 bytes). Either
> way, old clients would barf if they saw data in the new format.
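
To make the two options concrete, here is a rough Python sketch of the size
mismatch (the constants and function names are mine for illustration, not
real Mercurial APIs; actual changegroup framing is more involved than a bare
digest):

```python
import hashlib

NODE_LEN_SHA1 = 20  # current fixed-width node field in changegroups
NODE_LEN_NEW = 32   # e.g. a blake2b-256 digest

def node_sha1(data):
    # Current scheme: a 20-byte SHA-1 node.
    return hashlib.sha1(data).digest()

def node_new(data, truncate=False):
    # Candidate scheme: blake2b-256, optionally truncated to fit the
    # existing 20-byte field (the first of the two BC options above).
    digest = hashlib.blake2b(data, digest_size=NODE_LEN_NEW).digest()
    return digest[:NODE_LEN_SHA1] if truncate else digest

data = b"example revision data"
assert len(node_sha1(data)) == NODE_LEN_SHA1
assert len(node_new(data)) == NODE_LEN_NEW  # needs a wider field
assert len(node_new(data, truncate=True)) == NODE_LEN_SHA1  # fits, loses bits
```
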
>
> In addition, Mercurial has 2 ways to store manifests: flat and tree.
> Unfortunately, any given repository can only use a single manifest type at
> a time. If you switch manifest formats, you change the manifest node
> referenced in the changeset and that changes the changeset hash.
>
> The traditional way we've thought about this problem is incurring some
> kind of flag day. A server/repo operator makes the decision to one day
> transition to a new format that hashes differently. Clients start pulling
> the new data for all new revisions. Every time we talk about this, we get
> uncomfortable because it is a painful transition to inflict.
>
> I think we can do better.
>
> One of the ideas I'm exploring in the new wire protocol is the idea of
> "hash namespaces." Essentially, the server's capabilities will advertise
> which hash flavors are supported. Example hash flavors could be
> "hg-sha1-flat" for flat manifests using SHA-1 and "hg-blake2b-tree" for
> tree manifests using blake2b. When a client makes a request, that request
> will be associated with a "hash namespace" such that any nodes referenced
> by that command are in the requested "hash namespace."
>
> This feature, if implemented, would allow a server/repository to index and
> serve data under multiple hashing methodologies simultaneously. For
> example, pushes to the repository would be indexed under SHA-1 flat, SHA-1
> tree, blake2b flat, and blake2b tree. Assuming the server operator opts
> into this feature, new clones would use whatever format is
> supported/recommended at that time. Existing clones would continue to
> receive SHA-1 flat manifests. New clones would receive blake2b tree
> manifests. No forced transition flag day would be required. Server
> operators could choose to keep around support for legacy formats for as
> long as they deemed necessary. And the "changesetdata" command I'm
> proposing could allow querying the hashes for other namespaces, allowing
> clients to map between hashes.
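
As a sketch of how this might hang together (HASH_NAMESPACES,
advertise_capabilities, and index_push are hypothetical names, not proposed
API; the namespace strings follow the flavors above):

```python
import hashlib

# Hypothetical registry of "hash namespaces" a server might support.
# Each maps a namespace name to (hash function, manifest kind).
HASH_NAMESPACES = {
    "hg-sha1-flat": (lambda d: hashlib.sha1(d).digest(), "flat"),
    "hg-blake2b-tree": (
        lambda d: hashlib.blake2b(d, digest_size=32).digest(), "tree"),
}

def advertise_capabilities():
    # Server side: expose supported namespaces in the capabilities reply.
    return {"hashnamespaces": sorted(HASH_NAMESPACES)}

def index_push(data):
    # Index incoming data under every supported namespace simultaneously,
    # so old and new clients can both be served from the same store.
    return {ns: hasher(data) for ns, (hasher, _kind) in HASH_NAMESPACES.items()}

caps = advertise_capabilities()
assert "hg-sha1-flat" in caps["hashnamespaces"]
nodes = index_push(b"revision payload")
assert len(nodes["hg-sha1-flat"]) == 20
assert len(nodes["hg-blake2b-tree"]) == 32
```
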
>
> I think "hash namespaces" are important because they provide future
> compatibility against any format changes. We already have an example of a
> hash algorithm change (SHA-1) and a data format change (flat versus tree
> manifests). But there are other future changes we may not know of. For
> example, we may decide to change how files are hashed so copy metadata
> isn't part of the hash. Or we may choose to express manifest diffs as part
> of the changeset object and do away with manifests as a content-indexed
> primitive. These would all necessitate a new "hash namespace" and I think
> having the flexibility to experiment with new formats and hashing
> techniques will ultimately be good for the long-term health of Mercurial.
>
> There's also a potentially killer feature that could be derived from "hash
> namespaces": Git integration. We know that it is possible to perform
> bi-directional conversions between Mercurial and Git. One could envision a
> "hash namespace" that stores Git hashes. When a push comes in, we could
> compute the Git hashes for its files (blobs), manifests (trees), and
> changesets (commits). Using the low-level "changesetdata," "manifestdata,"
> and "filedata" commands, you could request revision data by Git hash. Or
> you could request the Git hash from a Mercurial hash or vice-versa. From
> here, you could build a Git client that speaks the Mercurial wire protocol
> to access the Git-indexed data. (I imagine git-cinnabar would do this so it
> doesn't have to perform expensive hash conversion and tracking on the
> client.) And because Mercurial's wire protocol will have things like
> "content redirects" built-in, you will get scaling out-of-the-box. In other
> words, we can make the Mercurial server a pseudo-Git server by exposing the
> Git-indexed data via Mercurial's wire protocol commands. Of course, if you
> have Git hashes for revision data, it should be possible to run the actual
> Git wire protocol server. Either of these features would go a long way
> towards ending the Mercurial vs Git holy war for server operators: we tell
> people to run a Mercurial server that maintains a Git index of the data and
> call it a day.
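
A toy illustration of the mapping side of this (the HashTranslator class is
entirely hypothetical; in practice the "changesetdata" command queried
across namespaces is what would supply the hash pairs):

```python
# Toy bidirectional map between a Mercurial node and its Git counterpart.
class HashTranslator:
    def __init__(self):
        self._hg_to_git = {}
        self._git_to_hg = {}

    def record(self, hg_node, git_sha):
        # Store the pairing in both directions so either lookup is O(1).
        self._hg_to_git[hg_node] = git_sha
        self._git_to_hg[git_sha] = hg_node

    def to_git(self, hg_node):
        return self._hg_to_git[hg_node]

    def to_hg(self, git_sha):
        return self._git_to_hg[git_sha]
```
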
>
>
> Hash namespaces are very similar (at least at a high level) to an approach
> taken in Veracity that allowed multiple hash algorithms to live
> side-by-side. I don't remember the details there, but it sounded painful.
> I'm not saying we shouldn't do this, just that it's likely to be rough.
> Bonsai changesets do seem like they help, to an extent.
>
> I agree that we should at least reserve space for new hash(es) in the new
> format.
>
> [...]
>
> I have ongoing work around formalizing everything related to repository
> storage. I want to formalize interfaces for accessing the storage
> primitives. The goal here is to make it possible to implement non-revlog
> repository storage. There are benefits to both clients and servers for this
> work. On servers, I'd like it to be possible to use e.g. generic key-value
> stores for storage so we don't rely on local filesystems. On clients, I'd
> like to experiment with alternate storage that doesn't require writing so
> many files. This will help with clone times, especially on Windows. I think
> SQLite is a good place to start. But I'm open to alternatives.
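
For flavor, a minimal sketch of what a SQLite-backed file store might look
like (illustrative only; the real work is defining the storage interfaces,
not this toy schema):

```python
import sqlite3

class SQLiteFileStore:
    # One SQLite database instead of one revlog file per tracked file,
    # which is where the Windows clone-time win would come from.
    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS filedata ("
            " path BLOB, node BLOB, data BLOB,"
            " PRIMARY KEY (path, node))")

    def put(self, path, node, data):
        self._db.execute(
            "INSERT OR REPLACE INTO filedata VALUES (?, ?, ?)",
            (path, node, data))
        self._db.commit()

    def get(self, path, node):
        row = self._db.execute(
            "SELECT data FROM filedata WHERE path = ? AND node = ?",
            (path, node)).fetchone()
        if row is None:
            raise KeyError((path, node))
        return row[0]
```
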
>
> My goal is for 4.8 to ship a version of partial clone that we can use on
> hg.mozilla.org on our existing infrastructure. This means no substantial
> increase in server load. Since we currently offload ~97% of bytes via clone
> bundles, I'm guessing this is going to be difficult to impossible without
> transparent command caching. And I don't think we can have that with
> "getbundle" because that command is too complicated. So I really want to
> land the new commands for data access and have a mechanism in core for
> doing a partial clone with them. I would also like to land an experimental
> client storage backend that doesn't require per-file revlogs for file
> storage. All of these things can be experimental and not subject to BC: I'm
> willing to deal with that pain on Mozilla's end until things stabilize
> upstream. I don't expect any of this work will stabilize before a release
> or 2 into 2019 anyway.
>
> If you want to help, there are tons of ways to do that.
>
>
> I've mentioned this privately, but want to state it on the list: I'm now
> feeling enough pain on remotefilelog upgrades that I want to figure out
> what the minimal viable remotefilelog looks like for us. The hard
> constraints I know about for us:
>
> 1) lazy-fetch file contents
> 2) periodically build an efficient pack of loose files (but not too often,
> because it'll make some of our storage layers upset: I asked if we could
> just dump blobs in a SQLite database, and that would be really bad for us,
> so some bespoke-ish data-packing mechanism is going to be a must, sadly)
> 3) push works and includes everything in the bundle
> 4) viable migration path from existing remotefilelog (doesn't have to be
> in core, but has to be _doable_)
>
> My strongly preferred approach to this would be to essentially fork and
> rewrite-in-place the existing remotefilelog codebase, with an eye towards
> being able to land it as extra-experimental. Much of what I've seen in
> there can be made less invasive these days, either through more targeted
> extensions.wrapfunction() use or minor tweaks to core. A lot of the
> confusing bits appear to be layered cooperative hacks for FB's treemanifest
> migration code or fastannotate, or cruft that's hanging around to support
> older versions of hg.
>
> QUESTION: I know you're favoring more of a "big bang" approach to a
> partial clone tool. Do you envision lazy-fetching files as something that's
> of use to you, and if so would it be plausibly productive for me to try to
> produce the "cleaned up remotefilelog pseudo-fork" I describe, at least in
> part? I could time-box it at a day (or half a day) so it's not a ton of
> time investment, but I think that'd be enough to give you an idea of what
> the result might look like and maybe convince you the incremental approach
> can arrive at your desired goal while simultaneously making my life easier.
> ;)
>

I think lazy fetching files is useful. However, some care needs to be taken
to devise how this is done. The new wire protocol commands to facilitate
granular data access make lazy fetching totally doable. It would be
conceivable to create a storage backend whose data access methods incurred
network I/O transparently. However, I'm pretty confident performance would
be horrible. In order to achieve reasonable performance, we'd need to batch
requests. And the storage APIs used by most internal operations operate on
single revisions. We'd need to refactor a lot of code to express intents to
access data for multiple revisions so that we could easily intercept these
requests and turn them into a batched network request. I think this is
doable. But I haven't thought a lot about it and I think it would be a bit
of work to find all the places in code where we access data for N revisions
and convert that to a "batch get" API call against the storage layer.
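
To illustrate the batching idea, here's a toy sketch (BatchingRemoteStore
and the fetch callable are made-up names; the point is collapsing N
single-revision get() calls into one network request):

```python
class BatchingRemoteStore:
    # fetch_batch stands in for a granular wire protocol request like
    # "filedata": it takes a list of nodes and returns {node: data}.
    def __init__(self, fetch_batch):
        self._fetch_batch = fetch_batch
        self._cache = {}

    def get(self, node):
        # Naive per-revision access: potentially one round trip per miss.
        return self.get_many([node])[node]

    def get_many(self, nodes):
        # Batched access: a single round trip for all missing nodes.
        missing = [n for n in nodes if n not in self._cache]
        if missing:
            self._cache.update(self._fetch_batch(missing))
        return {n: self._cache[n] for n in nodes}

calls = []
def fake_fetch(nodes):
    calls.append(list(nodes))
    return {n: b"data-" + n for n in nodes}

store = BatchingRemoteStore(fake_fetch)
result = store.get_many([b"a", b"b", b"c"])
assert len(calls) == 1  # one round trip, not three
assert result[b"b"] == b"data-b"
```
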

As for whether a cleaned up remotefilelog would be useful, I think we
should chat about this in VC, as it is complicated. I'd also like to learn
about your client-side storage requirements. (I'd *really* like to say I
can give you a storage backend that isn't so strongly coupled to
remotefilelog's packfiles. I just don't know what that needs to look like.)


>
> Foremost, if you have feedback about this post, say something! I'm
> proposing some radical things. People should question changes that are this
> radical! I think I've demonstrated or will demonstrate some significant
> value to this work. But just because you can do a thing doesn't mean you
> must do a thing.
>
> There is no shortage of work around adding interfaces to storage and
> refactoring storage APIs so they aren't revlog specific. There are entire
> features like bundlerepo, unionrepo, repair, and repo upgrading that make
> heavy assumptions about the existence of revlogs and current file formats.
> Auditing the existing interfaces in repository.py and removing things that
> don't belong would also be a good use of time. While I've been focused on
> the revlog primitives so far, we will also need to add interfaces for
> everything that writes to .hg/. e.g. bookmarks, phases, locks, and
> transactions. We need to figure out a way to make these things code to an
> interface so implementation details of the existing .hg/ storage format
> don't bleed out into random callers. The tests/simplestorerepo.py extension
> implements things with custom storage and running the tests with that
> extension flushes out places in code that make assumptions about how
> storage works.
>
>
> One thing I've noticed as I've been doing this remotefilelog upgrade is
> that revnums appear in too many places. Architecturally I think we can't
> avoid having them in the top-level changelog, but I think for lower layers
> we can probably hide them without too much loss of efficiency.
>
> Once the new wire protocol commands and exchange code lands, I'll need
> help adding features to support partial clone. There are still some parts I
> don't fully grok, such as ellipsis nodes and widening/narrowing. I /think/
> my radical shift of pull logic from server to client makes these problems
> more tractable. But I don't understand the space as well as others.
>
> If you have a wish list for other features to add to the wire protocol,
> now would be the time to say something.
>
> When the time comes, I think it would be rad to experiment with the
> multiple hash storage ideas I outlined above. I'd be particularly
> interested in multi-storage of flat and tree manifests as well as Git
> indexing of revisions. Both features would be very useful for Mozilla.
>
> Whew. That was a long email. And I didn't even say everything I could have
> said on the subject. If you made it here, congratulations. Hopefully you
> now have a better understanding of the work I'm doing and where I hope this
> all leads. If you want to help, ping me here or on IRC (I'm indygreg) and
> we'll figure something out.
>
>
> Gregory
>
> [1]
> https://gregoryszorc.com/blog/2018/07/27/benefits-of-clone-offload-on-version-control-hosting/
> [2] https://github.com/glandium/git-cinnabar/issues/192
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
>
>

