Manifest Refactor

Wed Jul 13 12:13:04 EDT 2016

> On Jul 12, 2016, at 18:03, Durham Goode <durham at fb.com> wrote:
> 
> We'll be looking at moving to tree manifests as our source of truth over the next few months, and one problem area is the fact that the manifest class is not well factored for this usecase. This one class is the collection of all manifests, the accessor for information about individual manifests, and the storage format (revlog).
> 
> Before I do a bunch of work, I wanted to run my proposal for breaking up the manifest by you guys:
> 
> 1. Add a "manifestlog" class that represents the collection of all root-level manifests (i.e. what commits point to; not any sub-trees)  It's basically what repo.manifest would return, and mainly consist of "get" and "add" apis that return and accept manifest instances.  It would be responsible for caching recently used manifests, and potentially serving up the right kind of manifest when demanded (ex: during our transition from flat manifests to tree manifests, we may want to allow loading both, and this class would multiplex them). It would have no, or very little, knowledge about revlogs/storage.

Possible wrinkle here: some code paths really only care about what changed relative to p1 in the manifest, and we currently have an optimization where we drop everything but the data to insert in the manifest and then parse as a manifest. My understanding is that this is a pretty big win for performance, so some sort of optionally-fast-path read-only-new-entries method is probably also going to be a requirement to avoid regressing the normal-people case.

> 
> 2. Make the "manifest" class represent a single instance of a manifest (it would point at other instances of "manifest" for sub-trees).  From a consumers point of view, when they do 'repo.manifest.get(node)' they will receive a manifest instance and they should be able to not care how it's implemented.  It would expose apis like 'children', 'walk', 'get(fileordirname)', 'parents', 'linkrev', etc.
> 
> The specific implementation of the manifest instance could use whatever storage scheme it wants.  For example, in the normal vanilla manifest, it would look much like manifestdict does today, with no knowledge of revlogs (you just pass text to the constructor). In a tree world, each instance in the tree could have knowledge of its own backing revlog, and be able to construct new instances as someone recurses down.

Yep, sounds very much like what I had in mind.

> 3. Add a "manifestrevlog" class that inherits from revlog. This is the actual ondisk storage.  Ideally "manifest" instances would just call simple read and write apis (and not depend on revlog implementation details), so we could in theory replace the revlog storage with something else (packed revlogs, lookaside to memcache, whatever) without having to rewrite the actual manifest business logic.

Yes! See above caveat about the readdelta madness, but I think that's easily overcome (IIRC the contract for readdelta already allows it to return too much data, so non-revlog that can't do the delta computation quickly can just return a full manifest or whatever.

> Breaking the manifest into these three parts (collection, instance, storage) should make it easier to mix and match manifest implementations and storage schemes, without rewriting lots of logic.
> 
> For thing that do take heavy dependencies on it being a revlog (like push/pull/changegroup), they will be able to reach around the abstractions and talk directly to the revlog when necessary. And future storage implementations will either have to do the same or find a common API that can allow changegroups to be created/received for both storages.
> 
> 
> Thoughts? Concerns? Is renaming the collection class (which is the primary interface for how the rest of mercurial interacts with the manifest) from manifest to manifestlog a bad idea?  I could rename the instance concept to manifestctx or something instead.

I think this all sounds sane, and we can probably even find a coherent API for readdelta with a little thought. Happy hacking!

(Definitely wait for Martin to respond before getting too deep into this though, because he often catches stuff I miss in this area.)