Manifest Refactor

Durham Goode durham at fb.com
Wed Jul 13 16:19:59 EDT 2016


On 7/12/16 7:08 PM, Gregory Szorc wrote:
> On Tue, Jul 12, 2016 at 3:03 PM, Durham Goode <durham at fb.com> wrote:
>
>     We'll be looking at moving to tree manifests as our source of
>     truth over the next few months, and one problem area is the fact
>     that the manifest class is not well factored for this use case.
>     This one class is the collection of all manifests, the accessor
>     for information about individual manifests, and the storage format
>     (revlog).
>
>     Before I do a bunch of work, I wanted to run my proposal for
>     breaking up the manifest by you guys:
>
>     1. Add a "manifestlog" class that represents the collection of all
>     root-level manifests (i.e. what commits point to; not any
>     sub-trees).  It's basically what repo.manifest would return, and
>     would mainly consist of "get" and "add" apis that return and accept
>     manifest instances. It would be responsible for caching recently
>     used manifests, and potentially serving up the right kind of
>     manifest when demanded (ex: during our transition from flat
>     manifests to tree manifests, we may want to allow loading both,
>     and this class would multiplex them). It would have no, or very
>     little, knowledge about revlogs/storage.
>
>     2. Make the "manifest" class represent a single instance of a
>     manifest (it would point at other instances of "manifest" for
>     sub-trees).  From a consumer's point of view, when they do
>     'repo.manifest.get(node)' they will receive a manifest instance
>     and they should be able to not care how it's implemented.  It
>     would expose apis like 'children', 'walk', 'get(fileordirname)',
>     'parents', 'linkrev', etc.
>
>     The specific implementation of the manifest instance could use
>     whatever storage scheme it wants.  For example, in the normal
>     vanilla manifest, it would look much like manifestdict does today,
>     with no knowledge of revlogs (you just pass text to the
>     constructor). In a tree world, each instance in the tree could
>     have knowledge of its own backing revlog, and be able to construct
>     new instances as someone recurses down.
>
>     3. Add a "manifestrevlog" class that inherits from revlog. This is
>     the actual on-disk storage.  Ideally "manifest" instances would
>     just call simple read and write apis (and not depend on revlog
>     implementation details), so we could in theory replace the revlog
>     storage with something else (packed revlogs, lookaside to
>     memcache, whatever) without having to rewrite the actual manifest
>     business logic.
>
>
>     Breaking the manifest into these three parts (collection,
>     instance, storage) should make it easier to mix and match manifest
>     implementations and storage schemes, without rewriting lots of logic.
>
>     For things that do take heavy dependencies on it being a revlog
>     (like push/pull/changegroup), they will be able to reach around
>     the abstractions and talk directly to the revlog when necessary.
>     And future storage implementations will either have to do the same
>     or find a common API that can allow changegroups to be
>     created/received for both storages.
>
>
>     Thoughts? Concerns? Is renaming the collection class (which is the
>     primary interface for how the rest of mercurial interacts with the
>     manifest) from manifest to manifestlog a bad idea?  I could rename
>     the instance concept to manifestctx or something instead.
>
>
> I still owe a detailed reply but first...
>
> Does this change anything with regards to the availability of narrow 
> clone in the official repo/distribution? Or is the plan to still rely 
> on Google's out-of-repo narrowhg extension? Also, how does this work 
> relate to Facebook's "fastmanifest" and "cfastmanifest" extensions? 
> (I've seen lots of activity on these extensions and am curious what 
> the purpose and plans for them are. There seems to be some cool 
> experiments going on!)

If anything, this would make it easier to get narrowhg into upstream, 
since some of its integration points would become more isolated.

As for how it interacts with Facebook's fastmanifest stuff, it would 
make that cleaner as well. fastmanifest is an extension that keeps a 
cache of certain manifest entries in tree form, then allows using the 
tree version for reads when possible.  It greatly speeds up things like 
diff in our large repo.  cfastmanifest is a C implementation of it (with 
a different on-disk storage format).
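
To illustrate the "use the tree version for reads when possible" idea, 
here is a minimal sketch of a reader that prefers a cached tree-form 
manifest and falls back to the flat one on a miss.  This is not the 
actual fastmanifest code: the class name "cachingreader", the "readflat" 
callable, and the dict-based cache are all hypothetical, chosen only to 
show the fallback.

```python
class cachingreader(object):
    """Prefer a cached tree-form manifest; fall back to the flat reader.

    Hypothetical illustration only; not taken from fastmanifest.
    """

    def __init__(self, readflat, treecache=None):
        # readflat: callable mapping a node to a flat manifest object
        self._readflat = readflat
        # treecache: mapping of node -> tree-form manifest (assumed a dict)
        self._treecache = treecache if treecache is not None else {}
        self.hits = 0
        self.misses = 0

    def read(self, node):
        tree = self._treecache.get(node)
        if tree is not None:
            # Cached tree form is available, so use it for the read.
            self.hits += 1
            return tree
        # Not cached: read the flat manifest as before.
        self.misses += 1
        return self._readflat(node)
```

Making the cache "more used" over time then amounts to raising the hit 
rate until the fallback path is never taken.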

Our plan is to migrate our users from flat text to trees in stages, by 
making the cache progressively more persistent and more used until flat 
manifests aren't read on the client anymore.  Then we'll do the same 
thing on the server.  The final form will likely be a custom storage 
format similar to remotefilelog's new pack file implementation (revlogs 
have issues for what we're planning, like inode overhead and the 
inability to add to them non-chronologically), and by doing this 
refactor we'll be able to reuse as much of the existing infra in core as 
we can.
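
For concreteness, here is a rough sketch of the three-way split from the 
quoted proposal (collection, instance, storage).  Only the names 
"manifestlog", "manifest", "get" and "add" come from the proposal; 
"memstorage", its read/write methods, the "set"/"text" helpers and the 
unbounded dict cache are assumptions for illustration.  The storage is an 
in-memory stand-in for the proposed "manifestrevlog", not a real revlog, 
and the text format is only loosely modeled on the flat manifest.

```python
import hashlib


class memstorage(object):
    """Storage layer stand-in (hypothetical; keeps texts in a dict)."""

    def __init__(self):
        self._texts = {}

    def write(self, text):
        # Address each text by its SHA-1 so identical texts share a node.
        node = hashlib.sha1(text).hexdigest()
        self._texts[node] = text
        return node

    def read(self, node):
        return self._texts[node]


class manifest(object):
    """A single flat manifest, built from text with no storage knowledge."""

    def __init__(self, text=b''):
        self._files = {}
        # One "path NUL filenode" entry per line (assumed format).
        for line in text.splitlines():
            path, filenode = line.split(b'\0')
            self._files[path] = filenode

    def get(self, path):
        return self._files.get(path)

    def set(self, path, filenode):
        self._files[path] = filenode

    def walk(self):
        return sorted(self._files)

    def text(self):
        entries = sorted(self._files.items())
        return b''.join(b'%s\0%s\n' % (p, n) for p, n in entries)


class manifestlog(object):
    """Collection of root manifests; caches instances it has handed out."""

    def __init__(self, storage):
        self._storage = storage
        self._cache = {}  # unbounded here; a real one would likely be an LRU

    def get(self, node):
        if node not in self._cache:
            self._cache[node] = manifest(self._storage.read(node))
        return self._cache[node]

    def add(self, m):
        node = self._storage.write(m.text())
        self._cache[node] = m
        return node
```

Because "manifestlog" only calls read and write on its storage, swapping 
"memstorage" for a pack-file or revlog-backed object would not touch the 
"manifest" class at all, which is the point of the refactor.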