Manifest Refactor

Thu Jul 14 19:54:00 EDT 2016

On Thu, Jul 14, 2016 at 4:35 PM, Durham Goode <durham at fb.com> wrote:
>
>
> On 7/14/16 3:38 PM, Martin von Zweigbergk wrote:
>>
>> On Tue, Jul 12, 2016 at 3:03 PM, Durham Goode <durham at fb.com> wrote:
>>>
>>> We'll be looking at moving to tree manifests as our source of truth over
>>> the
>>> next few months, and one problem area is the fact that the manifest class
>>> is
>>> not well factored for this usecase. This one class is the collection of
>>> all
>>> manifests, the accessor for information about individual manifests, and
>>> the
>>> storage format (revlog).
>>>
>>> Before I do a bunch of work, I wanted to run my proposal for breaking up
>>> the
>>> manifest by you guys:
>>>
>>> 1. Add a "manifestlog" class that represents the collection of all
>>> root-level manifests (i.e. what commits point to; not any sub-trees)
>>> It's
>>> basically what repo.manifest would return, and mainly consist of "get"
>>> and
>>> "add" apis that return and accept manifest instances.
>>
>> As Augie said elsewhere in this thread, there's more than "get"
>> (actually called "read") and "add" in the current API. Where would you
>> put the dirlog() function that's currently on manifest? It may be
>> useful to include a list of the classes you imagine and their methods
>> (but skip the uninteresting ones), so we can see clearly where each
>> method goes.
>
> I'm aware of readdelta and dirlog.  readdelta would be moved onto the
> manifest instance (since you are asking the specific instance a piece of
> information about itself).  This would mean we don't have to have if
> conditions all over the manifestlog class special casing tree vs not-tree.

The manifest instances are currently created by read() and readdelta()
etc. I think you're proposing that there would be a single
manifestlog.read(node) that returns a manifest on which you can then
call delta() on to get a delta. So e.g. manifestlog.read(node).keys()
gives you all the files in the manifest, while
manifestlog.read(node).delta().keys() gives you the files in the
delta. Am I understanding that right? I think that means you're also
proposing that the manifest class will lazily read the full or delta
(otherwise there's not much point in asking for the delta if you have
to read the full version too). Correct?

>
> I'm still getting familiar with dirlog and how it works, but I think think
> dirlog would move on to the manifestrevlog class, since it's an
> implementation detail of the way tree manifests serialize themselves.  The
> manifestlog itself would not know about it, and generally the Mercurial
> consumers shouldn't be aware of the fact that the manifest is serialized to
> separate revlogs.  Any code that does access dirlog (like changegroup), are
> things that are specifically revlog aware and will bypass the abstraction
> for now.

Makes sense, I think.

>
>>
>> I'm very much for renaming manifest to manifestlog. I've meant to do
>> that for a long time myself. It will of course mean some work for
>> extensions, but that's life.
>>
>>> It would be
>>> responsible for caching recently used manifests, and potentially serving
>>> up
>>> the right kind of manifest when demanded (ex: during our transition from
>>> flat manifests to tree manifests, we may want to allow loading both, and
>>> this class would multiplex them). It would have no, or very little,
>>> knowledge about revlogs/storage.
>>>
>>> 2. Make the "manifest" class represent a single instance of a manifest
>>> (it
>>> would point at other instances of "manifest" for sub-trees).  From a
>>> consumers point of view, when they do 'repo.manifest.get(node)' they will
>>> receive a manifest instance and they should be able to not care how it's
>>> implemented.  It would expose apis like 'children', 'walk',
>>> 'get(fileordirname)', 'parents', 'linkrev', etc.
>>>
>>> The specific implementation of the manifest instance could use whatever
>>> storage scheme it wants.  For example, in the normal vanilla manifest, it
>>> would look much like manifestdict does today, with no knowledge of
>>> revlogs
>>> (you just pass text to the constructor). In a tree world, each instance
>>> in
>>> the tree could have knowledge of its own backing revlog, and be able to
>>> construct new instances as someone recurses down.
>>
>> That sounds very much like it already is. Do you mean to simply define
>> a common super class for manifestdict and treemanifest? I think that
>> super class would be pretty empty (but e.g. "def keys(self): return
>> list(self.iterkeys())" could live there). It's what statically typed
>> languages would do, of course. I'm not familiar enough with Python to
>> say what's more Pythonic.
>
> The current manifestdict and treemanifest classes only represent the content
> of the manifest.  They don't actually represent the manifest revision itself
> (i.e. they don't have the node, p1, p2, linkrev, or any knowledge about
> storage).  My new 'manifestctx' and 'treemanifestctx' classes (which might
> eventually be renamed just 'manifest' and 'treemanifest') inherit from
> manifestdict and treemanifest, and add the revlog, node, p1, p2, and
> linkrev.  That way we can remove node, p1, p2, linkrev from manifestlog
> (since they are attributes of the instance, not the collection).  This would
> allow instances of different type manifests to read their metadata (p1, p2,
> linkrev) from different sources without having if conditions all over the
> manifestlog.

Oh, I see. treemanifest already does know its nodeid. I think it may
make sense to add the other things you're suggesting.