[PATCH 1 of 5] manifest: break mancache into two caches

Durham Goode durham at fb.com
Mon Aug 8 21:28:53 EDT 2016


On 8/8/16 6:17 PM, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham at fb.com>
> # Date 1470696646 25200
> #      Mon Aug 08 15:50:46 2016 -0700
> # Node ID f91cdd4315bbc92ad893c8084c0347c218399ce3
> # Parent  37b6f0ec6241a62de90737409458cd622e2fac0d
> manifest: break mancache into two caches
>
This is the beginning of a 30-patch series that refactors the current
manifest class. It splits the concept of a manifest collection from the
concept of a manifest instance and from the concept of a particular
storage (like revlogs).  The result looks very much like our changelog
today (changelog is a collection of commits, changectx is an individual
commit), except that we've also separated out the storage.

The overall refactor adds the new classes incrementally, then begins
moving functionality out of manifest onto the new classes, changing
call sites as we go.  At the very end the current manifest class will be
deleted.

The final code can be inspected here:
https://bitbucket.org/DurhamG/hg/src/73fb2514d89400e290684e635a4f592017b4ad08/mercurial/manifest.py?at=manifestrefactor&fileviewer=file-view-default#manifest.py-954

(as can the entire series)

I'll be getting perf numbers before we get far enough into the series
for performance to matter.

The high level design ends up looking like so:

class manifestlog(object):
     """A collection class representing the collection of manifest snapshots
     referenced by commits in the repository.

     In this situation, 'manifest' refers to the abstract concept of a snapshot
     of the list of files in the given commit. Consumers of the output of this
     class do not care about the implementation details of the actual manifests
     they receive (i.e. tree or flat or lazily loaded, etc)."""

     def __init__(self, opener, revlog)
     def __getitem__(self, node): return self.get(node)
     def get(self, node, dir='')
     def add(self, m, transaction, link, p1, p2...): return m.write(transaction, ...)

class manifestrevlog(revlog):
     """A revlog that stores manifest texts. This is responsible for caching the
     full-text manifest contents.
     """

     def dirlog(self, dir)

class manifestctx(manifestdict):
     """A class representing a single revision of a manifest, including its
     contents, its parent revs, and its linkrev."""

     def new(self)
     def node(self)
     def p1(self)
     def p2(self)
     def linkrev(self)
     def readfast(self, shallow=False)
     def readdelta(self, shallow=False)

class treemanifestctx(treemanifest):
     """Same as manifestctx, but is backed by tree storage instead of flat
     storage."""

     <same as manifestctx>

class memmanifestctx(manifestdict):
     """In memory representation of a pending manifestctx. Has a write function
     that will serialize the pending manifest to storage."""

     def new(self)
     def write(self, transaction, link, p1, p2, ....)

class memtreemanifestctx(treemanifest):
     """Same as memmanifestctx, except it is also aware of recursively
     serializing trees to storage."""

     <same as memmanifestctx>
     def _addtree(...)
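
To make the collection / instance / storage split a bit more concrete,
here is a small standalone sketch of how the pieces could fit together.
It is not the code from the series: dictstore, NULLID, the trimmed-down
method signatures (no transaction or linkrev) and the "path\0node" text
format are all assumptions made up for illustration, and a plain dict
stands in for manifestdict.

```python
import hashlib

NULLID = "0" * 40  # assumed placeholder for "no parent" in this sketch


class dictstore:
    """Hypothetical storage layer; stands in for a revlog."""

    def __init__(self):
        self._revs = {}  # node -> (text, p1, p2)

    def add(self, text, p1, p2):
        # The node is a hash over parents and text, so identical content
        # with identical parents always maps to the same node.
        node = hashlib.sha1((p1 + p2 + text).encode("utf-8")).hexdigest()
        self._revs[node] = (text, p1, p2)
        return node

    def read(self, node):
        return self._revs[node]


class manifestctx(dict):
    """A single stored manifest revision: file path -> file node."""

    def __init__(self, store, node):
        text, p1, p2 = store.read(node)
        lines = [line for line in text.split("\n") if line]
        super().__init__(line.split("\0") for line in lines)
        self._node, self._p1, self._p2 = node, p1, p2

    def node(self):
        return self._node

    def p1(self):
        return self._p1

    def p2(self):
        return self._p2


class memmanifestctx(dict):
    """A pending manifest that knows how to serialize itself."""

    def __init__(self, store):
        super().__init__()
        self._store = store

    def write(self, p1=NULLID, p2=NULLID):
        text = "".join("%s\0%s\n" % (f, n) for f, n in sorted(self.items()))
        return self._store.add(text, p1, p2)


class manifestlog:
    """The collection: hands out manifestctx objects by node."""

    def __init__(self, store):
        self._store = store

    def __getitem__(self, node):
        return self.get(node)

    def get(self, node):
        return manifestctx(self._store, node)

    def add(self, m, p1=NULLID, p2=NULLID):
        # The collection delegates serialization to the pending instance.
        return m.write(p1, p2)


store = dictstore()
mfl = manifestlog(store)

pending = memmanifestctx(store)
pending["README"] = "a" * 40
pending["setup.py"] = "b" * 40

node = mfl.add(pending)
ctx = mfl[node]
```

The point of the sketch is only that callers talk to manifestlog and the
ctx objects, and never need to know what sits behind the store.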

