Making chg stateful

Jun Wu quark at fb.com
Fri Feb 3 15:03:18 EST 2017


Excerpts from Yuya Nishihara's message of 2017-02-04 00:11:22 +0900:
> On Thu, 2 Feb 2017 16:56:11 +0000, Jun Wu wrote:
> > Excerpts from Yuya Nishihara's message of 2017-02-03 00:45:22 +0900:
> > > On Thu, 2 Feb 2017 09:34:47 +0000, Jun Wu wrote:
> > > > So what state do we store?
> > > > 
> > > >   {repopath: {name: (hash, content)}}. For example:
> > > > 
> > > >     cache = {'/home/foo/repo1': {'index': ('hash', changelogindex),
> > > >                                  'bookmarks': ('hash', bookmarks),
> > > >                                  .... },
> > > >              '/home/foo/repo2': { .... }, .... }
> > > > 
> > > >   The main ideas here are:
> > > >     1) Store the lowest-level objects, like the C changelog index,
> > > >        because higher-level objects could be changed by extensions in
> > > >        unpredictable ways (this is not true in my hacky prototype, though).
> > > >     2) Hash everything. For changelog, it's like the file stat of
> > > >        changelog.i. There must be a strong guarantee that the hash matches
> > > >        the content, which could be challenging, but not impossible. I'll
> > > >        cover more details below.
> > > > 
> > > >   The cache is scoped by repo to make the API simpler and easier to use.
> > > >   It may be interesting to have some global state (like passing back the
> > > >   extension path to import extensions at runtime).
> > > 
> > > Regarding this and "2) Side-effect-free repo", can or should we design the API
> > > as something like a low-level storage interface? That will allow us to not
> > > make repo/revlog know too much about chg.
> > > 
> > > I don't have any concrete idea, but that would work as follows:
> > > 
> > >  1. chg injects an object to select storage backends
> > >     e.g. repo.storage = chgpreloadable(repo.storage, cache)
> > >  2. repo passes it to revlog, etc.
> > >  3. revlog uses it to read indexfile, where in-memory cache may be returned
> > >     e.g. storage.parserevlog(indexfile)
> > >
> > > Perhaps, this 'storage' object is similar to the one you call 'baserepository'.
> > 
> > I'm not sure I get the idea (probably not). What does the implementation
> > in the master server look like?
> 
> I was just thinking about how to hack the real repo object without introducing
> much mess. Perhaps the master server wouldn't be that different from your idea.
> 
> > It feels more like "repo.chgcache" to me and the difference is that the
> > vanilla hg will be changed to access objects via it (so the interface looks
> > more consistent).
> 
> Yeah, it might be like repo.chgcache.
> 
> Since we shouldn't pass repo to revlog (it's layering violation), I think
> we'll need a thin wrapper for chgcache anyway.
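
A minimal sketch of such a thin wrapper, assuming the per-repo part of the cache layout quoted above ({name: (hash, content)}) and using os.stat fields as the "hash". chgpreloadable and parserevlog are the names from Yuya's mail; vfsstorage, statkey and the toy revlog class are hypothetical, and "parsing" is stubbed as reading raw bytes. The point is only the layering: revlog sees a storage object, never the repo.

```python
import os


def statkey(path):
    """Hypothetical cheap stand-in for a hash: stat fields that change on rewrite."""
    try:
        st = os.stat(path)
    except OSError:
        return None  # missing file: never treated as a cache hit
    return (st.st_ino, st.st_size, st.st_mtime_ns)


class vfsstorage(object):
    """Hypothetical default storage rooted at a directory (e.g. .hg/store)."""

    def __init__(self, root):
        self.root = root

    def join(self, name):
        return os.path.join(self.root, name)

    def parserevlog(self, indexfile):
        # Stand-in for real index parsing: just return the raw bytes.
        with open(self.join(indexfile), 'rb') as fp:
            return fp.read()


class chgpreloadable(object):
    """Hypothetical wrapper injected by chg: serves preloaded results first."""

    def __init__(self, storage, cache):
        self._storage = storage
        self._cache = cache  # {indexfile: (key, parsed)} for this repo only

    def parserevlog(self, indexfile):
        entry = self._cache.get(indexfile)
        key = statkey(self._storage.join(indexfile))
        if entry is not None and key is not None and entry[0] == key:
            return entry[1]  # file unchanged since preload: reuse it
        # Not preloaded, or stale: parse from disk as vanilla hg would.
        return self._storage.parserevlog(indexfile)


class revlog(object):
    """Toy revlog: depends only on the storage object, not on repo."""

    def __init__(self, storage, indexfile):
        self.index = storage.parserevlog(indexfile)
```

The stat key is weaker than a content hash: a rewrite that leaves inode, size and mtime unchanged would go unnoticed, which is the "strong guarantee" problem quoted above.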

As I mentioned in the second mail ("4) Where to get preloaded results (in
worker)"), we could just expose some kind of global state, like a
"globalcache" module.

> > Things to consider:
> > 
> >   a) Objects being preloaded have dependencies: e.g. the obsstore depends
> >      on the changelog and other things. The preload functions run in a
> >      defined order.
> 
> Maybe dependencies could be passed as arguments?

Ideally, these expensive calculations (e.g. obsstore) could be moved into the
index object. In reality, that requires too much work, and obsstore
preloading requires a subset of "repo", including "repo.revs".

It's possible to decouple obsstore preloading from the repo object, but that
could be a lot of work too.
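
If dependencies are passed as arguments, as suggested above, running the preload functions "in a defined order" amounts to a topological sort. A minimal sketch with hypothetical names: each function receives the results of the items it depends on as keyword arguments. It does not address the harder part, that obsstore preloading wants a subset of repo such as repo.revs.

```python
def runpreload(funcs, deps):
    """Run preload functions in dependency order (hypothetical helper).

    funcs: {name: callable}; deps: {name: [names it depends on]}.
    Each callable receives its dependencies' results as keyword arguments.
    """
    results = {}
    visiting = set()

    def visit(name):
        if name in results:
            return results[name]
        if name in visiting:
            raise ValueError('dependency cycle at %r' % name)
        visiting.add(name)
        # Depth-first: compute every dependency before this item.
        kwargs = {d: visit(d) for d in deps.get(name, [])}
        visiting.discard(name)
        results[name] = funcs[name](**kwargs)
        return results[name]

    for name in sorted(funcs):
        visit(name)
    return results


if __name__ == '__main__':
    results = runpreload(
        {'changelog': lambda: ['rev0', 'rev1'],
         'obsstore': lambda changelog: {'revs': len(changelog)}},
        {'obsstore': ['changelog']})
    print(results['obsstore'])  # {'revs': 2}
```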

> >   b) The index file is not always a single file, depending on "vfs".
> 
> Yes. vfs could be owned by storage/chgcache class.
> 
> >   c) The user may want to control what to preload. For example, if they have
> >      an incompatible manifest, they could make changelog preloaded, but not
> >      manifest.
> 
> No idea about (c).
> 
> >   d) Users can add other preloading items easily, not only just the
> >      predefined ones.
> 
> So probably we'll need an extensible table of preloadable items.

My prototype code already uses a registrar to collect all @preload
functions.

> > I think "storage.parserevlog(indexfile)" (treating the index separately
> > from the repo object) may have trouble dealing with "a)".

