Making chg stateful

Fri Feb 3 10:11:22 EST 2017

On Thu, 2 Feb 2017 16:56:11 +0000, Jun Wu wrote:
> Excerpts from Yuya Nishihara's message of 2017-02-03 00:45:22 +0900:
> > On Thu, 2 Feb 2017 09:34:47 +0000, Jun Wu wrote:
> > > So what state do we store?
> > > 
> > >   {repopath: {name: (hash, content)}}. For example:
> > > 
> > >     cache = {'/home/foo/repo1': {'index': ('hash', changelogindex),
> > >                                  'bookmarks': ('hash', bookmarks),
> > >                                  .... },
> > >              '/home/foo/repo2': { .... }, .... }
> > > 
> > >   The main ideas here are:
> > >     1) Store the lowest level objects, like the C changelog index.
> > >        Because higher level objects could be changed by extensions in
> > >        unpredictable ways. (this is not true in my hacky prototype though)
> > >     2) Hash everything. For changelog, it's like the file stat of
> > >        changelog.i. There must be a strong guarantee that the hash matches
> > >        the content, which could be challenging, but not impossible. I'll
> > >        cover more details below.
> > > 
> > >   The cache is scoped by repo to make the API simpler/easier to use. It may
> > >   be interesting to have some global state (like passing back the extension
> > >   path to import them at runtime).
> > 
> > Regarding this and "2) Side-effect-free repo", can or should we design the API
> > as something like a low-level storage interface? That will allow us to not
> > make repo/revlog know too much about chg.
> > 
> > I don't have any concrete idea, but that would work as follows:
> > 
> >  1. chg injects an object to select storage backends
> >     e.g. repo.storage = chgpreloadable(repo.storage, cache)
> >  2. repo passes it to revlog, etc.
> >  3. revlog uses it to read indexfile, where in-memory cache may be returned
> >     e.g. storage.parserevlog(indexfile)
> >
> > Perhaps, this 'storage' object is similar to the one you call 'baserepository'.
> 
> I'm not sure if I get the idea (probably not). What does the implementation
> in the master server look like?

I was just thinking about how to hack the real repo object without introducing
much mess. Perhaps the master server wouldn't be that different from your idea.

> It feels more like "repo.chgcache" to me and the difference is that the
> vanilla hg will be changed to access objects via it (so the interface looks
> more consistent).

Yeah, it might be like repo.chgcache.

Since we shouldn't pass repo to revlog (it's a layering violation), I think
we'll need a thin wrapper for chgcache anyway.
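
To make that concrete, here is a rough sketch of such a wrapper. Nothing here
is real Mercurial API: plainstorage, hashof() and the dict standing in for the
vfs are made up for illustration, and the sha1 of the file content stands in
for whatever "hash" (e.g. file stat) we end up using.

```python
import hashlib

class plainstorage(object):
    """Hypothetical uncached backend; 'files' stands in for a vfs"""
    def __init__(self, files):
        self._files = files  # {path: bytes}
        self.parsecount = 0  # only here to make cache hits observable

    def hashof(self, path):
        # assumption: a content hash plays the role of the file stat
        return hashlib.sha1(self._files[path]).hexdigest()

    def parserevlog(self, path):
        self.parsecount += 1
        return self._files[path].split(b'\n')  # toy "parsed index"

class chgpreloadable(object):
    """Hypothetical wrapper: return the cached object if the hash matches"""
    def __init__(self, storage, cache):
        self._storage = storage
        self._cache = cache  # {name: (hash, content)} for a single repo

    def parserevlog(self, indexfile):
        current = self._storage.hashof(indexfile)
        entry = self._cache.get(indexfile)
        if entry is not None and entry[0] == current:
            return entry[1]  # cache hit, nothing is parsed
        content = self._storage.parserevlog(indexfile)
        self._cache[indexfile] = (current, content)
        return content
```

revlog would only see an object with parserevlog(), so it wouldn't need to
know whether chg is involved at all.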

> Things to consider:
> 
>   a) Objects being preloaded have dependencies, e.g. the obsstore depends
>      on changelog and other things. The preload functions run in a defined
>      order.

Maybe dependencies could be passed as arguments?
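
Just to illustrate what I mean, a minimal sketch. The function names, the
readfile callable and the toy file contents are all invented; the real
obsstore and changelog formats are of course nothing like this.

```python
def loadchangelog(readfile):
    # no dependencies: only needs a way to read the index file
    return readfile('00changelog.i').split(b'\n')

def loadobsstore(readfile, changelog):
    # the changelog arrives as an argument, not through a repo object
    known = set(changelog)
    markers = readfile('obsstore').split(b'\n')
    return [m for m in markers if m in known]

def preloadall(readfile):
    # the call order follows from the arguments each loader needs
    changelog = loadchangelog(readfile)
    obsstore = loadobsstore(readfile, changelog)
    return {'changelog': changelog, 'obsstore': obsstore}
```

The ordering problem then becomes a matter of who calls what with which
arguments, and the loaders themselves never touch the repo.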

>   b) The index file is not always a single file, depending on "vfs".

Yes. vfs could be owned by storage/chgcache class.

>   c) The user may want to control what to preload. For example, if they have
>      an incompatible manifest, they could make changelog preloaded, but not
>      manifest.

No idea about (c).

>   d) Users can add other preloading items easily, not just the
>      predefined ones.

So probably we'll need an extensible table of preloadable items.
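
A rough sketch of what such a table could look like, assuming each item
declares the names it depends on. preloadtable, preloadable() and preload()
are hypothetical names, and the loaders return dummy values.

```python
preloadtable = {}  # name -> (dependency names, loader function)

def preloadable(name, deps=()):
    """Hypothetical decorator to register a preloadable item"""
    def decorator(func):
        preloadtable[name] = (tuple(deps), func)
        return func
    return decorator

@preloadable('changelog')
def _changelog():
    return ['rev0', 'rev1']  # dummy stand-in for the parsed index

@preloadable('obsstore', deps=['changelog'])
def _obsstore(changelog):
    return {'known': len(changelog)}  # dummy stand-in

def preload(name, loaded=None):
    """Load 'name' after its dependencies, passing them as arguments"""
    if loaded is None:
        loaded = {}
    if name not in loaded:
        deps, func = preloadtable[name]
        args = [preload(d, loaded) for d in deps]
        loaded[name] = func(*args)
    return loaded[name]
```

An extension would add its own item by decorating another function. The
sketch does no cycle detection and says nothing about how the user would
opt out of an item.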

> I think "storage.parserevlog(indexfile)" (treating the index separately,
> without a repo object) may have trouble dealing with "a)".