Making chg stateful

Fri Feb 3 14:50:48 EST 2017

Excerpts from Yuya Nishihara's message of 2017-02-04 00:31:45 +0900:
> On Fri, 3 Feb 2017 09:33:32 +0000, Jun Wu wrote:
> > 1) What does the worker talk to the master via IPC?
> > 
> >   Current: worker sends "possible_dirty_repo: $REPO_PATH" after runcommand
> > 
> >     It's coupled with the (pseudo) repo object. But the master could preload
> >     things in correct order - first changelog, then phase, obsstore, etc.
> > 
> >   What if: worker sends "possible_dirty_index: $INDEX_PATH",
> >   "possible_dirty_obsstore $OBSSTORE_PATH" etc.
> > 
> >     It looks more flexible at first sight. However the dependency issue
> >     could be tricky to solve - for example, the obsstore requires changelog
> >     (and revset).  It's much harder to preload obsstore without a pseudo
> >     repo object with repo.changelog, repo.revs available.
> > 
> >   Therefore, probably just send the repo path.
> 
> Given this, I think pipes would be simpler than using shared memory. The master
> IO can be multiplexed by select().

If it's one pipe per worker, the code change (I tried) is too much. I think
a single non-blocking pipe could be an option.

> (But I don't have strong opinion to disagree with using shm.)
> 
> > 2) Preload function signature (used by master's background thread).
> > 
> >   This function is to hash and load things. It runs in the master, and is
> >   basically a generator (so the "loading" part could be skipped on cache
> >   hit, and it's easier to make content and hash_with_content consistent, if
> >   they share states):
> > 
> >       @preloadregistar(something)
> >       def preloadfunc(...):
> >           yield quick_hash
> >           yield content, hash_with_content
> 
> Isn't it simpler if preloadfunc() returns quick_hash and a delayed function
> to build content? I feel this generator is kinda abuse and makes it less clear
> where the costly part starts.

I have thought about this. "Generator" is used as a lightweight process that
could be terminated cheaply. It shares states (local variables) for hashing
and loading, making certain things easier.

Compare:

      @preload('index')
      def preloadindex(repo):
          f = repo.svfs('changelog.i') # "f" can be shared naturally
          hash = fstat(f).st_size
          yield hash

          indexdata = f.read()
          hash = len(indexdata)
          yield parseindex(indexdata), hash

vs.

      def loadindex(f):
          indexdata = f.read()
          hash = len(indexdata)
          return parseindex(indexdata), hash

      @preload('index')
      def preloadindex(repo):
          f = repo.svfs('changelog.i')
          hash = repo.svfs('changelog.i').st_size
          loadfunc = lambda: loadindex(f) # pass "f" explicitly 
          return hash, loadfunc

Generator seems to be the shortest way to express the logic.
I think it's easy to say that things after the first "yield" is costly.

For readibility, I think improved form mentioned at the end of the previous
mail "yield preload.foo(...)" makes things obvious.

> >   Problem: What's "..." ? It could be "repopath" as that's exactly what the
> >   worker sends to us. But that will make the preload function more
> >   complicated to be correct - repo has "requirements", etc and the index is
> >   not always simply path.join(repo_path, 'store', '00changelog.i').
> >   
> >   Similar to 1, if we decouple from the repo object, index preloading would
> >   be fine:
> > 
> >       @preload('index')
> >       def preloadindex(indexpath):
> >           ....
> > 
> >   But things with dependency will have trouble:
> > 
> >       @preload('obsstore')
> >       def preloadobsstore(obsstorepath):
> >           # how to access repo.revs('....') ?
> > 
> >   Therefore, the preload function needs a repo (even it's "pseudo") to be
> >   convenient to implement.
> 
> For the "...", I agree repopath isn't enough.