Making chg stateful
Jun Wu
quark at fb.com
Fri Feb 3 14:50:48 EST 2017
Excerpts from Yuya Nishihara's message of 2017-02-04 00:31:45 +0900:
> On Fri, 3 Feb 2017 09:33:32 +0000, Jun Wu wrote:
> > 1) What does the worker talk to the master via IPC?
> >
> > Current: worker sends "possible_dirty_repo: $REPO_PATH" after runcommand
> >
> > It's coupled with the (pseudo) repo object. But the master could preload
> > things in correct order - first changelog, then phase, obsstore, etc.
> >
> > What if: worker sends "possible_dirty_index: $INDEX_PATH",
> > "possible_dirty_obsstore $OBSSTORE_PATH" etc.
> >
> > It looks more flexible at first sight. However the dependency issue
> > could be tricky to solve - for example, the obsstore requires changelog
> > (and revset). It's much harder to preload obsstore without a pseudo
> > repo object with repo.changelog, repo.revs available.
> >
> > Therefore, probably just send the repo path.
>
> Given this, I think pipes would be simpler than using shared memory. The master
> IO can be multiplexed by select().
If it's one pipe per worker, the code change (I tried) is too much. I think
a single non-blocking pipe could be an option.
> (But I don't have strong opinion to disagree with using shm.)
>
> > 2) Preload function signature (used by master's background thread).
> >
> > This function is to hash and load things. It runs in the master, and is
> > basically a generator (so the "loading" part could be skipped on cache
> > hit, and it's easier to make content and hash_with_content consistent, if
> > they share states):
> >
> > @preloadregistar(something)
> > def preloadfunc(...):
> > yield quick_hash
> > yield content, hash_with_content
>
> Isn't it simpler if preloadfunc() returns quick_hash and a delayed function
> to build content? I feel this generator is kinda abuse and makes it less clear
> where the costly part starts.
I have thought about this. "Generator" is used as a lightweight process that
could be terminated cheaply. It shares states (local variables) for hashing
and loading, making certain things easier.
Compare:
@preload('index')
def preloadindex(repo):
f = repo.svfs('changelog.i') # "f" can be shared naturally
hash = fstat(f).st_size
yield hash
indexdata = f.read()
hash = len(indexdata)
yield parseindex(indexdata), hash
vs.
def loadindex(f):
indexdata = f.read()
hash = len(indexdata)
return parseindex(indexdata), hash
@preload('index')
def preloadindex(repo):
f = repo.svfs('changelog.i')
hash = repo.svfs('changelog.i').st_size
loadfunc = lambda: loadindex(f) # pass "f" explicitly
return hash, loadfunc
Generator seems to be the shortest way to express the logic.
I think it's easy to say that things after the first "yield" is costly.
For readibility, I think improved form mentioned at the end of the previous
mail "yield preload.foo(...)" makes things obvious.
> > Problem: What's "..." ? It could be "repopath" as that's exactly what the
> > worker sends to us. But that will make the preload function more
> > complicated to be correct - repo has "requirements", etc and the index is
> > not always simply path.join(repo_path, 'store', '00changelog.i').
> >
> > Similar to 1, if we decouple from the repo object, index preloading would
> > be fine:
> >
> > @preload('index')
> > def preloadindex(indexpath):
> > ....
> >
> > But things with dependency will have trouble:
> >
> > @preload('obsstore')
> > def preloadobsstore(obsstorepath):
> > # how to access repo.revs('....') ?
> >
> > Therefore, the preload function needs a repo (even it's "pseudo") to be
> > convenient to implement.
>
> For the "...", I agree repopath isn't enough.
More information about the Mercurial-devel
mailing list