[PATCH 05 of 10] chgserver: add a context manager to learn what to include for confighash

Tue Jul 5 16:10:43 EDT 2016

Excerpts from Yuya Nishihara's message of 2016-07-05 22:46:04 +0900:
> > I don't quite get your question. So, I draft some questions:
> > 
> > 1. Is a static whitelist enough?
> > 
> >   No. The issue is 3rd-party extensions.
> >   
> >   We can say this is a limitation of chg and document it somewhere but
> >   people just don't read docs and yell at you whenever they see hg works
> >   while chg does not.
> 
> As long as third-party extensions use their own config sections, e.g.
> "remotefilelog.blah-blah", all of them are included in the hash if the list is
> a whitelist of safe sections. If they read e.g. ui.username in uisetup() and
> we have to work around that, a static list wouldn't help.

I finally got your idea. It's a "blacklist" about what *not* to hash. I
object because it leads to unnecessary server processes. Given the fact
each process needs 100MB+ memory and probably does not share many pages in
the kernel (not forked from an Python ancestor), I'd treat process number
seriously.

> > 2. Other solutions?
> > 
> >   I did have other attempts, like a chg-compat checker, devel-warn or
> >   thinking about an API to let extensions telling us what they want to hash.
> > 
> >   A chg-compat checker is actually hard to implement correctly since
> >   extensions depending on other extensions are hard to check.
> > 
> >   devel-warn will pollute extensions.py.
> > 
> >   An API for extensions work but why do we let extensions telling us things
> >   while we can know ourselves with a little effort?
> >
> > 3. Is "_configitems" necessary since we have "_configsections"?
> > 
> >   I think so. Some extension accesses "ui.debugflag" and wraps core
> >   functions with a version with a logger. Hashing the whole ui section
> >   looks undesirable.
> 
> Huh? Do they want to trust only [ui]debug ignoring --debug option?

--debug is just a sugar setting the config item "ui.debug", which I think is
the source of truth:

    if options['verbose'] or options['debug'] or options['quiet']:
        for opt in ('verbose', 'debug', 'quiet'):
            val = str(bool(options[opt]))
            for ui_ in uis:
                ui_.setconfig('ui', opt, val, '--' + opt)

Currently there is no way to get sys.argv without wrapping
dispatch.runcommand, but the argv stuff is another topic.

> > Therefore I think the "dynamically learn" approach is the way to go.
> > Developers don't need to know about chg and aren't forced to rewrite their
> > code (sometimes it's hard) using other patterns. Developers with chg in mind
> > can write code to reduce the count of chgservers.
> 
> I prefer the static approach if it can save 90% of the problems with a little
> penalty of excessive server redirection. Dynamically updating the list would
> make debugging harder and would need more tests.

I understand your worry. I admit I didn't think too carefully about this
approach (but now I'm clear). The issue is: if the servers disagree about
what to hash, they can generate different hashes. It's especially a problem
if one server changes its idea about what to hash.

I think we should freeze the list at chgserver startup to make sure a single
server has consistent view about what to hash all the time.

For multiple servers, the following (bad) situation may happen:

  1. server A tell client to redirect to hash1
  2. client does not find hash1, starts a new server
  3. the new server generates hash2, while there is already a server running
     at hash2, therefore the starting of the new server is a waste.

To solve it, it seems we have to store the "what to hash" information on
disk, which is more complicated than I thought.

Therefore, what I think the next steps are:

  1. Add new configs to chgserver deciding what to hash:

        [chgserver]
        confighash.sections = section1, section2
        confighash.items = section.item, section.item

    This would also serve as the "api" for other extensions to hash their
    configs as they just need to use setconfig.

    This is a one-time config, chgserver reads it during startup and forget
    about it afterwards.

  2. (Optionally) add the auto-learn logic. I probably won't actively
     work on this, but put it here as it's a possible step.

What do you think? 1 is pretty easy and straight-forward and solves a lot
of issues. I may want to follow up with the chg-compat checker test then.

> > I will send V2 after the unwrapfunction feature so it would be more
> > confident.
> >  
> > > > Note this is not a solution for all cases, namely the following cases still
> > > > need extra work:
> > > > 
> > > > 1. "if ui.config('x'): ui.setconfig('x', 'y')" in uisetup.
> > > >    This will overwrite config 'x' and chg will be unable to hash the value
> > > >    user provides. The pattern is mainly used in the evolve extension. It
> > > >    could be solved if we have clear layers about what are set by extensions
> > > >    and what are set by users.
> > > > 
> > > > 2. Patterns like "self.ui = ui" in uisetup.
> > > >    This will keep a reference to a stale ui object. We probably want to
> > > >    add devel-warn for the case. It may also be reasonable to add a similar
> > > >    check to reposetup.
> > > > 
> > > > 3. Interacting with filesystem or network in uisetup.
> > > >    In general, extensions doing these should be aware of race conditions.  
> > > 
> > > (1) and (2) are generally wrong because the ui passed to uisetup() and
> > > extsetup() are "lui", a temporary copy of ui.  
> > 
> > Why are they wrong?
> > 
> > For (1), you can have an extension:
> > 
> >     # a.py
> >     def uisetup(ui):
> >         # extdiff is in chgserver._configsections
> >         ui.setconfig('extdiff', 'x', 'x')
> >
> > And run chg:
> > 
> >     chg --config extensions.s=$PWD/a.py --config extdiff.x=1 version
> >     chg --config extensions.s=$PWD/a.py --config extdiff.x=2 version
> > 
> > The second command will not start a new server.
> 
> IIRC, config values set in uisetup() aren't copied back to the original ui
> and repo.ui, which is why I said generally wrong.

That's not true :( Otherwise the "# stolen from
tortoisehg.util.copydynamicconfig()" hack will be unnecessary.

> > For (2), if you save ui from uisetup to somewhere (say, a global variable),
> > and then use it in reposetup, it can have wrong configs for all things
> > outside chg confighash. It's not the case for short-lived hg process.
> 
> The ui passed to uisetup() isn't kept updated, which could have a different
> value from repo.ui. IMHO, keeping ui globally is a bad idea.

Yes. It's definitely a bad pattern (which is why I want to devel-warn
refcount change). But the code is unfortunately so tightly integrated with
that pattern and is not easy to change (I did have attempts to rewrite but
given up finally).