[PATCH 05 of 10] chgserver: add a context manager to learn what to include for confighash

Jun Wu quark at fb.com
Tue Jul 5 16:10:43 EDT 2016


Excerpts from Yuya Nishihara's message of 2016-07-05 22:46:04 +0900:
> > I don't quite get your question. So, I draft some questions:
> > 
> > 1. Is a static whitelist enough?
> > 
> >   No. The issue is 3rd-party extensions.
> >   
> >   We can say this is a limitation of chg and document it somewhere but
> >   people just don't read docs and yell at you whenever they see hg works
> >   while chg does not.
> 
> As long as third-party extensions use their own config sections, e.g.
> "remotefilelog.blah-blah", all of them are included in the hash if the list is
> a whitelist of safe sections. If they read e.g. ui.username in uisetup() and
> we have to work around that, a static list wouldn't help.

I finally got your idea. It's a "blacklist" about what *not* to hash. I
object because it leads to unnecessary server processes. Given the fact
each process needs 100MB+ memory and probably does not share many pages in
the kernel (not forked from an Python ancestor), I'd treat process number
seriously.
 
> > 2. Other solutions?
> > 
> >   I did have other attempts, like a chg-compat checker, devel-warn or
> >   thinking about an API to let extensions telling us what they want to hash.
> > 
> >   A chg-compat checker is actually hard to implement correctly since
> >   extensions depending on other extensions are hard to check.
> > 
> >   devel-warn will pollute extensions.py.
> > 
> >   An API for extensions work but why do we let extensions telling us things
> >   while we can know ourselves with a little effort?
> >
> > 3. Is "_configitems" necessary since we have "_configsections"?
> > 
> >   I think so. Some extension accesses "ui.debugflag" and wraps core
> >   functions with a version with a logger. Hashing the whole ui section
> >   looks undesirable.
> 
> Huh? Do they want to trust only [ui]debug ignoring --debug option?

--debug is just a sugar setting the config item "ui.debug", which I think is
the source of truth:

    if options['verbose'] or options['debug'] or options['quiet']:
        for opt in ('verbose', 'debug', 'quiet'):
            val = str(bool(options[opt]))
            for ui_ in uis:
                ui_.setconfig('ui', opt, val, '--' + opt)

Currently there is no way to get sys.argv without wrapping
dispatch.runcommand, but the argv stuff is another topic.
 
> > Therefore I think the "dynamically learn" approach is the way to go.
> > Developers don't need to know about chg and aren't forced to rewrite their
> > code (sometimes it's hard) using other patterns. Developers with chg in mind
> > can write code to reduce the count of chgservers.
> 
> I prefer the static approach if it can save 90% of the problems with a little
> penalty of excessive server redirection. Dynamically updating the list would
> make debugging harder and would need more tests.

I understand your worry. I admit I didn't think too carefully about this
approach (but now I'm clear). The issue is: if the servers disagree about
what to hash, they can generate different hashes. It's especially a problem
if one server changes its idea about what to hash.

I think we should freeze the list at chgserver startup to make sure a single
server has consistent view about what to hash all the time.

For multiple servers, the following (bad) situation may happen:

  1. server A tell client to redirect to hash1
  2. client does not find hash1, starts a new server
  3. the new server generates hash2, while there is already a server running
     at hash2, therefore the starting of the new server is a waste.

To solve it, it seems we have to store the "what to hash" information on
disk, which is more complicated than I thought.

Therefore, what I think the next steps are:

  1. Add new configs to chgserver deciding what to hash:
  
        [chgserver]
        confighash.sections = section1, section2
        confighash.items = section.item, section.item

    This would also serve as the "api" for other extensions to hash their
    configs as they just need to use setconfig.

    This is a one-time config, chgserver reads it during startup and forget
    about it afterwards.

  2. (Optionally) add the auto-learn logic. I probably won't actively
     work on this, but put it here as it's a possible step.

What do you think? 1 is pretty easy and straight-forward and solves a lot
of issues. I may want to follow up with the chg-compat checker test then.

> > I will send V2 after the unwrapfunction feature so it would be more
> > confident.
> >  
> > > > Note this is not a solution for all cases, namely the following cases still
> > > > need extra work:
> > > > 
> > > > 1. "if ui.config('x'): ui.setconfig('x', 'y')" in uisetup.
> > > >    This will overwrite config 'x' and chg will be unable to hash the value
> > > >    user provides. The pattern is mainly used in the evolve extension. It
> > > >    could be solved if we have clear layers about what are set by extensions
> > > >    and what are set by users.
> > > > 
> > > > 2. Patterns like "self.ui = ui" in uisetup.
> > > >    This will keep a reference to a stale ui object. We probably want to
> > > >    add devel-warn for the case. It may also be reasonable to add a similar
> > > >    check to reposetup.
> > > > 
> > > > 3. Interacting with filesystem or network in uisetup.
> > > >    In general, extensions doing these should be aware of race conditions.  
> > > 
> > > (1) and (2) are generally wrong because the ui passed to uisetup() and
> > > extsetup() are "lui", a temporary copy of ui.  
> > 
> > Why are they wrong?
> > 
> > For (1), you can have an extension:
> > 
> >     # a.py
> >     def uisetup(ui):
> >         # extdiff is in chgserver._configsections
> >         ui.setconfig('extdiff', 'x', 'x')
> >
> > And run chg:
> > 
> >     chg --config extensions.s=$PWD/a.py --config extdiff.x=1 version
> >     chg --config extensions.s=$PWD/a.py --config extdiff.x=2 version
> > 
> > The second command will not start a new server.
> 
> IIRC, config values set in uisetup() aren't copied back to the original ui
> and repo.ui, which is why I said generally wrong.

That's not true :( Otherwise the "# stolen from
tortoisehg.util.copydynamicconfig()" hack will be unnecessary.

> > For (2), if you save ui from uisetup to somewhere (say, a global variable),
> > and then use it in reposetup, it can have wrong configs for all things
> > outside chg confighash. It's not the case for short-lived hg process.
> 
> The ui passed to uisetup() isn't kept updated, which could have a different
> value from repo.ui. IMHO, keeping ui globally is a bad idea.

Yes. It's definitely a bad pattern (which is why I want to devel-warn
refcount change). But the code is unfortunately so tightly integrated with
that pattern and is not easy to change (I did have attempts to rewrite but
given up finally).


More information about the Mercurial-devel mailing list