[PATCH 05 of 10] chgserver: add a context manager to learn what to include for confighash

Wed Jul 6 09:15:23 EDT 2016

On Tue, 5 Jul 2016 21:10:43 +0100, Jun Wu wrote:
> Excerpts from Yuya Nishihara's message of 2016-07-05 22:46:04 +0900:
> > > I don't quite get your question. So, I draft some questions:
> > > 
> > > 1. Is a static whitelist enough?
> > > 
> > >   No. The issue is 3rd-party extensions.
> > >   
> > >   We can say this is a limitation of chg and document it somewhere but
> > >   people just don't read docs and yell at you whenever they see hg works
> > >   while chg does not.  
> > 
> > As long as third-party extensions use their own config sections, e.g.
> > "remotefilelog.blah-blah", all of them are included in the hash if the list is
> > a whitelist of safe sections. If they read e.g. ui.username in uisetup() and
> > we have to work around that, a static list wouldn't help.  
> 
> I finally got your idea. It's a "blacklist" about what *not* to hash. I
> object because it leads to unnecessary server processes. Given the fact
> each process needs 100MB+ memory and probably does not share many pages in
> the kernel (not forked from an Python ancestor), I'd treat process number
> seriously.

What you say black is what I see white. Anyway, do you have many variants of
configurations in production? Most of my repository's hgrc files only set
"ui", "paths", and "email", so the not-to-hash list would work for me.

Also, I guess 100MB+ is the VIRT value. I think it is the size of the address
space.

> > Huh? Do they want to trust only [ui]debug ignoring --debug option?  
> 
> --debug is just a sugar setting the config item "ui.debug", which I think is
> the source of truth:
> 
>     if options['verbose'] or options['debug'] or options['quiet']:
>         for opt in ('verbose', 'debug', 'quiet'):
>             val = str(bool(options[opt]))
>             for ui_ in uis:
>                 ui_.setconfig('ui', opt, val, '--' + opt)

Extensions are loaded before the ui options are set.

> > > Therefore I think the "dynamically learn" approach is the way to go.
> > > Developers don't need to know about chg and aren't forced to rewrite their
> > > code (sometimes it's hard) using other patterns. Developers with chg in mind
> > > can write code to reduce the count of chgservers.  
> > 
> > I prefer the static approach if it can save 90% of the problems with a little
> > penalty of excessive server redirection. Dynamically updating the list would
> > make debugging harder and would need more tests.  
> 
> I understand your worry. I admit I didn't think too carefully about this
> approach (but now I'm clear). The issue is: if the servers disagree about
> what to hash, they can generate different hashes. It's especially a problem
> if one server changes its idea about what to hash.
> 
> I think we should freeze the list at chgserver startup to make sure a single
> server has consistent view about what to hash all the time.
> 
> For multiple servers, the following (bad) situation may happen:
> 
>   1. server A tell client to redirect to hash1
>   2. client does not find hash1, starts a new server
>   3. the new server generates hash2, while there is already a server running
>      at hash2, therefore the starting of the new server is a waste.
> 
> To solve it, it seems we have to store the "what to hash" information on
> disk, which is more complicated than I thought.
> 
> Therefore, what I think the next steps are:
> 
>   1. Add new configs to chgserver deciding what to hash:
>   
>         [chgserver]
>         confighash.sections = section1, section2
>         confighash.items = section.item, section.item
> 
>     This would also serve as the "api" for other extensions to hash their
>     configs as they just need to use setconfig.
> 
>     This is a one-time config, chgserver reads it during startup and forget
>     about it afterwards.
> 
>   2. (Optionally) add the auto-learn logic. I probably won't actively
>      work on this, but put it here as it's a possible step.
> 
> What do you think? 1 is pretty easy and straight-forward and solves a lot
> of issues. I may want to follow up with the chg-compat checker test then.

I don't think (1) needs to be a config section, which may be updated by
repository's hgrc. Instead, extensions can update the list by ui/extsetup()

  chgserver = extensions.find('chgserver')
  chgserver._configsections.add(...)

This will be simpler if chgserver gets out of hgext.

> > > > > Note this is not a solution for all cases, namely the following cases still
> > > > > need extra work:
> > > > > 
> > > > > 1. "if ui.config('x'): ui.setconfig('x', 'y')" in uisetup.
> > > > >    This will overwrite config 'x' and chg will be unable to hash the value
> > > > >    user provides. The pattern is mainly used in the evolve extension. It
> > > > >    could be solved if we have clear layers about what are set by extensions
> > > > >    and what are set by users.
> > > > > 
> > > > > 2. Patterns like "self.ui = ui" in uisetup.
> > > > >    This will keep a reference to a stale ui object. We probably want to
> > > > >    add devel-warn for the case. It may also be reasonable to add a similar
> > > > >    check to reposetup.
> > > > > 
> > > > > 3. Interacting with filesystem or network in uisetup.
> > > > >    In general, extensions doing these should be aware of race conditions.    
> > > > 
> > > > (1) and (2) are generally wrong because the ui passed to uisetup() and
> > > > extsetup() are "lui", a temporary copy of ui.    
> > > 
> > > Why are they wrong?
> > > 
> > > For (1), you can have an extension:
> > > 
> > >     # a.py
> > >     def uisetup(ui):
> > >         # extdiff is in chgserver._configsections
> > >         ui.setconfig('extdiff', 'x', 'x')
> > >
> > > And run chg:
> > > 
> > >     chg --config extensions.s=$PWD/a.py --config extdiff.x=1 version
> > >     chg --config extensions.s=$PWD/a.py --config extdiff.x=2 version
> > > 
> > > The second command will not start a new server.  
> > 
> > IIRC, config values set in uisetup() aren't copied back to the original ui
> > and repo.ui, which is why I said generally wrong.  
> 
> That's not true :( Otherwise the "# stolen from
> tortoisehg.util.copydynamicconfig()" hack will be unnecessary.

The hack should be necessary because config values are updated by reading
.hg/hgrc.

Try "hg config foo.bar" with/without a repository to see if ui.setconfig()
survives.

  def uisetup(ui):
      ui.setconfig('foo', 'bar', 'baz')