[PATCH 05 of 10] chgserver: add a context manager to learn what to include for confighash

Wed Jul 6 10:14:59 EDT 2016

Excerpts from Yuya Nishihara's message of 2016-07-06 22:15:23 +0900:
> > I finally got your idea. It's a "blacklist" about what *not* to hash. I
> > object because it leads to unnecessary server processes. Given the fact
> > each process needs 100MB+ memory and probably does not share many pages in
> > the kernel (not forked from an Python ancestor), I'd treat process number
> > seriously.
> 
> What you say black is what I see white. Anyway, do you have many variants of
> configurations in production? Most of my repository's hgrc files only set
> "ui", "paths", and "email", so the not-to-hash list would work for me.
> 
> Also, I guess 100MB+ is the VIRT value. I think it is the size of the address
> space.

In production the config files are managed by Chef and can be complex, the
Chef code has a bunch of sections like:

   if node.in_alpha_tier?
     hgrc['newfeatrue1']['newsetting'] = somevalue
     ...
   elsif node.in_beta_tier?
     ...
   end

For repo hgrc, it has "%include /etc/mercurial/reporc/reponame.rc". This is
to store non-common configs. For example, some repos need hgsubversion and
some don't.

About whitelist vs blacklist, I think it's less important. I prefer the
current situation.

> > > Huh? Do they want to trust only [ui]debug ignoring --debug option?  
> > 
> > --debug is just a sugar setting the config item "ui.debug", which I think is
> > the source of truth:
> > 
> >     if options['verbose'] or options['debug'] or options['quiet']:
> >         for opt in ('verbose', 'debug', 'quiet'):
> >             val = str(bool(options[opt]))
> >             for ui_ in uis:
> >                 ui_.setconfig('ui', opt, val, '--' + opt)
> 
> Extensions are loaded before the ui options are set.

You are right. ui.debugflag is set before extensions are loaded, which I
think is not ideal. I will make sure ui.config('ui', 'debug') is the source
of truth while doing the ui/config refactoring. But that's another topic.

> > I understand your worry. I admit I didn't think too carefully about this
> > approach (but now I'm clear). The issue is: if the servers disagree about
> > what to hash, they can generate different hashes. It's especially a problem
> > if one server changes its idea about what to hash.
> > 
> > I think we should freeze the list at chgserver startup to make sure a single
> > server has consistent view about what to hash all the time.
> > 
> > For multiple servers, the following (bad) situation may happen:
> > 
> >   1. server A tell client to redirect to hash1
> >   2. client does not find hash1, starts a new server
> >   3. the new server generates hash2, while there is already a server running
> >      at hash2, therefore the starting of the new server is a waste.
> > 
> > To solve it, it seems we have to store the "what to hash" information on
> > disk, which is more complicated than I thought.
> > 
> > Therefore, what I think the next steps are:
> > 
> >   1. Add new configs to chgserver deciding what to hash:
> >   
> >         [chgserver]
> >         confighash.sections = section1, section2
> >         confighash.items = section.item, section.item
> > 
> >     This would also serve as the "api" for other extensions to hash their
> >     configs as they just need to use setconfig.
> > 
> >     This is a one-time config, chgserver reads it during startup and forget
> >     about it afterwards.
> > 
> >   2. (Optionally) add the auto-learn logic. I probably won't actively
> >      work on this, but put it here as it's a possible step.
> > 
> > What do you think? 1 is pretty easy and straight-forward and solves a lot
> > of issues. I may want to follow up with the chg-compat checker test then.
> 
> I don't think (1) needs to be a config section, which may be updated by
> repository's hgrc. Instead, extensions can update the list by ui/extsetup()
>
>   chgserver = extensions.find('chgserver')
>   chgserver._configsections.add(...)
> 
> This will be simpler if chgserver gets out of hgext.

This is the old "API" discussion that makes extensions explicitly aware of
chg. The main downside I see is, once the API is introduced, it's hard to
remove. Besides, exposing this private variable would also cause trouble if
other extensions modify it in places other than ui/extsetup.

In general, I try to keep extensions "clean" without having code to
explicitly work with chg, thus the auto learn approach proposals.

Thinking about that, I reconsider the auto-learn approach, and it's likely
the cleanest we can get. Because we hash [extensions] already, it's hard
to have inconsistent hashes situation unless the extensions use
outside-world conditions to get config items.

Therefore I proabably want the auto-learn feature in, after the unwrap
series. What do you think?

> > > IIRC, config values set in uisetup() aren't copied back to the original ui
> > > and repo.ui, which is why I said generally wrong.  
> > 
> > That's not true :( Otherwise the "# stolen from
> > tortoisehg.util.copydynamicconfig()" hack will be unnecessary.
> 
> The hack should be necessary because config values are updated by reading
> .hg/hgrc.
> 
> Try "hg config foo.bar" with/without a repository to see if ui.setconfig()
> survives.
> 
>   def uisetup(ui):
>       ui.setconfig('foo', 'bar', 'baz')

I see. I will address all these ugliness with the ui/config refactoring.