[PATCH 05 of 10] chgserver: add a context manager to learn what to include for confighash

Wed Jul 6 11:05:06 EDT 2016

On Wed, 6 Jul 2016 15:14:59 +0100, Jun Wu wrote:
> Excerpts from Yuya Nishihara's message of 2016-07-06 22:15:23 +0900:
> > > I finally got your idea. It's a "blacklist" about what *not* to hash. I
> > > object because it leads to unnecessary server processes. Given the fact
> > > each process needs 100MB+ memory and probably does not share many pages in
> > > the kernel (not forked from a Python ancestor), I'd treat the number of
> > > processes seriously.
> > 
> > What you call black is what I see as white. Anyway, do you have many variants of
> > configurations in production? Most of my repository's hgrc files only set
> > "ui", "paths", and "email", so the not-to-hash list would work for me.
> > 
> > Also, I guess 100MB+ is the VIRT value. I think it is the size of the address
> > space.  
> 
> In production the config files are managed by Chef and can be complex. The
> Chef code has a bunch of sections like:
> 
>    if node.in_alpha_tier?
>      hgrc['newfeatrue1']['newsetting'] = somevalue
>      ...
>    elsif node.in_beta_tier?
>      ...
>    end
> 
> For repo hgrc, it has "%include /etc/mercurial/reporc/reponame.rc". This is
> to store non-common configs. For example, some repos need hgsubversion and
> some don't.

Do they have significant differences in non-core configs other than the
"extensions" section?

> About whitelist vs blacklist, I think it's less important. I prefer the
> current situation.

It defines the default behavior of chg: whether or not chg is permissive toward
third-party extensions which process their config values in an undesired way.
IMHO it's an important property of chg.
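
To make the difference in default behavior concrete, here is a rough sketch,
not the chgserver code. The config is modeled as a plain dict, and the
function names, the section lists and the "thirdparty" section are all made
up for illustration. It compares hashing only a list of known sections with
hashing everything except a not-to-hash list:

```python
import hashlib


def confighash(config, sections):
    """Hash the items of the given sections in a stable (sorted) order."""
    h = hashlib.sha1()
    for section in sorted(sections):
        for name, value in sorted(config.get(section, {}).items()):
            h.update(("%s.%s=%s\n" % (section, name, value)).encode("utf-8"))
    return h.hexdigest()


def hash_listed(config, include):
    """Hash only the sections named in `include`."""
    return confighash(config, set(include))


def hash_all_but(config, exclude):
    """Hash every section except the ones named in `exclude`."""
    return confighash(config, set(config) - set(exclude))


# Two configs that differ only in a hypothetical third-party section.
base = {
    "ui": {"username": "alice"},
    "extensions": {"rebase": ""},
    "thirdparty": {"mode": "a"},
}
changed = {
    "ui": {"username": "alice"},
    "extensions": {"rebase": ""},
    "thirdparty": {"mode": "b"},
}

INCLUDE = {"extensions"}            # assumed "sections to hash" list
EXCLUDE = {"ui", "paths", "email"}  # assumed "sections not to hash" list

# Only listed sections are hashed: the third-party change is invisible.
same_when_listed = hash_listed(base, INCLUDE) == hash_listed(changed, INCLUDE)
# Everything but the excluded sections is hashed: the change alters the hash.
same_when_excluded = hash_all_but(base, EXCLUDE) == hash_all_but(changed, EXCLUDE)
```

With the first scheme the two configs map to the same hash, so a server would
be reused even if the extension had already consumed the old value. With the
second scheme the hashes differ, which is the safer default but costs an extra
server process.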

> >   chgserver = extensions.find('chgserver')
> >   chgserver._configsections.add(...)
> > 
> > This will be simpler if chgserver gets out of hgext.  
> 
> This is the old "API" discussion that makes extensions explicitly aware of
> chg. The main downside I see is, once the API is introduced, it's hard to
> remove. Besides, exposing this private variable would also cause trouble if
> other extensions modify it in places other than ui/extsetup.
> 
> In general, I try to keep extensions "clean" without having code to
> explicitly work with chg, thus the auto learn approach proposals.
> 
> Thinking about that, I reconsidered the auto-learn approach, and it's likely
> the cleanest we can get. Because we hash [extensions] already, it's hard
> to end up with inconsistent hashes unless the extensions use
> outside-world conditions to get config items.

Another possible problem is that extensions in repo/hgrc can be loaded
either while booting the server or after the server is fully booted. uisetup()
may behave differently depending on ui or extensions state. I don't know
if it can be a real problem, though.
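
A toy model of what I mean, assuming auto-learn works by recording which
sections are read inside a context manager. The class, the context manager,
the uisetup() body and the section names are all hypothetical, not the code
from this patch:

```python
import contextlib


class RecordingConfig(object):
    """Minimal stand-in for ui: records sections read while learning is on."""

    def __init__(self, data):
        self._data = data
        self.accessed = None  # a set while learning, None otherwise

    def config(self, section, name, default=None):
        if self.accessed is not None:
            self.accessed.add(section)
        return self._data.get(section, {}).get(name, default)


@contextlib.contextmanager
def learnsections(cfg):
    """Collect the sections read from cfg inside the with-block."""
    seen = set()
    cfg.accessed = seen
    try:
        yield seen
    finally:
        cfg.accessed = None


def uisetup(cfg, serverbooted):
    """Hypothetical extension setup whose reads depend on server state."""
    if not serverbooted:
        cfg.config("thirdparty", "mode")
    cfg.config("thirdparty-cache", "dir")


cfg = RecordingConfig({"thirdparty": {"mode": "a"}})

# Extension loaded while the server is booting.
with learnsections(cfg) as early:
    uisetup(cfg, serverbooted=False)

# Same extension loaded after the server is fully booted.
with learnsections(cfg) as late:
    uisetup(cfg, serverbooted=True)
```

Here "early" contains both sections while "late" contains only
"thirdparty-cache", so the learned set, and therefore the hash, would depend
on when the extension happened to be loaded.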

I understand you have invested a lot in improving chg, but the situation where
every repo has a different config of third-party extensions seems very specific
to FB. That made me feel your auto-learn function is too complicated.