[PATCH 04 of 10] localrepo: move ui loading to baselocalrepository

Jun Wu quark at fb.com
Mon Feb 13 20:09:26 EST 2017


Excerpts from Augie Fackler's message of 2017-02-13 18:58:59 -0500:
> On Fri, Feb 10, 2017 at 01:07:49PM -0800, Jun Wu wrote:
> > Side-effects are basically everything related to writes, like mutating
> > (wrapping) internal Python objects, etc. They are done by Python code
> > provided by an extension out of core's control. Usually that means
> > reposetup, uisetup, etc.
> 
> Usually, but not always - I've seen many extensions that call
> wrapfunction or similar outside of {repo,ui}setup.

Those extensions need to be fixed. The new chg will rely on the
side-effect-free assumption as well.

> > Reading things is fine - loading configs, changelog.i, obsstore, etc. as
> > long as no malicious extension changes the loading logic. Currently for chg
> > to work correctly, this assumption must be true.
> >
> > The difference of the above two is whether we can revert the effects. For
> > objects loaded by reading certain things, once we discard the object, the
> > effect is considered reverted. But for things like x.__class__ = y;
> > extensions.wrapfoo(...), etc., they are basically impossible to revert
> > cleanly (confidently). So we avoid them at all costs.
> 
> I'm having trouble wrapping my head around this series in a useful way
> - should we defer it to the sprint, or do you want to try and explain
> the arc of this a little more in a v2 with more details in the first
> commit message as to what you mean by side effects and what the
> expected boundary between these components is?
> 
> (I'm specifically a little perplexed by the inheritance relationship
> to avoid side effects, I just can't quite get my head around it.)

The end-goal is to get repo.vfs and repo.svfs available to the chg master
process, which runs preload functions. So preload functions could read files
properly, without side effects (like actually writing any files in the repo):

    def preloadchangelog(side-effect-free-repo):
        changelogfile = repo.svfs('changelog.i')
                        ^^^^^^^^^
                     just want this to be available, with the confidence
                     that constructing the repo object is side-effect free

Without a "repo" object, the preload code will only have "repopath":

    def preloadchangelog(repopath):
        # check repo requirements and confirm it's a format that we can
        # support, probably also check that vfs is not modified
        ...
        path = os.path.join(repopath, '.hg', 'store', 'changelog.i')
        changelogfile = open(path, 'rb')

Although that's possible, that shifts a lot of burden (like repo
requirements handling, vfs checks etc) from the preloading framework to the
author of preload functions.
 

Whatever we will end up with, I want to make sure:

  1. There is a "repo" object passed to preload functions, not a "repopath"

     "repopath" is just not practically useful as too many checks need to be
     done on it. The repo object does not have to be a full repo - a minimal
     version could only have vfs and svfs, so preload functions can read
     contents properly.

  2. Avoid unexpected reposetup() runs

    - In the future, chg does not run uisetup()s, for correctness. Because
      uisetup() runs only once and will take incorrect configs.
    - Extensions do not expect reposetup() runs when uisetup() does not run.
    - So repo.__init__ should not call reposetup()s.

    (this could be done alternatively by ignoring reposetup()s for
     extensions that haven't run uisetup()s)

  3. Avoid unwanted side effects

    - An extension starts a new SSH connection when repo.__init__ is called
    - An extension writes to some log file when repo.__init__ is called 

So I think moving part of "localrepository" to a smaller object is
reasonable. An alternative I can think of is to disable side-effect methods
in a subclass of localrepository:
  
    class purerepo(localrepository):
        def __init__(..., create=False):
            super(..., create=False)
          
        def lock(self):
            raise ProgrammingError('cannot lock')
  
        def transaction(self):
            raise ProgrammingError('cannot do transaction')

If that looks more reasonable, I can go that approach, which will probably
be more compatible, while less confident about the side-effect-free part.


More information about the Mercurial-devel mailing list