[PATCH 2 of 6] localrepo: add internal "pyhooks" hooks mechanism

Sat Jul 12 16:05:36 CDT 2014

On 7/12/2014 1:12 PM, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1405188833 25200
> #      Sat Jul 12 11:13:53 2014 -0700
> # Node ID 8fa7b8de6cf5c6f809cabfd530d7f14383a18c2c
> # Parent  ab912eb22894240c60ff757db60e06e343fef6a7
> localrepo: add internal "pyhooks" hooks mechanism
> 
> The intent of this feature addition is to give extensions (and possibly
> even core code) a better way than monkeypatching/wrapping to
> modify/supplement run-time behavior.
> 
> Hooks have a few significant advantages over monkeypatching/wrapping:
> 
> * Extensibility points are well defined. If Mercurial maintainers wish
>   to clearly mark an activity as extensible, they can add a pyhook
>   for it. Contrast with monkeypatching/wrapping, where it isn't always
>   clear what the risks with each function are. Hooks give extension
>   authors a clear set of points from which to start extending.
> 
> * They are instance-specific. Monkeypatching often results in changing
>   symbols on modules as opposed to a per-instance basis. Extensions may
>   wish to modify behavior for classes of a certain type or perhaps even
>   specific instances of a specific class. Globally modifying functions
>   and then filtering for applicability at run-time can be difficult
>   and dangerous. Beginning extension authors may not realize the full
>   impact of global changes, especially in "shared" process spaces, such
>   as hgweb or the command server. Per-instance hooks are much safer.
> 
> The patch author considered an alternative implementation that
> introduced hooks.addrepohook() or extensions.addrepohook() and
> hooks.runrepohook() or extensions.runrepohook(). In the mind of
> the patch author, the choice of where the API should live and the
> names of the APIs (the author concedes "pyhook" isn't a great
> name but can think of nothing better) are better decided by
> someone with more experience than him. The author anticipates
> much bikeshedding on this patch.
> 
> diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
> --- a/mercurial/localrepo.py
> +++ b/mercurial/localrepo.py
> @@ -300,8 +300,11 @@ class localrepository(object):
>          # - working directory parent change,
>          # - bookmark changes
>          self.filteredrevcache = {}
>  
> +        # Maps names to list of callables.
> +        self._hooks = {}
> +
>      def close(self):
>          pass
>  
>      def _restrictcapabilities(self, caps):
> @@ -485,8 +488,50 @@ class localrepository(object):
>          replacing code that is expected to call a hook.
>          """
>          return hook.hook(self.ui, self, name, throw, **args)
>  
> +    def addpyhook(self, name, fn):
> +        """Register a Python hook against this repo instance.
> +
> +        It is common to want to execute some Python code when certain events
> +        occur. This is common in extensions. This method provides a
> +        registration mechanism to do that.
> +
> +        This method receives the name of a hook, name, and a callable, fn.
> +
> +        If the hook name is not known, KeyError will be raised. This means
> +        that if a hook is deleted, extensions will fail fast unless they catch
> +        KeyError.
> +
> +        The Mercurial API does not make any guarantees about the stability
> +        of arguments passed to any called hook. However, an effort is made
> +        to avoid unnecessary churn.
> +
> +        "pyhooks" are an internal-oriented variation of the external-facing
> +        hooks mechanism. The latter has strong API guarantees and hooks can
> +        be added via hgrc files. pyhooks are strictly internal and have
> +        weaker API guarantees.
> +        """
> +        if not callable(fn):
> +            raise ValueError('Argument is not callable')
> +        self._hooks[name].append(fn)
> +
> +    def runpyhook(self, name, **args):
> +        """Run a Python hook.
> +
> +        All callables registered via addpyhook() will be executed in the order
> +        they were registered.
> +
> +        Each hook will receive as arguments this repo instance as its single
> +        positional argument and the named arguments passed into this method, if
> +        any. All custom arguments are named because the API contract is not
> +        guaranteed and this gives extensions yet another to point to fail
> +        fast (unknown arguments will result in call failures and will require
> +        extensions to adapt to changes in the API).
> +        """
> +        for fn in self._hooks[name]:
> +            fn(self, **args)
> +
>      @unfilteredmethod
>      def _tag(self, names, node, message, local, user, date, extra={},
>               editor=False):
>          if isinstance(names, str):
> 

I went on a short walk and realized of all the potential names I chose
for this, "pyhooks" is one of the worst.

Drawing from personal experience, "event" is almost certainly a better
terminology, made even more certain because of the ambiguity with
existing Mercurial "hooks." "Events" immediately made me think of
"dispatch," "run," and "handlers." I anticipate bikeshedding on names.

I also anticipate bikeshedding on the mechanism for registering these
handlers. I've coded per-type methods for registering per-instance event
handlers. Alternatively, we could stick a .events dict on each instance:

repo.events.beforepush += custombeforepush

or

repo.events.beforepush.append(custombeforepush)

In this approach, each type has its event handling mechanism. Perhaps we
want to unify this in extensions.py or hooks.py? e.g.

extensions.addeventhandler(repo, 'beforepush' custombeforepush)

Where addeventhandler looks something like:

def addeventhandler(thing, event, handler):
    if not hasattr(thing, 'events'):
        raise ValueError('object does not support event handlers')
    thing.events[event].append(handler)

Perhaps this could get rolled into a common base class for everything
with event handlers so instances have a .addeventhandler. This quickly
turns into a DOM-style event handling mechanism. I'm not sure if that's
good or bad.

Honestly, the simplest solution is to add a bunch of instance variables
to classes like localrepo, one variable per event name. e.g.
|repo.beforepush += custombeforepush| or even |repo.onbeforepush +=
custombeforepush| so we can easily denote which attributes are events.
But people have opinions about localrepo having too many methods. I'm
not sure if this extends to attributes or just methods?

I'm just not sure how we want to do this. I've put one solution forward
as code. I'd prefer to only code one rewrite.

Let the bikeshedding begin.