[PATCH] revset: added negative cache to lazysets

Matt Mackall mpm at selenic.com
Thu Feb 13 09:23:04 CST 2014


On Wed, 2014-02-12 at 18:34 -0800, Lucas Moscovicz wrote:
> # HG changeset patch
> # User Lucas Moscovicz <lmoscovicz at fb.com>
> # Date 1391556717 28800
> #      Tue Feb 04 15:31:57 2014 -0800
> # Node ID 7d11b638a9cf4e99213e73a704a497da5a1e2d2b
> # Parent  cf2a8a38d2665eff5787de8c12d4d23031493ba6
> revset: added negative cache to lazysets
> 
> The idea of this is to be able to answer faster when something is not
> contained in the lazyset instead of having to calculate it again.
> 
> diff --git a/mercurial/revset.py b/mercurial/revset.py
> --- a/mercurial/revset.py
> +++ b/mercurial/revset.py
> @@ -2100,9 +2100,15 @@
>      def __init__(self, subset, condition):
>          self._subset = subset
>          self._condition = condition
> +        self._ncache = set()
>  
>      def __contains__(self, x):
> -        return x in self._subset and self._condition(x)

Let's do an approximate operation count on this:

> +        if x in self._ncache:

if, "in", name lookup in self

> +            return False
> +        c =  x in self._subset and self._condition(x)

in, and, name lookup, function call, assign

> +        if not c:

if

> +            self._ncache.add(x)

name lookup x 2, set add

> +        return c

return.

In the fast path, we do:

 if, in, lookup, return

In the slow path, we do:

 if, in, lookup, in, lookup, func, assign, if, and, lookup, lookup,
 set add, return

Reminder: these foo.bar.baz name lookup operations are really expensive!

Now consider this:

    self._cache = {}
...
    c = self._cache
    if x not in c:
        c[x] = x in self._subset and self._condition(x)
    return c[x]

This is the pattern we use for caching all over Mercurial. Fast path:

 lookup, assign, if, in, return

Slow path:

 lookup, assign, if, in, lookup, lookup, func, dict add, and, return

This has fewer total ops, fewer name lookups.. and it caches both
positive and negative results!

Depending on what we're caching, we can drop the initial "c = " to get a
marginally faster fast path in exchange for a slower slow path. If the
fast path is more common than the slow path, this will be a win. But for
caches where we expect the slow path to be more common, we want to use
the above.

-- 
Mathematics is the supreme nostalgia of our time.




More information about the Mercurial-devel mailing list