[PATCH 3 of 3 V2] util: reimplement lrucachedict

Augie Fackler raf at durin42.com
Mon Dec 7 16:56:13 CST 2015


On Sun, Dec 06, 2015 at 09:35:24PM -0800, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1449457450 28800
> #      Sun Dec 06 19:04:10 2015 -0800
> # Node ID afbaaeba682753d329975fd3fb85e3af44a653fe
> # Parent  07aa6d894c36f7a0aa37a1896ef1779e8e9763ab
> util: reimplement lrucachedict

I like it. Other than one commit message query I'm a fan.

>
> During an attempt to use the existing lrucachedict more
> aggressively, collections.deque operations frequently showed up in
> profiling output, negating the benefits of caching.
>
> Searching the internet seems to tell me that the most efficient
> way to implement an LRU cache in Python is to have a dict indexing
> the cached entries and then to use a doubly linked list to track
> freshness of each entry. So, this patch replaces our existing
> lrucachedict with a version using such a pattern.
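
The heart of that pattern is that a cache hit costs one dict lookup plus
a constant number of pointer updates to move the entry to the front of
the recency list. As a rough, self-contained sketch of just that step
(illustration only -- the patch below uses a circular list with a movable
head rather than a fixed sentinel):

    class _node(object):
        """Doubly linked list node holding one cached entry (sketch)."""
        __slots__ = ('prev', 'next', 'key', 'value')

    def _touch(index, head, key):
        """Return the value for key and mark it most recently used.

        ``index`` maps key -> node; ``head`` is a sentinel whose ``next``
        is the newest entry and whose ``prev`` is the oldest. Every step
        is O(1), which is the point of the dict + linked list combination.
        """
        node = index[key]
        # Unlink the node from wherever it currently sits...
        node.prev.next = node.next
        node.next.prev = node.prev
        # ...and re-link it directly after the sentinel (newest position).
        node.next = head.next
        node.prev = head
        head.next.prev = node
        head.next = node
        return node.value

    if __name__ == '__main__':
        # Hand-build head <-> a <-> b <-> head just to exercise the sketch.
        head, a, b = _node(), _node(), _node()
        a.key, a.value = 'a', 1
        b.key, b.value = 'b', 2
        head.next, a.prev = a, head
        a.next, b.prev = b, a
        b.next, head.prev = head, b
        assert _touch({'a': a, 'b': b}, head, 'b') == 2
        assert head.next is b  # 'b' is now the freshest entry
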
>
> The recently introduced perflrucachedict command reveals the
> following timings for 10,000 operations at various cache sizes with
> the existing cache:
>
> n=4     init=0.004079 gets=0.003632 sets=0.005188 mixed=0.005402
> n=8     init=0.004045 gets=0.003998 sets=0.005064 mixed=0.005328
> n=16    init=0.004011 gets=0.004496 sets=0.005021 mixed=0.005555
> n=32    init=0.004064 gets=0.005611 sets=0.005188 mixed=0.006189
> n=64    init=0.003975 gets=0.007684 sets=0.005178 mixed=0.007245
> n=128   init=0.004121 gets=0.012005 sets=0.005422 mixed=0.009471
> n=256   init=0.004143 gets=0.020295 sets=0.005227 mixed=0.013612
> n=512   init=0.004039 gets=0.036703 sets=0.005243 mixed=0.020685
> n=1024  init=0.004193 gets=0.068142 sets=0.005251 mixed=0.033064
> n=2048  init=0.004070 gets=0.133383 sets=0.005160 mixed=0.050359
> n=4096  init=0.004053 gets=0.265194 sets=0.004868 mixed=0.048352
> n=8192  init=0.004087 gets=0.542218 sets=0.004562 mixed=0.032753
> n=16384 init=0.004106 gets=1.064055 sets=0.004179 mixed=0.020367
> n=32768 init=0.004034 gets=2.097620 sets=0.004260 mixed=0.013031
> n=65536 init=0.004108 gets=4.106390 sets=0.004268 mixed=0.010191
>
> As the data shows, the existing cache's retrieval time grows
> linearly with cache size. (Keep in mind the microbenchmark is
> testing a 100% cache hit rate.)
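
For reference, the gets numbers can be approximated with a loop like the
following (an illustrative micro-benchmark assuming mercurial.util is
importable; the actual perflrucachedict command may structure its
measurement differently):

    import time

    from mercurial import util

    def timegets(size, gets=10000):
        d = util.lrucachedict(size)
        # Fill the cache completely so every subsequent lookup is a hit.
        for i in range(size):
            d[i] = i
        keys = list(range(size))
        start = time.time()
        for i in range(gets):
            d[keys[i % size]]  # 100% hit rate, round-robin over all keys
        return time.time() - start

    for n in (4, 64, 1024, 65536):
        print('n=%d gets=%f' % (n, timegets(n)))
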
>
> The new cache implementation reveals the following:
>
> n=4     init=0.006665 gets=0.006541 sets=0.005733 mixed=0.006876
> n=8     init=0.006649 gets=0.006374 sets=0.005663 mixed=0.006899
> n=16    init=0.006570 gets=0.006504 sets=0.005799 mixed=0.007057
> n=32    init=0.006854 gets=0.006459 sets=0.005747 mixed=0.007034
> n=64    init=0.006580 gets=0.006495 sets=0.005740 mixed=0.006992
> n=128   init=0.006534 gets=0.006739 sets=0.005648 mixed=0.007124
> n=256   init=0.006669 gets=0.006773 sets=0.005824 mixed=0.007151
> n=512   init=0.006701 gets=0.007061 sets=0.006042 mixed=0.007372
> n=1024  init=0.006641 gets=0.007620 sets=0.006387 mixed=0.007464
> n=2048  init=0.006517 gets=0.008598 sets=0.006871 mixed=0.008077
> n=4096  init=0.006720 gets=0.010933 sets=0.007854 mixed=0.008663
> n=8192  init=0.007383 gets=0.015969 sets=0.010288 mixed=0.008896
> n=16384 init=0.006660 gets=0.025447 sets=0.011208 mixed=0.008826
> n=32768 init=0.006658 gets=0.044390 sets=0.011192 mixed=0.008943
> n=65536 init=0.006836 gets=0.082736 sets=0.011151 mixed=0.008826
>
> Let's go through the results.
>
> The new cache takes longer to construct. ~6.6ms vs ~4.1ms. However,
> this is measuring 10,000 __init__ calls, so the difference is
> ~0.2us/instance. We currently only create lrucachedict for manifest
> instances, so this regression is not likely relevant.
>
> The new cache is slightly slower for retrievals at small cache sizes
> (up to roughly n=32 in the numbers above). It's worth noting that the
> only existing use of lrucachedict is in manifest.py and the default
> cache size is 4. This regression is worrisome. However, for n=4, the
> delta is ~2.9ms for 10,000 lookups, or ~0.29us/op. Again, this is a
> marginal regression and likely not
> relevant in the real world. Timing `hg log -p -l 100` for
> mozilla-central reveals that cache lookup times are dominated by
> decompression and fulltext resolution (even with lz4 manifests).
>
> The new cache is significantly faster for retrievals at larger
> capacities. Whereas the old implementation's retrieval time grows
> linearly with cache capacity, the new cache is effectively constant
> time until much larger capacities. And even when its retrieval time
> does start to climb, it remains well over an order of magnitude
> faster than the current cache.
>
> The new cache does appear to be slower for sets when capacity is large.
> However, performance is similar for smaller capacities. Of course,
> caches should generally be optimized for retrieval performance because
> if a cache is getting more sets than gets, it doesn't really make
> sense to cache. If this regression is worrisome, note that the
> largest regression, at n=65536, is ~6.9ms over 10,000 operations, or
> ~0.68us/op. This is not significant in the grand scheme
> of things.
>
> Overall, the new cache remains fast for retrievals at much larger
> capacities, which makes it a generally more useful cache backend.
> While there are regressions, their absolute value is extremely small.
> Since we aren't using lrucachedict aggressively today, these
> regressions should not be relevant. The improved scalability of
> lrucachedict should enable us to more aggressively utilize
> lrucachedict for more granular caching (read: higher capacity caches)
> in the near future. The impetus for this patch is to establish a cache
> of decompressed revlog revlog, notably manifest revisions. And since

revlog revlog? (I'm not sure what this should be)

> delta chains can grow to >10,000 and cache hit rate can be high, the
> improved retrieval performance of lrucachedict should be relevant.
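
To make the intended usage concrete, here is a hypothetical sketch of a
higher-capacity cache sitting in front of an expensive revision fetch.
The class and parameter names are made up for illustration and are not
part of this series:

    from mercurial import util

    class revisioncache(object):
        """Hypothetical LRU cache of decompressed revision fulltexts."""

        def __init__(self, revlog, capacity=1000):
            self._revlog = revlog
            self._cache = util.lrucachedict(capacity)

        def revision(self, rev):
            # Cache hit: __getitem__ also refreshes the entry's position
            # in the LRU order.
            if rev in self._cache:
                return self._cache[rev]
            # Cache miss: resolving the delta chain and decompressing is
            # the expensive part this cache is meant to amortize.
            text = self._revlog.revision(rev)
            self._cache[rev] = text
            return text
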
>
> diff --git a/mercurial/util.py b/mercurial/util.py
> --- a/mercurial/util.py
> +++ b/mercurial/util.py
> @@ -490,44 +490,188 @@ class sortdict(dict):
>          return self._list.__iter__()
>      def iteritems(self):
>          for k in self._list:
>              yield k, self[k]
>      def insert(self, index, key, val):
>          self._list.insert(index, key)
>          dict.__setitem__(self, key, val)
>
> +class _lrucachenode(object):
> +    """A node in a doubly linked list.
> +
> +    Holds a reference to nodes on either side as well as a key-value
> +    pair for the dictionary entry.
> +    """
> +    __slots__ = ('next', 'prev', 'key', 'value')
> +
> +    def __init__(self):
> +        self.next = None
> +        self.prev = None
> +
> +        self.key = _notset
> +        self.value = None
> +
> +    def markempty(self):
> +        """Mark the node as emptied."""
> +        self.key = _notset
> +
>  class lrucachedict(object):
> -    '''cache most recent gets from or sets to this dictionary'''
> -    def __init__(self, maxsize):
> +    """Dict that caches most recent accesses and sets.
> +
> +    The dict consists of an actual backing dict - indexed by original
> +    key - and a doubly linked circular list defining the order of entries in
> +    the cache.
> +
> +    The head node is the newest entry in the cache. If the cache is full,
> +    we recycle head.prev and make it the new head. Cache accesses result in
> +    the node being moved to before the existing head and being marked as the
> +    new head node.
> +
> +    NOTE: construction of this class doesn't scale well if the cache size
> +    is in the thousands. Avoid creating hundreds or thousands of instances
> +    with large capacities.
> +    """
> +    def __init__(self, max):
>          self._cache = {}
> -        self._maxsize = maxsize
> -        self._order = collections.deque()
>
> -    def __getitem__(self, key):
> -        value = self._cache[key]
> -        self._order.remove(key)
> -        self._order.append(key)
> -        return value
> +        self._head = head = _lrucachenode()
> +        head.prev = head
> +        head.next = head
> +        self._size = 1
> +        self._capacity = max
>
> -    def __setitem__(self, key, value):
> -        if key not in self._cache:
> -            if len(self._cache) >= self._maxsize:
> -                del self._cache[self._order.popleft()]
> +    def __len__(self):
> +        return len(self._cache)
> +
> +    def __contains__(self, k):
> +        return k in self._cache
> +
> +    def __iter__(self):
> +        # We don't have to iterate in cache order, but why not.
> +        n = self._head
> +        for i in range(len(self._cache)):
> +            yield n.key
> +            n = n.next
> +
> +    def __getitem__(self, k):
> +        node = self._cache[k]
> +        self._movetohead(node)
> +        return node.value
> +
> +    def __setitem__(self, k, v):
> +        node = self._cache.get(k)
> +        # Replace existing value and mark as newest.
> +        if node is not None:
> +            node.value = v
> +            self._movetohead(node)
> +            return
> +
> +        if self._size < self._capacity:
> +            node = self._addcapacity()
>          else:
> -            self._order.remove(key)
> -        self._cache[key] = value
> -        self._order.append(key)
> +            # Grab the last/oldest item.
> +            node = self._head.prev
>
> -    def __contains__(self, key):
> -        return key in self._cache
> +        # At capacity. Kill the old entry.
> +        if node.key is not _notset:
> +            del self._cache[node.key]
> +
> +        node.key = k
> +        node.value = v
> +        self._cache[k] = node
> +        # And mark it as newest entry. No need to adjust order since it
> +        # is already self._head.prev.
> +        self._head = node
> +
> +    def __delitem__(self, k):
> +        node = self._cache.pop(k)
> +        node.markempty()
> +
> +        # Temporarily mark as newest item before re-adjusting head to make
> +        # this node the oldest item.
> +        self._movetohead(node)
> +        self._head = node.next
> +
> +    # Additional dict methods.
> +
> +    def get(self, k, default=None):
> +        try:
> +            return self._cache[k].value
> +        except KeyError:
> +            return default
>
>      def clear(self):
> +        n = self._head
> +        while n.key is not _notset:
> +            n.markempty()
> +            n = n.next
> +
>          self._cache.clear()
> -        self._order = collections.deque()
> +
> +    def _movetohead(self, node):
> +        """Mark a node as the newest, making it the new head.
> +
> +        When a node is accessed, it becomes the freshest entry in the LRU
> +        list, which is denoted by self._head.
> +
> +        Visually, let's make ``N`` the new head node (* denotes head):
> +
> +            previous/oldest <-> head <-> next/next newest
> +
> +            ----<->--- A* ---<->-----
> +            |                       |
> +            E <-> D <-> N <-> C <-> B
> +
> +        To:
> +
> +            ----<->--- N* ---<->-----
> +            |                       |
> +            E <-> D <-> C <-> B <-> A
> +
> +        This requires the following moves:
> +
> +           C.next = D  (node.prev.next = node.next)
> +           D.prev = C  (node.next.prev = node.prev)
> +           E.next = N  (head.prev.next = node)
> +           N.prev = E  (node.prev = head.prev)
> +           N.next = A  (node.next = head)
> +           A.prev = N  (head.prev = node)
> +        """
> +        head = self._head
> +        # C.next = D
> +        node.prev.next = node.next
> +        # D.prev = C
> +        node.next.prev = node.prev
> +        # N.prev = E
> +        node.prev = head.prev
> +        # N.next = A
> +        # It is tempting to do just "head" here, however if node is
> +        # adjacent to head, this will do bad things.
> +        node.next = head.prev.next
> +        # E.next = N
> +        node.next.prev = node
> +        # A.prev = N
> +        node.prev.next = node
> +
> +        self._head = node
> +
> +    def _addcapacity(self):
> +        """Add a node to the circular linked list.
> +
> +        The new node is inserted before the head node.
> +        """
> +        head = self._head
> +        node = _lrucachenode()
> +        head.prev.next = node
> +        node.prev = head.prev
> +        node.next = head
> +        head.prev = node
> +        self._size += 1
> +        return node
>
>  def lrucachefunc(func):
>      '''cache most recent results of function calls'''
>      cache = {}
>      order = collections.deque()
>      if func.func_code.co_argcount == 1:
>          def f(arg):
>              if arg not in cache:
> diff --git a/tests/test-lrucachedict.py b/tests/test-lrucachedict.py
> --- a/tests/test-lrucachedict.py
> +++ b/tests/test-lrucachedict.py
> @@ -29,10 +29,18 @@ def test_lrucachedict():
>
>      # 'e' should be dropped now
>      d['f'] = 'vf'
>      printifpresent(d, ['b', 'c', 'd', 'e', 'f'])
>
>      d.clear()
>      printifpresent(d, ['b', 'c', 'd', 'e', 'f'])
>
> +    # Now test dicts that aren't full.
> +    d = util.lrucachedict(4)
> +    d['a'] = 1
> +    d['b'] = 2
> +    d['a']
> +    d['b']
> +    printifpresent(d, ['a', 'b'])
> +
>  if __name__ == '__main__':
>      test_lrucachedict()
> diff --git a/tests/test-lrucachedict.py.out b/tests/test-lrucachedict.py.out
> --- a/tests/test-lrucachedict.py.out
> +++ b/tests/test-lrucachedict.py.out
> @@ -24,8 +24,12 @@ d['d']: vd
>  'e' in d: False
>  'f' in d: True
>  d['f']: vf
>  'b' in d: False
>  'c' in d: False
>  'd' in d: False
>  'e' in d: False
>  'f' in d: False
> +'a' in d: True
> +d['a']: 1
> +'b' in d: True
> +d['b']: 2