[PATCH 3 of 3 V2] util: reimplement lrucachedict
Augie Fackler
raf at durin42.com
Mon Dec 7 16:56:13 CST 2015
On Sun, Dec 06, 2015 at 09:35:24PM -0800, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc at gmail.com>
> # Date 1449457450 28800
> # Sun Dec 06 19:04:10 2015 -0800
> # Node ID afbaaeba682753d329975fd3fb85e3af44a653fe
> # Parent 07aa6d894c36f7a0aa37a1896ef1779e8e9763ab
> util: reimplement lrucachedict
I like it. Other than one query about the commit message, I'm a fan.
>
> As part of attempting to more aggressively use the existing
> lrucachedict, collections.deque operations were frequently
> showing up in profiling output, negating benefits of caching.
>
> Searching the internet seems to tell me that the most efficient
> way to implement an LRU cache in Python is to have a dict indexing
> the cached entries and then to use a doubly linked list to track
> freshness of each entry. So, this patch replaces our existing
> lrucachedict with a version using such a pattern.
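For anyone reading along who hasn't written one of these: the pattern described above (a dict for O(1) key lookup plus a doubly linked list for O(1) recency updates) boils down to roughly the following. This is a toy sketch for illustration, not the code in the patch:

```python
class Node(object):
    __slots__ = ('prev', 'next', 'key', 'value')

class MiniLRU(object):
    """Toy LRU cache: dict gives O(1) lookup, linked list gives O(1) reorder."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.map = {}
        # Sentinel node: sentinel.next is the newest entry,
        # sentinel.prev is the oldest (the eviction candidate).
        self.sentinel = Node()
        self.sentinel.prev = self.sentinel.next = self.sentinel

    def _unlink(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def _push_front(self, node):
        node.next = self.sentinel.next
        node.prev = self.sentinel
        self.sentinel.next.prev = node
        self.sentinel.next = node

    def __getitem__(self, key):
        node = self.map[key]    # O(1) dict lookup
        self._unlink(node)      # O(1) pointer surgery, regardless of size
        self._push_front(node)  # mark as most recently used
        return node.value

    def __setitem__(self, key, value):
        if key in self.map:
            node = self.map[key]
            node.value = value
            self._unlink(node)
            self._push_front(node)
            return
        if len(self.map) >= self.capacity:
            # Evict the least recently used entry.
            oldest = self.sentinel.prev
            self._unlink(oldest)
            del self.map[oldest.key]
        node = Node()
        node.key, node.value = key, value
        self.map[key] = node
        self._push_front(node)

    def __contains__(self, key):
        return key in self.map
```

The key property is that lookup, reorder, and eviction all touch a constant number of pointers, no matter how large the cache is.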
>
> The recently introduced perflrucachedict command reveals the
> following timings for 10,000 operations for the following cache
> sizes for the existing cache:
>
> n=4 init=0.004079 gets=0.003632 sets=0.005188 mixed=0.005402
> n=8 init=0.004045 gets=0.003998 sets=0.005064 mixed=0.005328
> n=16 init=0.004011 gets=0.004496 sets=0.005021 mixed=0.005555
> n=32 init=0.004064 gets=0.005611 sets=0.005188 mixed=0.006189
> n=64 init=0.003975 gets=0.007684 sets=0.005178 mixed=0.007245
> n=128 init=0.004121 gets=0.012005 sets=0.005422 mixed=0.009471
> n=256 init=0.004143 gets=0.020295 sets=0.005227 mixed=0.013612
> n=512 init=0.004039 gets=0.036703 sets=0.005243 mixed=0.020685
> n=1024 init=0.004193 gets=0.068142 sets=0.005251 mixed=0.033064
> n=2048 init=0.004070 gets=0.133383 sets=0.005160 mixed=0.050359
> n=4096 init=0.004053 gets=0.265194 sets=0.004868 mixed=0.048352
> n=8192 init=0.004087 gets=0.542218 sets=0.004562 mixed=0.032753
> n=16384 init=0.004106 gets=1.064055 sets=0.004179 mixed=0.020367
> n=32768 init=0.004034 gets=2.097620 sets=0.004260 mixed=0.013031
> n=65536 init=0.004108 gets=4.106390 sets=0.004268 mixed=0.010191
>
> As the data shows, the existing cache's retrieval performance
> degrades linearly with cache size. (Keep in mind the microbenchmark
> is testing a 100% cache hit rate.)
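Right -- the linear behaviour comes from `collections.deque.remove()`, which scans the deque to find the key, so the old `__getitem__` paid an O(n) scan on every hit. A quick way to see it (a hypothetical harness, not code from the repo; timings are machine-dependent, the point is the growth):

```python
import timeit
from collections import deque

def old_style_hit(order, key):
    # What the old lrucachedict.__getitem__ did on every cache hit:
    # deque.remove() walks the deque to find the key (O(n)),
    # then the key is re-appended as the freshest entry.
    order.remove(key)
    order.append(key)

for n in (256, 4096, 65536):
    order = deque(range(n))
    t = timeit.timeit(lambda: old_style_hit(order, n // 2), number=100)
    print('n=%6d  100 hits: %.4fs' % (n, t))
```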
>
> The new cache implementation reveals the following:
>
> n=4 init=0.006665 gets=0.006541 sets=0.005733 mixed=0.006876
> n=8 init=0.006649 gets=0.006374 sets=0.005663 mixed=0.006899
> n=16 init=0.006570 gets=0.006504 sets=0.005799 mixed=0.007057
> n=32 init=0.006854 gets=0.006459 sets=0.005747 mixed=0.007034
> n=64 init=0.006580 gets=0.006495 sets=0.005740 mixed=0.006992
> n=128 init=0.006534 gets=0.006739 sets=0.005648 mixed=0.007124
> n=256 init=0.006669 gets=0.006773 sets=0.005824 mixed=0.007151
> n=512 init=0.006701 gets=0.007061 sets=0.006042 mixed=0.007372
> n=1024 init=0.006641 gets=0.007620 sets=0.006387 mixed=0.007464
> n=2048 init=0.006517 gets=0.008598 sets=0.006871 mixed=0.008077
> n=4096 init=0.006720 gets=0.010933 sets=0.007854 mixed=0.008663
> n=8192 init=0.007383 gets=0.015969 sets=0.010288 mixed=0.008896
> n=16384 init=0.006660 gets=0.025447 sets=0.011208 mixed=0.008826
> n=32768 init=0.006658 gets=0.044390 sets=0.011192 mixed=0.008943
> n=65536 init=0.006836 gets=0.082736 sets=0.011151 mixed=0.008826
>
> Let's go through the results.
>
> The new cache takes longer to construct. ~6.6ms vs ~4.1ms. However,
> this is measuring 10,000 __init__ calls, so the difference is
> ~0.2us/instance. We currently only create lrucachedict for manifest
> instances, so this regression is not likely relevant.
>
> The new cache is slightly slower for retrievals for cache sizes
> < 1024. It's worth noting that the only existing use of lrucachedict
> is in manifest.py and the default cache size is 4. This regression
> is worrisome. However, for n=4, the delta is ~2.9ms for 10,000 lookups,
> or ~0.29us/op. Again, this is a marginal regression and likely not
> relevant in the real world. Timing `hg log -p -l 100` for
> mozilla-central reveals that cache lookup times are dominated by
> decompression and fulltext resolution (even with lz4 manifests).
>
> The new cache is significantly faster for retrievals at larger
> capacities. Whereas the old implementation has retrieval performance
> linear with cache capacity, the new cache is constant time until much
> larger values. And, when it does start to increase significantly, it
> is a few orders of magnitude faster than the current cache.
>
> The new cache does appear to be slower for sets when capacity is large.
> However, performance is similar for smaller capacities. Of course,
> caches should generally be optimized for retrieval performance because
> if a cache is getting more sets than gets, it doesn't really make
> sense to cache. If this regression is worrisome, again, taking the
> largest regression at n=65536 of ~6.9ms per 10,000 sets results in a
> regression of ~0.69us/op. This is not significant in the grand scheme
> of things.
>
> Overall, the new cache performs well on retrievals at much larger
> capacity values, which makes it a generally more useful cache backend.
> While there are regressions, their absolute value is extremely small.
> Since we aren't using lrucachedict aggressively today, these
> regressions should not be relevant. The improved scalability of
> lrucachedict should enable us to more aggressively utilize
> lrucachedict for more granular caching (read: higher capacity caches)
> in the near future. The impetus for this patch is to establish a cache
> of decompressed revlog revlog, notably manifest revisions. And since
revlog revlog? (I'm not sure what this should be)
> delta chains can grow to >10,000 and cache hit rate can be high, the
> improved retrieval performance of lrucachedict should be relevant.
>
> diff --git a/mercurial/util.py b/mercurial/util.py
> --- a/mercurial/util.py
> +++ b/mercurial/util.py
> @@ -490,44 +490,188 @@ class sortdict(dict):
> return self._list.__iter__()
> def iteritems(self):
> for k in self._list:
> yield k, self[k]
> def insert(self, index, key, val):
> self._list.insert(index, key)
> dict.__setitem__(self, key, val)
>
> +class _lrucachenode(object):
> + """A node in a doubly linked list.
> +
> + Holds a reference to nodes on either side as well as a key-value
> + pair for the dictionary entry.
> + """
> + __slots__ = ('next', 'prev', 'key', 'value')
> +
> + def __init__(self):
> + self.next = None
> + self.prev = None
> +
> + self.key = _notset
> + self.value = None
> +
> + def markempty(self):
> + """Mark the node as emptied."""
> + self.key = _notset
> +
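Good call on `__slots__` here. Since one node exists per cache slot, skipping the per-instance `__dict__` keeps memory down when capacities reach the thousands. A quick standalone illustration of what `__slots__` buys (not the patch's classes):

```python
class WithSlots(object):
    # Fixed attribute set: instances carry no per-instance __dict__,
    # so each node is just the four attribute references.
    __slots__ = ('next', 'prev', 'key', 'value')

class NoSlots(object):
    pass

s, n = WithSlots(), NoSlots()
print(hasattr(s, '__dict__'))  # False: no dict allocated per node
print(hasattr(n, '__dict__'))  # True: a dict is allocated per instance
```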
> class lrucachedict(object):
> - '''cache most recent gets from or sets to this dictionary'''
> - def __init__(self, maxsize):
> + """Dict that caches most recent accesses and sets.
> +
> + The dict consists of an actual backing dict - indexed by original
> + key - and a doubly linked circular list defining the order of entries in
> + the cache.
> +
> + The head node is the newest entry in the cache. If the cache is full,
> + we recycle head.prev and make it the new head. Cache accesses result in
> + the node being moved to before the existing head and being marked as the
> + new head node.
> +
> + NOTE: construction of this class doesn't scale well if the cache size
> + is in the thousands. Avoid creating hundreds or thousands of instances
> + with large capacities.
> + """
> + def __init__(self, max):
> self._cache = {}
> - self._maxsize = maxsize
> - self._order = collections.deque()
>
> - def __getitem__(self, key):
> - value = self._cache[key]
> - self._order.remove(key)
> - self._order.append(key)
> - return value
> + self._head = head = _lrucachenode()
> + head.prev = head
> + head.next = head
> + self._size = 1
> + self._capacity = max
>
> - def __setitem__(self, key, value):
> - if key not in self._cache:
> - if len(self._cache) >= self._maxsize:
> - del self._cache[self._order.popleft()]
> + def __len__(self):
> + return len(self._cache)
> +
> + def __contains__(self, k):
> + return k in self._cache
> +
> + def __iter__(self):
> + # We don't have to iterate in cache order, but why not.
> + n = self._head
> + for i in range(len(self._cache)):
> + yield n.key
> + n = n.next
> +
> + def __getitem__(self, k):
> + node = self._cache[k]
> + self._movetohead(node)
> + return node.value
> +
> + def __setitem__(self, k, v):
> + node = self._cache.get(k)
> + # Replace existing value and mark as newest.
> + if node is not None:
> + node.value = v
> + self._movetohead(node)
> + return
> +
> + if self._size < self._capacity:
> + node = self._addcapacity()
> else:
> - self._order.remove(key)
> - self._cache[key] = value
> - self._order.append(key)
> + # Grab the last/oldest item.
> + node = self._head.prev
>
> - def __contains__(self, key):
> - return key in self._cache
> + # At capacity. Kill the old entry.
> + if node.key is not _notset:
> + del self._cache[node.key]
> +
> + node.key = k
> + node.value = v
> + self._cache[k] = node
> + # And mark it as newest entry. No need to adjust order since it
> + # is already self._head.prev.
> + self._head = node
> +
> + def __delitem__(self, k):
> + node = self._cache.pop(k)
> + node.markempty()
> +
> + # Temporarily mark as newest item before re-adjusting head to make
> + # this node the oldest item.
> + self._movetohead(node)
> + self._head = node.next
> +
> + # Additional dict methods.
> +
> +    def get(self, k, default=None):
> +        try:
> +            # Go through __getitem__ so a hit returns the value (not the
> +            # internal node) and refreshes the entry's LRU position.
> +            return self.__getitem__(k)
> +        except KeyError:
> +            return default
>
> def clear(self):
> + n = self._head
> + while n.key is not _notset:
> + n.markempty()
> + n = n.next
> +
> self._cache.clear()
> - self._order = collections.deque()
> +
> + def _movetohead(self, node):
> + """Mark a node as the newest, making it the new head.
> +
> + When a node is accessed, it becomes the freshest entry in the LRU
> + list, which is denoted by self._head.
> +
> + Visually, let's make ``N`` the new head node (* denotes head):
> +
> + previous/oldest <-> head <-> next/next newest
> +
> + ----<->--- A* ---<->-----
> + | |
> + E <-> D <-> N <-> C <-> B
> +
> + To:
> +
> + ----<->--- N* ---<->-----
> + | |
> + E <-> D <-> C <-> B <-> A
> +
> + This requires the following moves:
> +
> + C.next = D (node.prev.next = node.next)
> + D.prev = C (node.next.prev = node.prev)
> + E.next = N (head.prev.next = node)
> + N.prev = E (node.prev = head.prev)
> + N.next = A (node.next = head)
> + A.prev = N (head.prev = node)
> + """
> + head = self._head
> + # C.next = D
> + node.prev.next = node.next
> + # D.prev = C
> + node.next.prev = node.prev
> + # N.prev = E
> + node.prev = head.prev
> +        # N.next = A
> +        # It is tempting to do just "head" here, however if node is
> +        # the head itself, that would create a self-referencing node.
> +        node.next = head.prev.next
> +        # A.prev = N
> +        node.next.prev = node
> +        # E.next = N
> +        node.prev.next = node
> +
> + self._head = node
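The pointer shuffle checked out when I traced it. For anyone else verifying, here is a standalone replay of the same six assignments on the docstring's example ring (toy code, not from the patch):

```python
class N(object):
    def __init__(self, name):
        self.name = name
        self.prev = None
        self.next = None

def ring(names):
    # Build a circular doubly linked list; first element is the head.
    nodes = [N(x) for x in names]
    for a, b in zip(nodes, nodes[1:] + nodes[:1]):
        a.next, b.prev = b, a
    return nodes

def movetohead(head, node):
    # The same six assignments as the patch's _movetohead.
    node.prev.next = node.next
    node.next.prev = node.prev
    node.prev = head.prev
    node.next = head.prev.next   # not plain "head": survives node == head
    node.next.prev = node
    node.prev.next = node
    return node                  # new head

def order(head, count):
    out, n = [], head
    for _ in range(count):
        out.append(n.name)
        n = n.next
    return out

# The docstring's "From" picture: circle A-B-C-N-D-E with head A; move N.
a, b, c, n, d, e = ring(['A', 'B', 'C', 'N', 'D', 'E'])
head = movetohead(a, n)
print(order(head, 6))  # → ['N', 'A', 'B', 'C', 'D', 'E']
```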
> +
> + def _addcapacity(self):
> + """Add a node to the circular linked list.
> +
> + The new node is inserted before the head node.
> + """
> + head = self._head
> + node = _lrucachenode()
> + head.prev.next = node
> + node.prev = head.prev
> + node.next = head
> + head.prev = node
> + self._size += 1
> + return node
>
> def lrucachefunc(func):
> '''cache most recent results of function calls'''
> cache = {}
> order = collections.deque()
> if func.func_code.co_argcount == 1:
> def f(arg):
> if arg not in cache:
> diff --git a/tests/test-lrucachedict.py b/tests/test-lrucachedict.py
> --- a/tests/test-lrucachedict.py
> +++ b/tests/test-lrucachedict.py
> @@ -29,10 +29,18 @@ def test_lrucachedict():
>
> # 'e' should be dropped now
> d['f'] = 'vf'
> printifpresent(d, ['b', 'c', 'd', 'e', 'f'])
>
> d.clear()
> printifpresent(d, ['b', 'c', 'd', 'e', 'f'])
>
> + # Now test dicts that aren't full.
> + d = util.lrucachedict(4)
> + d['a'] = 1
> + d['b'] = 2
> + d['a']
> + d['b']
> + printifpresent(d, ['a', 'b'])
> +
> if __name__ == '__main__':
> test_lrucachedict()
> diff --git a/tests/test-lrucachedict.py.out b/tests/test-lrucachedict.py.out
> --- a/tests/test-lrucachedict.py.out
> +++ b/tests/test-lrucachedict.py.out
> @@ -24,8 +24,12 @@ d['d']: vd
> 'e' in d: False
> 'f' in d: True
> d['f']: vf
> 'b' in d: False
> 'c' in d: False
> 'd' in d: False
> 'e' in d: False
> 'f' in d: False
> +'a' in d: True
> +d['a']: 1
> +'b' in d: True
> +d['b']: 2
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at selenic.com
> https://selenic.com/mailman/listinfo/mercurial-devel