[PATCH 1 of 5] util: add an LRU cache dict

Greg Ward greg at gerg.ca
Fri Feb 8 16:38:43 CST 2013


On 08 February 2013, Siddharth Agarwal said:
> # HG changeset patch
> # User Siddharth Agarwal <sid0 at fb.com>
> # Date 1360351263 0
> # Node ID 642bf4f92100a41182de15646e37db70b391bfeb
> # Parent  08e00496e7b3bda8db3fbe7084a013d77e4932d2
> util: add an LRU cache dict
> 
> This is the bare minimum dictionary implementation needed for an upcoming
> patch.

Tests? This should be doc-test-able.
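For example, something like this (untested, pokes at _cache directly,
and assumes the _size bug noted below is fixed so eviction actually
happens):

    >>> d = lrucachedict(2)
    >>> d['a'] = 'va'
    >>> d['b'] = 'vb'
    >>> d['c'] = 'vc'
    >>> d['d'] = 'vd'
    >>> d['d']
    'vd'
    >>> 'a' in d._cache   # oldest entry should have been evicted
    False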

> diff --git a/mercurial/util.py b/mercurial/util.py
> --- a/mercurial/util.py
> +++ b/mercurial/util.py
> @@ -211,6 +211,30 @@ except AttributeError:
>                      del self[i]
>                      break
>  
> +class lrucachedict(object):
> +    '''cache most recent get or sets to this dictionary'''
> +    def __init__(self, size):
> +        self._cache = {}
> +        self._size = {}

It's a dict...

> +        self._order = deque()
> +
> +    def __getitem__(self, key):
> +        self._order.remove(key)
> +        self._order.append(key)
> +        return self._cache[key]
> +
> +    def __setitem__(self, key, value):
> +        if key not in self._cache:
> +            if len(self._cache) > self._size:

...but you're comparing it to an int???

Also, wouldn't _maxsize be a better name?
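Something like this is what I'd expect (a sketch, untested, and since
the rest of your __setitem__ is snipped above I'm guessing at the
eviction half):

    from collections import deque

    class lrucachedict(object):
        '''cache most recent gets or sets to this dictionary'''
        def __init__(self, maxsize):
            self._cache = {}
            self._maxsize = maxsize   # an int, not a dict
            self._order = deque()

        def __setitem__(self, key, value):
            if key not in self._cache:
                if len(self._cache) >= self._maxsize:
                    # evict the least recently used key
                    del self._cache[self._order.popleft()]
            else:
                self._order.remove(key)
            self._cache[key] = value
            self._order.append(key)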

Is deque the right data structure here? I recently implemented an
LRU cache using heapq to record the "fetch time" of each entry:
__getitem__() incremented the age of the whole cache, counted in
fetches, and set the fetch time of the fetched key to 0, so with every
fetch, every other key got a bit older. I *think* that adds O(1)
overhead to __getitem__().

Then __setitem__() can look up the oldest key in O(lg N) time when the
cache is full.
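
Concretely, I had something in mind like this (a from-memory, untested
sketch; the names are made up, it uses an increasing clock rather than
re-aging every key, and it skips stale heap pairs lazily, which is
exactly where the memory cost hides):

    import heapq

    class heaplrucache(object):
        '''LRU cache via a heap of (fetch time, key) pairs'''
        def __init__(self, maxsize):
            self._cache = {}     # key -> value
            self._time = {}      # key -> most recent fetch time
            self._heap = []      # (fetch time, key); may hold stale pairs
            self._maxsize = maxsize
            self._clock = 0      # whole-cache age, counted in fetches

        def __getitem__(self, key):
            value = self._cache[key]
            self._clock += 1
            self._time[key] = self._clock
            # the key's old heap pair goes stale; strictly this push is
            # O(lg N), not O(1)
            heapq.heappush(self._heap, (self._clock, key))
            return value

        def __setitem__(self, key, value):
            if key not in self._cache and len(self._cache) >= self._maxsize:
                # pop until we hit a pair that is still current: that
                # key is the true LRU entry, so evict it
                while True:
                    t, old = heapq.heappop(self._heap)
                    if self._time.get(old) == t:
                        del self._cache[old]
                        del self._time[old]
                        break
            self._clock += 1
            self._time[key] = self._clock
            self._cache[key] = value
            heapq.heappush(self._heap, (self._clock, key))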

How does that compare to deque? What about memory use?

       Greg
-- 
Greg Ward                            http://www.gerg.ca
<greg at gerg.ca>                       @gergdotca

