[PATCH 1 of 3 V2] util: add method to hash nested combination of python data structures

Yuya Nishihara yuya at tcha.org
Tue Jul 7 07:54:49 CDT 2015


On Mon, 6 Jul 2015 11:42:40 -0700, Laurent Charignon wrote:
> # HG changeset patch
> # User Laurent Charignon <lcharignon at fb.com>
> # Date 1435794507 25200
> #      Wed Jul 01 16:48:27 2015 -0700
> # Node ID 38cd2c6265855f0a8e65e02e2cc181921df498ca
> # Parent  2748bf78a5bf610da4f2d90fd1eea19a3b360c04
> util: add method to hash nested combination of python data structures
> 
> The goal of this series of patches is to have a method to compute the hash of
> config objects. This will enable restarting the command server when the config
> change.

I'm curious who and how the command server will be restarted. This patch seems
to imply that we can't rely on the mtime of the config files.

> +def makehash(x):
> +    """makes a hash from a basic data type containing only other hashable types
> +
> +    >>> a = makehash({3:2, 4:5})
> +    >>> b = makehash({4:5, 3:2})
> +    >>> c = makehash({4:5, 3:(2,3,4)})
> +    >>> d = makehash({4:5, 3:[2,3,4]})
> +    >>> e = makehash({4:5, 3:{2:[3,4]}})
> +    >>> a == b
> +    True
> +    >>> c == d
> +    True
> +    >>> e != a and c != a
> +    True
> +    >>> x = sortdict({1:2})
> +    >>> h1 = makehash(x)
> +    >>> x[4] = 5
> +    >>> h2 = makehash(x)
> +    >>> x[1] = 2
> +    >>> h3 = makehash(x)
> +    >>> h3 != h2 and h2 != h1
> +    True
> +    >>> s1 = set(list(range(2000)))
> +    >>> s2 = set(list(range(1999,-1,-1)))
> +    >>> makehash(s1) == makehash(s2)
> +    True
> +    """
> +    if isinstance(x, sortdict):
> +        # creates a list, in order to freeze the order of the element and avoid
> +        # sorting them before hashing (see below)
> +        return makehash(list(x.iteritems()))
> +    elif isinstance(x, dict):
> +        # iteritems for a dict does not always return in the same order, thus
> +        # the sorting
> +        return makehash(sorted(x.iteritems()))
> +    elif isinstance(x, (tuple, list)):
> +        return hash(tuple(makehash(i) for i in x))
> +    elif safehasattr(x, '__iter__'):
> +        # this is the case for sets and other containers
> +        # for two sets with the same content, __iter__ will not always return
> +        # the same order, thus the need to sort
> +        return hash(tuple(makehash(i) for i in sorted(x)))
> +    return hash(x)

If config object has no dynamically created functions or classes [1], we could
use cPickle.dumps(). But I guess the ui would have unpickable objects in some
cases.

% python -m timeit -s 'import cPickle; from mercurial import ui; \
d = ui.ui()._ucfg._data' 'hash(cPickle.dumps(d))'
1000 loops, best of 3: 424 usec per loop

% python -m timeit -s 'from mercurial import ui, util; \
d = ui.ui()._ucfg._data' 'util.makehash(d)'
1000 loops, best of 3: 843 usec per loop

 [1]: https://docs.python.org/2.7/library/pickle.html#what-can-be-pickled-and-unpickled


More information about the Mercurial-devel mailing list