[PATCH 1 of 6 RFC] radixlink: introduce a new data structure

Denis Laxalde denis.laxalde at logilab.fr
Mon May 22 03:37:47 EDT 2017


Jun Wu a écrit :
> # HG changeset patch
> # User Jun Wu <quark at fb.com>
> # Date 1495396237 25200
> #      Sun May 21 12:50:37 2017 -0700
> # Node ID fec85b1af16b0360e7bd78cd26d4d21edf678962
> # Parent  4e51b2a99847904f1cc5a9c74784f19d69e829d0
> # Available At https://bitbucket.org/quark-zju/hg-draft
> #              hg pull https://bitbucket.org/quark-zju/hg-draft -r fec85b1af16b
> radixlink: introduce a new data structure
>
> The radixlink structure is designed to address several hard-to-solve
> performance issues in Mercurial. It is an on-disk radix-tree as index,
> linking to linked list as data (so it could store multiple values per key).
>
> For long, core Mercurial has no real key-value data structure, and various
> operations have suboptimal time complexity:
>
>   1. changelog.rev(node): O(len(repo))
>   2. obsstore.precursors[node]: O(len(markers))
>   3. adjustlinkrev slow path (a complex problem)
>
> The new data structure could make the first two O(len(node)), and provide a
> cache for linkrevs where the keys could be "filenode + path + '\0'".
>
> The next patches will try to solve the obsstore performance issue.
>

> diff --git a/tests/test-radixlink.py b/tests/test-radixlink.py
> new file mode 100644
> --- /dev/null
> +++ b/tests/test-radixlink.py
> @@ -0,0 +1,61 @@
> +from __future__ import absolute_import
> +
> +import os
> +import random
> +
> +from mercurial import (
> +    error,
> +    radixlink,
> +    vfs,
> +)
> +
> +randint = random.randint
> +
> +def ensure(expr):
> +    if not expr:
> +        raise RuntimeError, 'unexpected'

I think this is a syntax error in Python 3 and, even in Python 2, it's
really odd nowadays.

Besides it's not clear to me what's the value of issuing a RuntimeError
over an AssertionError that's more usual in tests. (I even wonder if
unittest's assertEqual wouldn't be more appropriate/useful in cases
value comparison is involved).

> +
> +fs = vfs.vfs(os.environ['TESTTMP'])
> +
> +rl1 = radixlink.radixlink(fs, 'rl-test')
> +rl1.destroy()
> +
> +# basic functions
> +ensure(rl1.get('notfound') == [])
> +ensure(rl1.truthoffset == 0)
> +
> +# prefix key detection
> +rl1.insert('abc', 3)
> +for buggykey in ['abcdef', 'ab']:
> +    try:
> +        rl1.insert('abcdef', 3)
> +        raise RuntimeError, 'k1 being prefix of k2 should not be allowed'
> +    except error.ProgrammingError:
> +        pass
> +rl1.destroy()
> +
> +# insert random data
> +def urandom(n):
> +    s = bytearray(n + 1)
> +    for i in xrange(n):
> +        s[i] = randint(0, 254)
> +    s[n] = 255
> +    return s
> +
> +d = {}
> +for i in range(10000):
> +    k = bytes(urandom(randint(0, 20)))
> +    v = randint(0, 4294967295)
> +    rl1.insert(k, v)
> +    d.setdefault(k, []).insert(0, v)
> +    v2 = rl1.get(k)
> +    ensure(v2[0] == v)
> +
> +rl1.truthoffset = 333
> +rl1.flush()
> +
> +# reload and verify against d
> +rl2 = radixlink.radixlink(fs, 'rl-test')
> +ensure(rl2.truthoffset == 333)
> +for k, vs in d.items():
> +    ensure(rl2.get(k) == vs)


More information about the Mercurial-devel mailing list