[PATCH 1 of 6 RFC] radixlink: introduce a new data structure
Denis Laxalde
denis.laxalde at logilab.fr
Mon May 22 03:37:47 EDT 2017
Jun Wu a écrit :
> # HG changeset patch
> # User Jun Wu <quark at fb.com>
> # Date 1495396237 25200
> # Sun May 21 12:50:37 2017 -0700
> # Node ID fec85b1af16b0360e7bd78cd26d4d21edf678962
> # Parent 4e51b2a99847904f1cc5a9c74784f19d69e829d0
> # Available At https://bitbucket.org/quark-zju/hg-draft
> # hg pull https://bitbucket.org/quark-zju/hg-draft -r fec85b1af16b
> radixlink: introduce a new data structure
>
> The radixlink structure is designed to address several hard-to-solve
> performance issues in Mercurial. It is an on-disk radix-tree as index,
> linking to linked list as data (so it could store multiple values per key).
>
> For long, core Mercurial has no real key-value data structure, and various
> operations have suboptimal time complexity:
>
> 1. changelog.rev(node): O(len(repo))
> 2. obsstore.precursors[node]: O(len(markers))
> 3. adjustlinkrev slow path (a complex problem)
>
> The new data structure could make the first two O(len(node)), and provide a
> cache for linkrevs where the keys could be "filenode + path + '\0'".
>
> The next patches will try to solve the obsstore performance issue.
>
> diff --git a/tests/test-radixlink.py b/tests/test-radixlink.py
> new file mode 100644
> --- /dev/null
> +++ b/tests/test-radixlink.py
> @@ -0,0 +1,61 @@
> +from __future__ import absolute_import
> +
> +import os
> +import random
> +
> +from mercurial import (
> + error,
> + radixlink,
> + vfs,
> +)
> +
> +randint = random.randint
> +
> +def ensure(expr):
> + if not expr:
> + raise RuntimeError, 'unexpected'
I think this is a syntax error in Python 3 and, even in Python 2, it's
really odd nowadays.
Besides it's not clear to me what's the value of issuing a RuntimeError
over an AssertionError that's more usual in tests. (I even wonder if
unittest's assertEqual wouldn't be more appropriate/useful in cases
value comparison is involved).
> +
> +fs = vfs.vfs(os.environ['TESTTMP'])
> +
> +rl1 = radixlink.radixlink(fs, 'rl-test')
> +rl1.destroy()
> +
> +# basic functions
> +ensure(rl1.get('notfound') == [])
> +ensure(rl1.truthoffset == 0)
> +
> +# prefix key detection
> +rl1.insert('abc', 3)
> +for buggykey in ['abcdef', 'ab']:
> + try:
> + rl1.insert('abcdef', 3)
> + raise RuntimeError, 'k1 being prefix of k2 should not be allowed'
> + except error.ProgrammingError:
> + pass
> +rl1.destroy()
> +
> +# insert random data
> +def urandom(n):
> + s = bytearray(n + 1)
> + for i in xrange(n):
> + s[i] = randint(0, 254)
> + s[n] = 255
> + return s
> +
> +d = {}
> +for i in range(10000):
> + k = bytes(urandom(randint(0, 20)))
> + v = randint(0, 4294967295)
> + rl1.insert(k, v)
> + d.setdefault(k, []).insert(0, v)
> + v2 = rl1.get(k)
> + ensure(v2[0] == v)
> +
> +rl1.truthoffset = 333
> +rl1.flush()
> +
> +# reload and verify against d
> +rl2 = radixlink.radixlink(fs, 'rl-test')
> +ensure(rl2.truthoffset == 333)
> +for k, vs in d.items():
> + ensure(rl2.get(k) == vs)
More information about the Mercurial-devel
mailing list