[PATCH 1 of 6 RFC] radixlink: introduce a new data structure
Augie Fackler
raf at durin42.com
Wed May 24 17:11:51 EDT 2017
On Sun, May 21, 2017 at 06:31:08PM -0700, Jun Wu wrote:
> # HG changeset patch
> # User Jun Wu <quark at fb.com>
> # Date 1495396237 25200
> # Sun May 21 12:50:37 2017 -0700
> # Node ID fec85b1af16b0360e7bd78cd26d4d21edf678962
> # Parent 4e51b2a99847904f1cc5a9c74784f19d69e829d0
> # Available At https://bitbucket.org/quark-zju/hg-draft
> # hg pull https://bitbucket.org/quark-zju/hg-draft -r fec85b1af16b
> radixlink: introduce a new data structure
Clever. I like it.
>
> The radixlink structure is designed to address several hard-to-solve
> performance issues in Mercurial. It is an on-disk radix-tree as index,
> linking to linked list as data (so it could store multiple values per key).
>
> For long, core Mercurial has no real key-value data structure, and various
> operations have suboptimal time complexity:
>
> 1. changelog.rev(node): O(len(repo))
> 2. obsstore.precursors[node]: O(len(markers))
> 3. adjustlinkrev slow path (a complex problem)
>
> The new data structure could make the first two O(len(node)), and provide a
> cache for linkrevs where the keys could be "filenode + path + '\0'".
>
> The next patches will try to solve the obsstore performance issue.
>
> diff --git a/mercurial/radixlink.py b/mercurial/radixlink.py
> new file mode 100644
> --- /dev/null
> +++ b/mercurial/radixlink.py
> @@ -0,0 +1,244 @@
> +# radixlink.py - on-disk radixtree index pointing to linked list data
> +#
> +# Copyright 2017 Facebook, Inc.
> +#
> +# This software may be used and distributed according to the terms of the
> +# GNU General Public License version 2 or any later version.
> +from __future__ import absolute_import
> +
> +import os
> +import struct
> +
> +from . import (
> + error,
> +)
> +
> +def _enlarge(buf, size):
> + """enlarge a bytearray to at least given size"""
> + l = len(buf)
> + if l < size:
> + buf += bytearray(size - l)
> +
> +def _tob16(buf):
> + """convert buffer to base16 bytearray"""
> + result = bytearray(len(buf) * 2)
> + for i, b in enumerate(bytearray(buf)):
> + result[i * 2] = b >> 4
> + result[i * 2 + 1] = b & 0xf
> + return result
> +
> +class radixlink(object):
> + """multimap with O(len(key) + len([value])) lookup and O(len(key)) insertion
> +
> + A "radixlink" structure consists of 2 files: indexfile, and linkfile.
> +
> + The indexfile is the radix tree with pointers to linkfile:
> +
> + indexdata := index-header + index-entry-list
> + index-header := source of truth offset (8B)
does "source of truth offset" mean "number of bytes in the source of
truth file as of when this index was valid"?
> + index-entry-list := '' | index-entry-list + index-entry
> + index-entry := radix-entry | leaf-entry
> + radix-entry := '\0' + index-offset (4B) * 16
> + leaf-entry := '\1' + link-offset (4B) + key-length (2B) + key
[...]
> + @property
> + def truthoffset(self):
> + """position of the append-only source of truth the index contains"""
> + if len(self.indexdata) >= self.indexheader.size:
> + return self.indexheader.unpack_from(self.indexdata)[0]
> + return 0
> +
> + @truthoffset.setter
> + def truthoffset(self, newoffset):
> + if self.truthoffset == newoffset:
> + return
> + self.indexheader.pack_into(self.indexdata, 0, newoffset)
> + self._dirty = True
> diff --git a/tests/test-radixlink.py b/tests/test-radixlink.py
> new file mode 100644
> --- /dev/null
> +++ b/tests/test-radixlink.py
If we're gonna have a new test, let's use our silenttestrunner instead
of .py and .out files. It's a lot easier for newcomers to
understand. :)
> @@ -0,0 +1,61 @@
> +from __future__ import absolute_import
> +
> +import os
> +import random
> +
> +from mercurial import (
> + error,
> + radixlink,
> + vfs,
> +)
> +
> +randint = random.randint
> +
> +def ensure(expr):
> + if not expr:
> + raise RuntimeError, 'unexpected'
nit: `raise RuntimeError('unexpected')` is a more idiomatic spelling
(does that not work for some reason?)
> +
> +fs = vfs.vfs(os.environ['TESTTMP'])
More information about the Mercurial-devel
mailing list