[PATCH] parsers: use buffer to store revlog index

Gregory Szorc gregory.szorc at gmail.com
Tue Dec 13 01:11:02 EST 2016


On Tue, Dec 6, 2016 at 3:46 AM, Jun Wu <quark at fb.com> wrote:

> # HG changeset patch
> # User Jun Wu <quark at fb.com>
> # Date 1481024689 0
> #      Tue Dec 06 11:44:49 2016 +0000
> # Node ID 36b4288ea10a9a76c1d993d47e002baaf7705ffb
> # Parent  2463b16bbd3710b619e0e948651db9948341f990
> # Available At https://bitbucket.org/quark-zju/hg-draft
> #              hg pull https://bitbucket.org/quark-zju/hg-draft -r
> 36b4288ea10a
> parsers: use buffer to store revlog index
>

I know this is queued already, but I think C code should have more eyes
than Python code.

This patch LGTM. Only thing I would have done differently is store a
Py_buffer* instead of a Py_buffer to ensure alignment of indexObject struct
entries. But I don't think this struct is that performance sensitive to
care. Feel free to address in a follow-up if you disagree.


>
> Previously, the revlog index passed to parse_index2 must be a "string",
> which means we have to read the whole revlog index into memory. This patch
> makes the code accept a generic Py_buffer, to be more flexible - it could
> be
> a "string", or anything that implements the buffer interface, like a
> mmap-ed
> region.
>
> Note: ideally we want to remove the "data" field. However, it is still used
> in parse_index2:
>
>     if (idx->inlined) {
>         cache = Py_BuildValue("iO", 0, idx->data);
>         ....
>     }
>     ....
>     tuple = Py_BuildValue("NN", idx, cache);
>     ....
>     return tuple;
>
> Its only users are revlogio.parseindex and revlog.__init__:
>
>     # revlogio.parseindex
>     index, cache = parsers.parse_index2(data, inline)
>     return index, getattr(index, 'nodemap', None), cache
>
>     # revlog.__init__
>     d = self._io.parseindex(indexdata, self._inline)
>     self.index, nodemap, self._chunkcache = d
>
> Maybe we could move the logic (testing inline and returnning "data" object)
> to revlog.py. But that should be a separate patch.
>
> diff --git a/mercurial/parsers.c b/mercurial/parsers.c
> --- a/mercurial/parsers.c
> +++ b/mercurial/parsers.c
> @@ -754,4 +754,5 @@ typedef struct {
>         /* Type-specific fields go here. */
>         PyObject *data;        /* raw bytes of index */
> +       Py_buffer buf;         /* buffer of data */
>         PyObject **cache;      /* cached tuples */
>         const char **offsets;  /* populated on demand */
> @@ -809,5 +810,5 @@ static const char *index_deref(indexObje
>         }
>
> -       return PyBytes_AS_STRING(self->data) + pos * v1_hdrsize;
> +       return (const char *)(self->buf.buf) + pos * v1_hdrsize;
>  }
>
> @@ -2390,7 +2391,7 @@ static int index_assign_subscript(indexO
>  static Py_ssize_t inline_scan(indexObject *self, const char **offsets)
>  {
> -       const char *data = PyBytes_AS_STRING(self->data);
> +       const char *data = (const char *)self->buf.buf;
>         Py_ssize_t pos = 0;
> -       Py_ssize_t end = PyBytes_GET_SIZE(self->data);
> +       Py_ssize_t end = self->buf.len;
>         long incr = v1_hdrsize;
>         Py_ssize_t len = 0;
> @@ -2426,4 +2427,5 @@ static int index_init(indexObject *self,
>         self->cache = NULL;
>         self->data = NULL;
> +       memset(&self->buf, 0, sizeof(self->buf));
>         self->headrevs = NULL;
>         self->filteredrevs = Py_None;
> @@ -2434,9 +2436,13 @@ static int index_init(indexObject *self,
>         if (!PyArg_ParseTuple(args, "OO", &data_obj, &inlined_obj))
>                 return -1;
> -       if (!PyBytes_Check(data_obj)) {
> -               PyErr_SetString(PyExc_TypeError, "data is not a string");
> +       if (!PyObject_CheckBuffer(data_obj)) {
> +               PyErr_SetString(PyExc_TypeError,
> +                               "data does not support buffer interface");
>                 return -1;
>         }
> -       size = PyBytes_GET_SIZE(data_obj);
> +
> +       if (PyObject_GetBuffer(data_obj, &self->buf, PyBUF_SIMPLE) == -1)
> +               return -1;
> +       size = self->buf.len;
>
>         self->inlined = inlined_obj && PyObject_IsTrue(inlined_obj);
> @@ -2479,4 +2485,8 @@ static void index_dealloc(indexObject *s
>         _index_clearcaches(self);
>         Py_XDECREF(self->filteredrevs);
> +       if (self->buf.buf) {
> +               PyBuffer_Release(&self->buf);
> +               memset(&self->buf, 0, sizeof(self->buf));
> +       }
>         Py_XDECREF(self->data);
>         Py_XDECREF(self->added);
> @@ -2578,5 +2588,6 @@ static PyTypeObject indexType = {
>   *
>   * index: an index object that lazily parses RevlogNG records
> - * cache: if data is inlined, a tuple (index_file_content, 0), else None
> + * cache: if data is inlined, a tuple (0, index_file_content), else None
> + *        index_file_content could be a string, or a buffer
>   *
>   * added complications are for backwards compatibility
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20161212/b83d16cb/attachment.html>


More information about the Mercurial-devel mailing list