[PATCH 4 of 9 V6] bookmarks: introduce binary encoding
Stanislau Hlebik
stash at fb.com
Wed Nov 2 10:08:06 UTC 2016
If we have node+size first what advantage does it give?
Iâm using empty node to show that bookmark is deleted.
I can use nullid for the same purpose but then I wonât be able to push/pull bookmarks that point to null commit
(I know this is a rare case, but itâs possible to do it now).
> The part about 'Hex' is surprising is it still up to date.
Why itâs not up to date? Itâs possible to pass bookmarks with hex nodes.
On 10/14/16, 2:28 AM, "Pierre-Yves David" <pierre-yves.david at ens-lyon.org> wrote:
On 10/11/2016 06:25 PM, Stanislau Hlebik wrote:
> # HG changeset patch
> # User Stanislau Hlebik <stash at fb.com>
> # Date 1476195835 25200
> # Tue Oct 11 07:23:55 2016 -0700
> # Node ID 718ed86a3698631077a087efaf668d70513056f5
> # Parent 6f5a3300a5457c92eb09170a30c98328ebe3bcce
> bookmarks: introduce binary encoding
>
> Bookmarks binary encoding will be used for `bookmarks` bundle2 part.
> The format is: <4 bytes - bookmark size, big-endian><bookmark>
> <1 byte - 0 if node is empty 1 otherwise><20 bytes node>
CCing greg as our binary format overlord,
I'm not sure how useful it is to have the node to be optional. using
nullid should be fine (and most stream/bundle will be compressed anyway.
This would give us the option to have the node+size first (as fixed
width) then the bookmark name. Though ?
I do not thing it is necessary to point that the value is big-endian, we
use big endian for all binary encoding.
Beside that, the change seems good. I've been tempted to advise for
moving the encoding/decoding directly into the bundle2 part to benefit
from the bundle2 utility to pack/unpack but it eventually felt more at
home here.
I've some small comment inline for your consideration.
> BookmarksEncodeError and BookmarksDecodeError maybe thrown if input is
> incorrect.
>
> diff --git a/mercurial/bookmarks.py b/mercurial/bookmarks.py
> --- a/mercurial/bookmarks.py
> +++ b/mercurial/bookmarks.py
> @@ -7,8 +7,10 @@
>
> from __future__ import absolute_import
>
> +import StringIO
> import errno
> import os
> +import struct
>
> from .i18n import _
> from .node import (
> @@ -23,6 +25,77 @@
> util,
> )
>
> +_NONEMPTYNODE = chr(1)
> +_EMPTYNODE = chr(0)
I would use 'pack' directly here. Who know what awe-full thing around
char and str python might be preparing for pytnon3.
> +
> +def _packbookmarksize(size):
> + return struct.pack('>i', size)
Any reason we don't just inline this?
> +
> +def _unpackbookmarksize(stream):
> + """Returns 0 if stream is empty.
> + """
> +
> + expectedsize = struct.calcsize('>i')
> + encodedbookmarksize = stream.read(expectedsize)
> + if len(encodedbookmarksize) == 0:
> + return 0
> + if len(encodedbookmarksize) != expectedsize:
> + raise error.BookmarksDecodeError(
> + _('cannot decode bookmark size: '
> + 'expected size: %d, actual size: %d') %
> + (expectedsize, len(encodedbookmarksize)))
> + return struct.unpack('>i', encodedbookmarksize)[0]
small nits, The bundle2 code have a lot of utility around reading and
unpacking, it seems like it could be worthwhile to move this unpacking
code directly in the part.
> +def encodebookmarks(bookmarks):
> + """Encodes bookmark to node mapping to the byte string.
> +
> + Format: <4 bytes - bookmark size, big-endian><bookmark>
> + <1 byte - 0 if node is empty 1 otherwise><20 bytes node>
> + Node may be 20 byte binary string, 40 byte hex string or empty
The part about 'Hex' is surprising is it still up to date.
> + parts = []
> + for bookmark, node in bookmarks.iteritems():
> + encodedbookmarksize = _packbookmarksize(len(bookmark))
> + parts.append(encodedbookmarksize)
> + bookmark = encoding.fromlocal(bookmark)
> + parts.append(bookmark)
> + if node:
> + if len(node) != 20 and len(node) != 40:
> + raise error.BookmarksEncodeError()
> + if len(node) == 40:
> + node = bin(node)
> + parts.append(_NONEMPTYNODE)
> + parts.append(node)
> + else:
> + parts.append(_EMPTYNODE)
> + return ''.join(parts)
We should probably have something a bit more subtle than 1 unique byte
blob. bundle2 part can be fed an iterator yielding the lines one by one
would be a good way to have this generation smoother.
> +
> +def decodebookmarks(buf):
> + """Decodes buffer into bookmark to node mapping.
> +
> + Node is either 20 bytes or empty.
> + """
> +
> + stream = StringIO.StringIO(buf)
> + bookmarks = {}
> + while True:
> + bookmarksize = _unpackbookmarksize(stream)
> + if not bookmarksize:
> + break
Could we use 'while bookmark size' as an iteration value. A bare 'while
True:' is a bit scary.
> + bookmark = stream.read(bookmarksize)
> + if len(bookmark) != bookmarksize:
> + raise error.BookmarksDecodeError(
> + 'cannot decode bookmark: expected size: %d, '
> + 'actual size: %d' % (bookmarksize, len(bookmark)))
> + bookmark = encoding.tolocal(bookmark)
> + emptynode = stream.read(1)
> + node = ''
> + if emptynode != _EMPTYNODE:
> + node = stream.read(20)
> + bookmarks[bookmark] = node
> + return bookmarks
> +
> def _getbkfile(repo):
> """Hook so that extensions that mess with the store can hook bm storage.
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at mercurial-scm.org
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.mercurial-2Dscm.org_mailman_listinfo_mercurial-2Ddevel&d=DQICaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=1EQ58Dmb5uX1qHujcsT1Mg&m=1DxDEGgOTUUOmuCXmZ95OkWoyOXz2D7ILnHuyeqI0iQ&s=-N8JIBF1kxrGeXpr19jfoLFYJFS2Oj2MSZEFpUyjhOE&e=
>
--
Pierre-Yves David
More information about the Mercurial-devel
mailing list