[PATCH 4 of 9 V6] bookmarks: introduce binary encoding

Stanislau Hlebik stash at fb.com
Wed Nov 2 10:08:06 UTC 2016


If we have node+size first what advantage does it give?
I’m using empty node to show that bookmark is deleted.
I can use nullid for the same purpose but then I won’t be able to push/pull bookmarks that point to null commit
(I know this is a rare case, but it’s possible to do it now).

    >   The part about 'Hex' is surprising is it still up to date.
Why it’s not up to date? It’s possible to pass bookmarks with hex nodes.

On 10/14/16, 2:28 AM, "Pierre-Yves David" <pierre-yves.david at ens-lyon.org> wrote:

    
    
    On 10/11/2016 06:25 PM, Stanislau Hlebik wrote:
    > # HG changeset patch
    > # User Stanislau Hlebik <stash at fb.com>
    > # Date 1476195835 25200
    > #      Tue Oct 11 07:23:55 2016 -0700
    > # Node ID 718ed86a3698631077a087efaf668d70513056f5
    > # Parent  6f5a3300a5457c92eb09170a30c98328ebe3bcce
    > bookmarks: introduce binary encoding
    >
    > Bookmarks binary encoding will be used for `bookmarks` bundle2 part.
    > The format is: <4 bytes - bookmark size, big-endian><bookmark>
    >                <1 byte - 0 if node is empty 1 otherwise><20 bytes node>
    
    CCing greg as our binary format overlord,
    
    I'm not sure how useful it is to have the node to be optional. using 
    nullid should be fine (and most stream/bundle will be compressed anyway.
    
    This would give us the option to have the node+size first (as fixed 
    width) then the bookmark name. Though ?
    
    I do not thing it is necessary to point that the value is big-endian, we 
    use big endian for all binary encoding.
    
    Beside that, the change seems good. I've been tempted to advise for 
    moving the encoding/decoding directly into the bundle2 part to benefit 
    from the bundle2 utility to pack/unpack but it eventually felt more at 
    home here.
    
    I've some small comment inline for your consideration.
    
    > BookmarksEncodeError and BookmarksDecodeError maybe thrown if input is
    > incorrect.
    >
    > diff --git a/mercurial/bookmarks.py b/mercurial/bookmarks.py
    > --- a/mercurial/bookmarks.py
    > +++ b/mercurial/bookmarks.py
    > @@ -7,8 +7,10 @@
    >
    >  from __future__ import absolute_import
    >
    > +import StringIO
    >  import errno
    >  import os
    > +import struct
    >
    >  from .i18n import _
    >  from .node import (
    > @@ -23,6 +25,77 @@
    >      util,
    >  )
    >
    > +_NONEMPTYNODE = chr(1)
    > +_EMPTYNODE = chr(0)
    
    I would use 'pack' directly here. Who know what awe-full thing around 
    char and str python might be preparing for pytnon3.
    
    > +
    > +def _packbookmarksize(size):
    > +    return struct.pack('>i', size)
    
    Any reason we don't just inline this?
    
    > +
    > +def _unpackbookmarksize(stream):
    > +    """Returns 0 if stream is empty.
    > +    """
    > +
    > +    expectedsize = struct.calcsize('>i')
    > +    encodedbookmarksize = stream.read(expectedsize)
    > +    if len(encodedbookmarksize) == 0:
    > +        return 0
    > +    if len(encodedbookmarksize) != expectedsize:
    > +        raise error.BookmarksDecodeError(
    > +            _('cannot decode bookmark size: '
    > +              'expected size: %d, actual size: %d') %
    > +            (expectedsize, len(encodedbookmarksize)))
    > +    return struct.unpack('>i', encodedbookmarksize)[0]
    
    small nits, The bundle2 code have a lot of utility around reading and 
    unpacking, it seems like it could be worthwhile to move this unpacking 
    code directly in the part.
    
    > +def encodebookmarks(bookmarks):
    > +    """Encodes bookmark to node mapping to the byte string.
    > +
    > +    Format: <4 bytes - bookmark size, big-endian><bookmark>
    > +            <1 byte - 0 if node is empty 1 otherwise><20 bytes node>
    > +    Node may be 20 byte binary string, 40 byte hex string or empty
    
    The part about 'Hex' is surprising is it still up to date.
    
    > +    parts = []
    > +    for bookmark, node in bookmarks.iteritems():
    > +        encodedbookmarksize = _packbookmarksize(len(bookmark))
    > +        parts.append(encodedbookmarksize)
    > +        bookmark = encoding.fromlocal(bookmark)
    > +        parts.append(bookmark)
    > +        if node:
    > +            if len(node) != 20 and len(node) != 40:
    > +                raise error.BookmarksEncodeError()
    > +            if len(node) == 40:
    > +                node = bin(node)
    > +            parts.append(_NONEMPTYNODE)
    > +            parts.append(node)
    > +        else:
    > +            parts.append(_EMPTYNODE)
    > +    return ''.join(parts)
    
    We should probably have something a bit more subtle than 1 unique byte 
    blob. bundle2 part can be fed an iterator yielding the lines one by one 
    would be a good way to have this generation smoother.
    
    > +
    > +def decodebookmarks(buf):
    > +    """Decodes buffer into bookmark to node mapping.
    > +
    > +    Node is either 20 bytes or empty.
    > +    """
    > +
    > +    stream = StringIO.StringIO(buf)
    > +    bookmarks = {}
    > +    while True:
    > +        bookmarksize = _unpackbookmarksize(stream)
    > +        if not bookmarksize:
    > +            break
    
    Could we use 'while bookmark size' as an iteration value. A bare 'while 
    True:' is a bit scary.
    
    > +        bookmark = stream.read(bookmarksize)
    > +        if len(bookmark) != bookmarksize:
    > +            raise error.BookmarksDecodeError(
    > +                'cannot decode bookmark: expected size: %d, '
    > +                'actual size: %d' % (bookmarksize, len(bookmark)))
    > +        bookmark = encoding.tolocal(bookmark)
    > +        emptynode = stream.read(1)
    > +        node = ''
    > +        if emptynode != _EMPTYNODE:
    > +            node = stream.read(20)
    > +        bookmarks[bookmark] = node
    > +    return bookmarks
    > +
    >  def _getbkfile(repo):
    >      """Hook so that extensions that mess with the store can hook bm storage.
    >
    > _______________________________________________
    > Mercurial-devel mailing list
    > Mercurial-devel at mercurial-scm.org
    > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.mercurial-2Dscm.org_mailman_listinfo_mercurial-2Ddevel&d=DQICaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=1EQ58Dmb5uX1qHujcsT1Mg&m=1DxDEGgOTUUOmuCXmZ95OkWoyOXz2D7ILnHuyeqI0iQ&s=-N8JIBF1kxrGeXpr19jfoLFYJFS2Oj2MSZEFpUyjhOE&e= 
    >
    
    -- 
    Pierre-Yves David
    



More information about the Mercurial-devel mailing list