[PATCH 2 of 2] bundle2: add a part to import external bundles

Sat Oct 11 18:54:09 CDT 2014

On 10/11/2014 10:38 AM, Mike Hommey wrote:
> # HG changeset patch
> # User Mike Hommey <mh at glandium.org>
> # Date 1413016657 -32400
> #      Sat Oct 11 17:37:37 2014 +0900
> # Node ID 538073e6c7f6ae680cbdd9bcb44c214fd6653eb1
> # Parent  225306662bbf7c8cc6d867d484eab1e038f6c2a5
> bundle2: add a part to import external bundles

It took me some time to figure out what this was all about.

What do "a part" mean here? "Functionality"/"support"/"infrastructure"? 
Or more like "multi part"?

It also seems essential that it is client side support only - that 
deserves mentioning in the short summary in the first line.

Perhaps:
bundle2: client side support for bundles containing references to other 
bundles

> diff --git a/mercurial/bundle2.py b/mercurial/bundle2.py
> --- a/mercurial/bundle2.py
> +++ b/mercurial/bundle2.py

> +class digester(object):
> +    """helper to compute multiple digests for the same data."""
> +
> +    DIGESTS = {
> +        'md5': util.md5,

Why care about md5 at all?

Why support multiple digests? That seems like unnecessary complexity. 
The rest of Mercurial might be moving away from sha1 one day, but then 
this should be handled like everything else.

Digests and validation could perhaps be put in a different patch than 
the multi bundle handling itself so we got smaller parts that were 
easier to review.

>   capabilities = {'HG2X': (),
>                   'b2x:listkeys': (),
>                   'b2x:pushkey': (),
>                   'b2x:changegroup': (),
> +                'digests': tuple(digester.DIGESTS.keys()),

As mentioned, I don't see the point in supporting multiple digests, but 
I would expect these references to external bundles requires a new 
capability flag?

> + at parthandler('b2x:import-bundle')
> +def handleimportbundle(op, inpart):
> +    """apply a bundle on the repo, given an url and validation information
> +
> +    The part contains a bundle url, as well as hash digest(s) and size
> +    information to validate the downloaded content.
> +    The expected format for this part is:
> +        <URL>
> +        <SIZE in bytes>
> +        <DIGEST-TYPE>: <DIGEST-VALUE>
> +        ...
> +    Multiple digest types are supported in the same part, in which case they
> +    are all validated.
> +    This currently only supports bundle10, but is intended to support bundle20
> +    eventually.
> +    """
> +    lines = inpart.read().splitlines()
> +    try:
> +        url = lines.pop(0)
> +    except IndexError:
> +        raise util.Abort(_('import-bundle format error'))
> +    parsed_url = urlparse.urlparse(url)
> +    if parsed_url.scheme not in ('http', 'https'):
> +        raise util.Abort(_('import-bundle only supports http/https urls'))
> +
> +    try:
> +        size = int(lines.pop(0))
> +    except (ValueError, IndexError):
> +        raise util.Abort(_('import-bundle format error'))
> +    digests = {}
> +    for line in lines:
> +        if ':' not in line:
> +            raise util.Abort(_('import-bundle format error'))
> +        k, v = line.split(':', 1)
> +        digests[k.strip()] = v.strip()
> +
> +    real_part = digestchecker(urllib2.urlopen(url), size, digests)

How will this work with authentication and server certificate validation 
etc? I would expect it should use (and perhaps extend) the existing URL 
abstraction layers in url.py and elsewhere.

I also wonder why it only supports http(s). I would expect the most 
common case would be to get all bundles from the same location. Other 
protocols can of course be added later, but it seems wrong to have any 
protocol knowledge at all at this level.

Breaking bundles up in multiple parts do of course raise the question of 
whether the reference between them should use "relative" or "absolute" 
"paths". There are pros and cons, but I think I would prefer relative in 
most cases.

> +
> +    # Make sure we trigger a transaction creation
> +    #
> +    # The addchangegroup function will get a transaction object by itself, but
> +    # we need to make sure we trigger the creation of a transaction object used
> +    # for the whole processing scope.
> +    op.gettransaction()
> +    import exchange
> +    cg = exchange.readbundle(op.repo.ui, real_part, None)
> +    if isinstance(cg, unbundle20):
> +        raise util.Abort(_('import-bundle only supports bundle10'))

Why is that? (Might deserve a comment.)

It seems wrong to have an architecture where we don't just automagically 
support all bundle formats.

> +    ret = changegroup.addchangegroup(op.repo, cg, 'bundle2', 'bundle2')
> +    op.records.add('import-bundle', {'return': ret})
> +    if op.reply is not None:
> +        # This is definitly not the final form of this
> +        # return. But one need to start somewhere.
> +        part = op.reply.newpart('b2x:reply:import-bundle')
> +        part.addparam('in-reply-to', str(inpart.id), mandatory=False)
> +        part.addparam('return', '%i' % ret, mandatory=False)
> +    if not real_part.validate():
> +        raise util.Abort(_('bundle at %s is corrupted') % url)
> +    assert not inpart.read()
> +
>   @parthandler('b2x:reply:changegroup', ('return', 'in-reply-to'))
>   def handlereplychangegroup(op, inpart):
>       ret = int(inpart.params['return'])
>       replyto = int(inpart.params['in-reply-to'])
>       op.records.add('changegroup', {'return': ret}, replyto)
>   
>   @parthandler('b2x:check:heads')
>   def handlecheckheads(op, inpart):
>

/Mads