[PATCH 2 of 2] bundle2: add a part to import external bundles

Mike Hommey mh at glandium.org
Sat Oct 11 19:42:04 CDT 2014


On Sun, Oct 12, 2014 at 01:54:09AM +0200, Mads Kiilerich wrote:
> On 10/11/2014 10:38 AM, Mike Hommey wrote:
> ># HG changeset patch
> ># User Mike Hommey <mh at glandium.org>
> ># Date 1413016657 -32400
> >#      Sat Oct 11 17:37:37 2014 +0900
> ># Node ID 538073e6c7f6ae680cbdd9bcb44c214fd6653eb1
> ># Parent  225306662bbf7c8cc6d867d484eab1e038f6c2a5
> >bundle2: add a part to import external bundles
> 
> It took me some time to figure out what this was all about.
> 
> What do "a part" mean here? "Functionality"/"support"/"infrastructure"? Or
> more like "multi part"?

bundle2 contains parts that are treated independently.

> It also seems essential that it is client side support only - that deserves
> mentioning in the short summary in the first line.
> 
> Perhaps:
> bundle2: client side support for bundles containing references to other
> bundles
> 
> >diff --git a/mercurial/bundle2.py b/mercurial/bundle2.py
> >--- a/mercurial/bundle2.py
> >+++ b/mercurial/bundle2.py
> 
> >+class digester(object):
> >+    """helper to compute multiple digests for the same data."""
> >+
> >+    DIGESTS = {
> >+        'md5': util.md5,
> 
> Why care about md5 at all?
> 
> Why support multiple digests? That seems like unnecessary complexity. The
> rest of Mercurial might be moving away from sha1 one day, but then this
> should be handled like everything else.

Because both md5 and sha1 are already weakened and they are the only
digests we can support as long as mercurial sticks to supporting python
2.4. The bundles don't /need/ to contain all digest types, but having
multiple ones when the only available ones are md5 and sha1 kind of
increases security (it's harder to create collisions for *both* digests
at the same time)

> Digests and validation could perhaps be put in a different patch than the
> multi bundle handling itself so we got smaller parts that were easier to
> review.
> 
> >  capabilities = {'HG2X': (),
> >                  'b2x:listkeys': (),
> >                  'b2x:pushkey': (),
> >                  'b2x:changegroup': (),
> >+                'digests': tuple(digester.DIGESTS.keys()),
> 
> As mentioned, I don't see the point in supporting multiple digests

The point here is that the client advertizes what digests it supports,
for the future case where it might support something else like sha512 or
sha-3. Note we could settle to use hashlib if the python running
mercurial supports it, adding support for other hashes, but that doesn't
have to be in this patch.

> but I would expect these references to external bundles requires a new
> capability flag?

that's an omission I noticed already. cf.
http://www.selenic.com/pipermail/mercurial-devel/2014-October/062507.html

> >+ at parthandler('b2x:import-bundle')
> >+def handleimportbundle(op, inpart):
> >+    """apply a bundle on the repo, given an url and validation information
> >+
> >+    The part contains a bundle url, as well as hash digest(s) and size
> >+    information to validate the downloaded content.
> >+    The expected format for this part is:
> >+        <URL>
> >+        <SIZE in bytes>
> >+        <DIGEST-TYPE>: <DIGEST-VALUE>
> >+        ...
> >+    Multiple digest types are supported in the same part, in which case they
> >+    are all validated.
> >+    This currently only supports bundle10, but is intended to support bundle20
> >+    eventually.
> >+    """
> >+    lines = inpart.read().splitlines()
> >+    try:
> >+        url = lines.pop(0)
> >+    except IndexError:
> >+        raise util.Abort(_('import-bundle format error'))
> >+    parsed_url = urlparse.urlparse(url)
> >+    if parsed_url.scheme not in ('http', 'https'):
> >+        raise util.Abort(_('import-bundle only supports http/https urls'))
> >+
> >+    try:
> >+        size = int(lines.pop(0))
> >+    except (ValueError, IndexError):
> >+        raise util.Abort(_('import-bundle format error'))
> >+    digests = {}
> >+    for line in lines:
> >+        if ':' not in line:
> >+            raise util.Abort(_('import-bundle format error'))
> >+        k, v = line.split(':', 1)
> >+        digests[k.strip()] = v.strip()
> >+
> >+    real_part = digestchecker(urllib2.urlopen(url), size, digests)
> 
> How will this work with authentication and server certificate validation
> etc? I would expect it should use (and perhaps extend) the existing URL
> abstraction layers in url.py and elsewhere.

I thought mercurial's url opener was "installed" as the urllib handler.
Isn't it the case?

> I also wonder why it only supports http(s). I would expect the most common
> case would be to get all bundles from the same location.

You're assuming those references would come in a bundle stored on some
server, when the use-case behind this is that you connect to a mercurial
server for a pull or a clone, and get those references from there. Those
references would *not* point to the same server. Also, at the point
where the bundle data is handled, there is not access to where it came
from (afaik)

> Other protocols can
> of course be added later, but it seems wrong to have any protocol knowledge
> at all at this level.

I'm not sure. The first reason I limited the protocols is that it didn't
seem reasonable for a random request to a mercurial server to tell you
to connect to some ssh server or use a file on your own local file
system.

> Breaking bundles up in multiple parts do of course raise the question of
> whether the reference between them should use "relative" or "absolute"
> "paths". There are pros and cons, but I think I would prefer relative in
> most cases.
> 
> >+
> >+    # Make sure we trigger a transaction creation
> >+    #
> >+    # The addchangegroup function will get a transaction object by itself, but
> >+    # we need to make sure we trigger the creation of a transaction object used
> >+    # for the whole processing scope.
> >+    op.gettransaction()
> >+    import exchange
> >+    cg = exchange.readbundle(op.repo.ui, real_part, None)
> >+    if isinstance(cg, unbundle20):
> >+        raise util.Abort(_('import-bundle only supports bundle10'))
> 
> Why is that? (Might deserve a comment.)

The comment is there in the function description, but it's not "exposed" to the
user.

> It seems wrong to have an architecture where we don't just automagically
> support all bundle formats.

The intent *is* to support all bundle formats. It requires more work to
support bundle2 bundles. Also, bundle2 bundles don't currently exist
outside of push/pull/clone. There is no bundle/unbundle support for
creating and unbundling them yet.

Mike


More information about the Mercurial-devel mailing list