[PATCH 2 of 2] bundle2: add a part to import external bundles

Mike Hommey mh at glandium.org
Sat Oct 11 17:27:28 CDT 2014


On Sat, Oct 11, 2014 at 11:46:18AM -0700, Gregory Szorc wrote:
> On 10/11/2014 1:38 AM, Mike Hommey wrote:
> > # HG changeset patch
> > # User Mike Hommey <mh at glandium.org>
> > # Date 1413016657 -32400
> > #      Sat Oct 11 17:37:37 2014 +0900
> > # Node ID 538073e6c7f6ae680cbdd9bcb44c214fd6653eb1
> > # Parent  225306662bbf7c8cc6d867d484eab1e038f6c2a5
> > bundle2: add a part to import external bundles
> 
> Great patch. I can't wait to start utilizing this!
> 
> Comments inline.
> 
> > diff --git a/mercurial/bundle2.py b/mercurial/bundle2.py
> > --- a/mercurial/bundle2.py
> > +++ b/mercurial/bundle2.py
> > + at parthandler('b2x:import-bundle')
> > +def handleimportbundle(op, inpart):
> > +    """apply a bundle on the repo, given an url and validation information
> > +
> > +    The part contains a bundle url, as well as hash digest(s) and size
> > +    information to validate the downloaded content.
> > +    The expected format for this part is:
> > +        <URL>
> > +        <SIZE in bytes>
> > +        <DIGEST-TYPE>: <DIGEST-VALUE>
> > +        ...
> > +    Multiple digest types are supported in the same part, in which case they
> > +    are all validated.
> > +    This currently only supports bundle10, but is intended to support bundle20
> > +    eventually.
> > +    """
> 
> Instead of an order-dependent format, how do you feel about the HTTP
> header style "K: V" format for all metadata? Or is the bundle2 style of
> expanding functionality to introduce a new part with similar but
> enhanced functionality?

I'm not sure there's much functionality to add to this particular part.
If you want some extra functionality, you're probably better covered
with a new part type.
 
> Do you think we should make room for the host fingerprint here so
> clients following URLs won't see "unknown fingerprint" warnings?
> There is conceptually a security risk here, but I think if a client
> trusts the remote that is handing out this data, it can temporarily
> trust fingerprints via proxy of trust.

What do you mean about "unknown fingerprint" warnings? Are you talking
about possible ssl certificates problems on the remote pointed at for
external bundles? Would you want the import-bundle part to possibly
contain certificate material so that the client could identify
self-signed certificates? How useful would that be in practice?
Note that the reason there are digests for the remote bundle is that
they may be served by foreign servers, which trust level is uncertain.
The client is already trusting the mercurial server with changegroups
it sends directly, so the same level of trust can be given to those
digests, but it would be unsafe to blindly trust the foreign server
giving the right bundles.
(Although, the digests are not mandatory, someone can very well
implement import-bundle parts without them, if they don't care)

> > diff --git a/tests/test-bundle2-import.t b/tests/test-bundle2-import.t
> > new file mode 100644
> > --- /dev/null
> > +++ b/tests/test-bundle2-import.t
> 
> > +Start a simple HTTP server to serve bundles (stolen from test-static-http.t)
> > +
> > +  $ cat > dumb.py <<EOF
> > +  > import BaseHTTPServer, SimpleHTTPServer, os, signal, sys
> > +  > 
> > +  > def run(server_class=BaseHTTPServer.HTTPServer,
> > +  >         handler_class=SimpleHTTPServer.SimpleHTTPRequestHandler):
> > +  >     server_address = ('localhost', int(os.environ['HGPORT']))
> > +  >     httpd = server_class(server_address, handler_class)
> > +  >     httpd.serve_forever()
> > +  > 
> > +  > signal.signal(signal.SIGTERM, lambda x, y: sys.exit(0))
> > +  > fp = file('dumb.pid', 'wb')
> > +  > fp.write(str(os.getpid()) + '\n')
> > +  > fp.close()
> > +  > run()
> > +  > EOF
> > +  $ python dumb.py 2>/dev/null &
> > +
> > +Cannot just read $!, it will not be set to the right value on Windows/MinGW
> > +
> > +  $ cat > wait.py <<EOF
> > +  > import time
> > +  > while True:
> > +  >     try:
> > +  >         if '\n' in file('dumb.pid', 'rb').read():
> > +  >             break
> > +  >     except IOError:
> > +  >         pass
> > +  >     time.sleep(0.2)
> > +  > EOF
> > +  $ python wait.py
> > +  $ cat dumb.pid >> $DAEMON_PIDS
> 
> Why can't dumb.py append to the os.environ['DAEMON_PIDS'] file directly?

I just copied that whole thing from tests/test-static-http.t

> > +Test a clone with a single import-bundle.
> > +
> > +  $ hg bundle -R repo -a bundle.hg
> > +  8 changesets found
> > +  $ cat > repo/.hg/bundle2maker << EOF
> > +  > import-bundle http://localhost:$HGPORT/bundle.hg bundle.hg
> > +  > EOF
> > +  $ hg clone ssh://user@dummy/repo clone
> > +  requesting all changes
> > +  adding changesets
> > +  adding manifests
> > +  adding file changes
> > +  added 8 changesets with 7 changes to 7 files (+2 heads)
> > +  updating to branch default
> > +  3 files updated, 0 files merged, 0 files removed, 0 files unresolved
> 
> This test doesn't actually verify import-bundle is used. We need some
> kind of UI output or confirmation from the HTTP/bundle server to verify
> the client is actually fetching a remote URL/bundle. This deficiency is
> present in subsequent tests.
> 
> I'm not sure if we want the client to print import-bundle details by
> default or if it should be hidden behind --debug. Large repositories
> presumably generating many bundles could lead to console spam, which is
> a point in --debug's favor. More on this later.
> 
> > +Test a clone with an import-bundle and a following changegroup
> > +
> > +  $ hg bundle -R repo --base 000000000000 -r '0:4' bundle2.hg
> > +  5 changesets found
> > +  $ cat > repo/.hg/bundle2maker << EOF
> > +  > import-bundle http://localhost:$HGPORT/bundle2.hg bundle2.hg
> > +  > changegroup 0:4 5:7
> > +  > EOF
> > +  $ hg clone ssh://user@dummy/repo clone
> > +  requesting all changes
> > +  adding changesets
> > +  adding manifests
> > +  adding file changes
> > +  added 5 changesets with 5 changes to 5 files (+1 heads)
> > +  adding changesets
> > +  adding manifests
> > +  adding file changes
> > +  added 3 changesets with 2 changes to 2 files (+1 heads)
> > +  updating to branch default
> > +  3 files updated, 0 files merged, 0 files removed, 0 files unresolved
> 
> Pretending I'm an average user, this output is kinda verbose and
> confusing. Maybe we *do* need some kind of normal output saying
> "fetching changes from URL" to make this clearer. Still, I can't help
> but think of the Firefox use case where clones are fetching 20+ bundles
> (of 10k changesets each).

Considering the time it takes to clone Firefox, and how many bundles
might be involved, it might make sense to show their url as some kind of
progress meter that show that things are _not_ looping out of control.

> > +Corruption tests
> > +
> > +  $ hg clone orig clone -r 2
> > +  adding changesets
> > +  adding manifests
> > +  adding file changes
> > +  added 3 changesets with 3 changes to 3 files
> > +  updating to branch default
> > +  3 files updated, 0 files merged, 0 files removed, 0 files unresolved
> > +
> > +  $ cat > repo/.hg/bundle2maker << EOF
> > +  > import-bundle http://localhost:$HGPORT/bundle7.hg bundle7.hg
> > +  > import-bundle http://localhost:$HGPORT/bundle8.hg bundle7.hg
> > +  > changegroup 0:6 7
> > +  > EOF
> > +  $ hg pull -R clone ssh://user@dummy/repo
> > +  pulling from ssh://user@dummy/repo
> > +  searching for changes
> > +  adding changesets
> > +  adding manifests
> > +  adding file changes
> > +  added 2 changesets with 2 changes to 2 files (+1 heads)
> > +  adding changesets
> > +  adding manifests
> > +  adding file changes
> > +  added 2 changesets with 1 changes to 1 files
> > +  transaction abort!
> > +  rollback completed
> > +  abort: bundle at http://localhost:$HGPORT/bundle8.hg is corrupted
> > +  [255]
> 
> From an end-user perspective, it's not entirely clear what the state of
> the local store is after this abort. Was the rollback full or partial?

Full. There will be a possibility of partial roolbacks in the future,
with checkpoints (which, in fact, if we could get them in 3.2, that
would be awesome)

> Is there any way we could sneak a partial pull summary message in here?
> (You'll probably want to do this as a separate patch. And knowing the
> transaction code, this could be "fun" to implement.)

I think all the UX part can be a separate patch, possibly even landing
well after the functionality itself (since bundle2 is still going to be
experimental)

Mike


More information about the Mercurial-devel mailing list