[PATCH 2 of 2] bundle2: add a part to import external bundles

Gregory Szorc gregory.szorc at gmail.com
Sat Oct 11 13:46:18 CDT 2014


On 10/11/2014 1:38 AM, Mike Hommey wrote:
> # HG changeset patch
> # User Mike Hommey <mh at glandium.org>
> # Date 1413016657 -32400
> #      Sat Oct 11 17:37:37 2014 +0900
> # Node ID 538073e6c7f6ae680cbdd9bcb44c214fd6653eb1
> # Parent  225306662bbf7c8cc6d867d484eab1e038f6c2a5
> bundle2: add a part to import external bundles

Great patch. I can't wait to start utilizing this!

Comments inline.

> diff --git a/mercurial/bundle2.py b/mercurial/bundle2.py
> --- a/mercurial/bundle2.py
> +++ b/mercurial/bundle2.py
> + at parthandler('b2x:import-bundle')
> +def handleimportbundle(op, inpart):
> +    """apply a bundle on the repo, given an url and validation information
> +
> +    The part contains a bundle url, as well as hash digest(s) and size
> +    information to validate the downloaded content.
> +    The expected format for this part is:
> +        <URL>
> +        <SIZE in bytes>
> +        <DIGEST-TYPE>: <DIGEST-VALUE>
> +        ...
> +    Multiple digest types are supported in the same part, in which case they
> +    are all validated.
> +    This currently only supports bundle10, but is intended to support bundle20
> +    eventually.
> +    """

Instead of an order-dependent format, how do you feel about the HTTP
header style "K: V" format for all metadata? Or is the bundle2 style of
expanding functionality to introduce a new part with similar but
enhanced functionality?

Do you think we should make room for the host fingerprint here so
clients following URLs won't see "unknown fingerprint" warnings? There
is conceptually a security risk here, but I think if a client trusts the
remote that is handing out this data, it can temporarily trust
fingerprints via proxy of trust.

> diff --git a/tests/test-bundle2-import.t b/tests/test-bundle2-import.t
> new file mode 100644
> --- /dev/null
> +++ b/tests/test-bundle2-import.t

> +Start a simple HTTP server to serve bundles (stolen from test-static-http.t)
> +
> +  $ cat > dumb.py <<EOF
> +  > import BaseHTTPServer, SimpleHTTPServer, os, signal, sys
> +  > 
> +  > def run(server_class=BaseHTTPServer.HTTPServer,
> +  >         handler_class=SimpleHTTPServer.SimpleHTTPRequestHandler):
> +  >     server_address = ('localhost', int(os.environ['HGPORT']))
> +  >     httpd = server_class(server_address, handler_class)
> +  >     httpd.serve_forever()
> +  > 
> +  > signal.signal(signal.SIGTERM, lambda x, y: sys.exit(0))
> +  > fp = file('dumb.pid', 'wb')
> +  > fp.write(str(os.getpid()) + '\n')
> +  > fp.close()
> +  > run()
> +  > EOF
> +  $ python dumb.py 2>/dev/null &
> +
> +Cannot just read $!, it will not be set to the right value on Windows/MinGW
> +
> +  $ cat > wait.py <<EOF
> +  > import time
> +  > while True:
> +  >     try:
> +  >         if '\n' in file('dumb.pid', 'rb').read():
> +  >             break
> +  >     except IOError:
> +  >         pass
> +  >     time.sleep(0.2)
> +  > EOF
> +  $ python wait.py
> +  $ cat dumb.pid >> $DAEMON_PIDS

Why can't dumb.py append to the os.environ['DAEMON_PIDS'] file directly?

> +Test a clone with a single import-bundle.
> +
> +  $ hg bundle -R repo -a bundle.hg
> +  8 changesets found
> +  $ cat > repo/.hg/bundle2maker << EOF
> +  > import-bundle http://localhost:$HGPORT/bundle.hg bundle.hg
> +  > EOF
> +  $ hg clone ssh://user@dummy/repo clone
> +  requesting all changes
> +  adding changesets
> +  adding manifests
> +  adding file changes
> +  added 8 changesets with 7 changes to 7 files (+2 heads)
> +  updating to branch default
> +  3 files updated, 0 files merged, 0 files removed, 0 files unresolved

This test doesn't actually verify import-bundle is used. We need some
kind of UI output or confirmation from the HTTP/bundle server to verify
the client is actually fetching a remote URL/bundle. This deficiency is
present in subsequent tests.

I'm not sure if we want the client to print import-bundle details by
default or if it should be hidden behind --debug. Large repositories
presumably generating many bundles could lead to console spam, which is
a point in --debug's favor. More on this later.

> +Test a clone with an import-bundle and a following changegroup
> +
> +  $ hg bundle -R repo --base 000000000000 -r '0:4' bundle2.hg
> +  5 changesets found
> +  $ cat > repo/.hg/bundle2maker << EOF
> +  > import-bundle http://localhost:$HGPORT/bundle2.hg bundle2.hg
> +  > changegroup 0:4 5:7
> +  > EOF
> +  $ hg clone ssh://user@dummy/repo clone
> +  requesting all changes
> +  adding changesets
> +  adding manifests
> +  adding file changes
> +  added 5 changesets with 5 changes to 5 files (+1 heads)
> +  adding changesets
> +  adding manifests
> +  adding file changes
> +  added 3 changesets with 2 changes to 2 files (+1 heads)
> +  updating to branch default
> +  3 files updated, 0 files merged, 0 files removed, 0 files unresolved

Pretending I'm an average user, this output is kinda verbose and
confusing. Maybe we *do* need some kind of normal output saying
"fetching changes from URL" to make this clearer. Still, I can't help
but think of the Firefox use case where clones are fetching 20+ bundles
(of 10k changesets each).

> +Corruption tests
> +
> +  $ hg clone orig clone -r 2
> +  adding changesets
> +  adding manifests
> +  adding file changes
> +  added 3 changesets with 3 changes to 3 files
> +  updating to branch default
> +  3 files updated, 0 files merged, 0 files removed, 0 files unresolved
> +
> +  $ cat > repo/.hg/bundle2maker << EOF
> +  > import-bundle http://localhost:$HGPORT/bundle7.hg bundle7.hg
> +  > import-bundle http://localhost:$HGPORT/bundle8.hg bundle7.hg
> +  > changegroup 0:6 7
> +  > EOF
> +  $ hg pull -R clone ssh://user@dummy/repo
> +  pulling from ssh://user@dummy/repo
> +  searching for changes
> +  adding changesets
> +  adding manifests
> +  adding file changes
> +  added 2 changesets with 2 changes to 2 files (+1 heads)
> +  adding changesets
> +  adding manifests
> +  adding file changes
> +  added 2 changesets with 1 changes to 1 files
> +  transaction abort!
> +  rollback completed
> +  abort: bundle at http://localhost:$HGPORT/bundle8.hg is corrupted
> +  [255]

>From an end-user perspective, it's not entirely clear what the state of
the local store is after this abort. Was the rollback full or partial?
Is there any way we could sneak a partial pull summary message in here?
(You'll probably want to do this as a separate patch. And knowing the
transaction code, this could be "fun" to implement.)

> +No content
> +Missing size
> +Size mismatch
> +Garbage data
> +Unknown digest
> +Not an HTTP url

The comprehensive testing of failure cases is quite nice!



More information about the Mercurial-devel mailing list