[PATCH 4 of 4 RFC stream clone bundles] commands.unbundle: support consuming streaming clone bundles

Gregory Szorc gregory.szorc at gmail.com
Thu Oct 15 12:46:07 CDT 2015


On Thu, Oct 15, 2015 at 12:42 AM, Adrian Buehlmann <adrian at cadifra.com>
wrote:

> On 2015-10-15 01:40, Gregory Szorc wrote:
> > On Wed, Oct 14, 2015 at 4:31 PM, Adrian Buehlmann <adrian at cadifra.com>
> > wrote:
> >
> >     On 2015-10-15 00:34, Gregory Szorc wrote:
> >     [..]
> >     > +We can unpack packed1 bundles
> >     > +
> >     > +  $ hg init packed
> >     > +  $ hg -R packed unbundle packed.hg
> >     > +  6 files to transfer, 2.55 KB of data
> >     > +  transferred 2.55 KB in *.* seconds (*/sec) (glob)
> >     > +  (run 'hg heads' to see heads, 'hg merge' to merge)
> >     > +
> >
> >     Potentially stupid idea:
> >
> >     Why not implement a special form of the clone command, which reads
> >     from such a full "bundle" file and creates the repo from that bundle?
> >
> >        $ hg clone packed.hg packed
> >
> >
> > You can already do this! Although, it doesn't work with this new bundle
> > type because you need to teach bundlerepo.py about different bundle
> > types (it's also broken for bundle2 currently).
>
> This sounds a bit like a contradiction to me... ("You can already do
> this" vs "Although, it doesn't work...")
>

Sorry. You can `hg clone` from a bundle file. However, that only works for
HG10 bundle types (changegroup 1 data) today: the command will crash if you
attempt to clone from a bundle2 file. It will also crash with stream
clones. bundlerepo.py needs some love to make it work with different bundle
types.
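
Concretely, the case that does work today looks something like this (the
repository and file names are purely illustrative; `--type gzip` produces
a plain HG10 changegroup bundle):

  $ hg -R myrepo bundle --all --type gzip myrepo-full.hg
  $ hg clone myrepo-full.hg myrepo-copy

Point `hg clone` at a bundle2 or stream clone file instead and it
currently crashes.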


>
> If it doesn't work, then it should probably be fixed to work with this
> kind of bundle?
>
> This kind of bundle can't be applied to a non-empty repo, so it seems
> to me that trying to make the unbundle command work with these is a bit
> of a misfit.
>

I'm sympathetic to this argument.


>
> The error message in the case below just looked a bit silly to me. The
> next question then was: why add support for this kind of "bundle" to
> the unbundle command if it can only be used on empty repos? If you
> start doing that (adding support for this kind of bundle to the
> unbundle command), you will have to support it forever...
>
> >     I don't know if - alternatively - some new option would be needed.
> >     Maybe like this:
> >
> >        $ hg clone --unbundle packed.hg packed
> >
> >     > +We can't unpack packed1 bundles on non-empty repos
> >     > +
> >     > +  $ hg -R packed unbundle packed.hg
> >     > +  abort: cannot apply stream clone bundle on non-empty repo
> >     > +  [255]
> >     > +
> >
> >     ..which then could not happen any more
> >
> >
> > This touches on the subject of whether streaming clone / packed bundles
> > are or aren't bundles. I could argue both perspectives. But if they
> > aren't bundles, then we need to invent new commands and figure out a way
> > to shoehorn them into the new clone bundles feature. I think I'm fine
> > with either way: I just care most about getting streaming clone support
> > into the clone bundles feature; otherwise the feature is useless for
> > Mozilla's automation needs.
>
> As I understand the matter, these new "full (streamclone) bundles"
> aren't really bundles to me. They can never be used like a classic
> bundle. What can you do with such a full bundle? The only command that
> makes sense is clone: Transform it into a new repo.
>
> But perhaps I'm missing your point here, as I'm mostly sitting on the
> fence. I got (unwillingly) involved in the business of streamcloning
> because streamcloning ('hg clone --uncompressed') was the preexisting
> feature which made the fncache file (.hg/store/fncache) necessary, which
> in turn was used as the name of the new repo format that I helped
> introduce quite a while ago. The only purpose of that repo format was
> to limit the length of path names under .hg/store, which was biting us
> on Windows. I then got bitten again when Bryan and I tortured each
> other with translating "my" horrible store._hybridencode function into
> C code, which he found to be needed because it turned out to be too
> slow for his important use case. Yikes.
>
> Now I see you inventing a new kind of bundles out of this streamclone
> stuff. Which is quite interesting.
>

Stream cloning, to me, is all about raw clone speed: the client can
effectively stream bits from the network to the filesystem. There is no
relatively expensive changegroup reading, delta application, zlib
decompression, etc.: just the equivalent of `curl | tar`.
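
To put that in command terms (the URL is hypothetical; both forms exist
today):

  $ hg clone --uncompressed https://hg.example.com/repo   # streams raw store files
  $ hg clone https://hg.example.com/repo                  # changegroup + deltas on the client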

Mozilla wrote an extension that can download a pre-built stream clone file
and apply it. We're hosting these files in S3, and in EC2 we can clone at
50+ MB/s. The result is clone times that are 3-5x faster than traditional,
changegroup-based clones.

My end goal is to introduce a "packed revlog" data format that is
essentially an index + raw revlog data. This is *very* similar to today's
stream clone bundles, except there is a file index at the beginning so
you can quickly find revlog content for a specific file. I would like to
implement a "packed revlog repo" class that can read repositories directly
from this single file. When this is implemented, a streaming clone would
write a single, large "packed revlog" to disk (as opposed to 1 file per
revlog). You would be able to clone as fast as your network and filesystem
allowed. (Writing 100,000 files is slow, no matter how small.) You would be
able to clone a 1 GB repo in 10s over a 1 Gbps network.
https://hg.mozilla.org/users/gszorc_mozilla.com/hg/rev/d6aeb3b8b512
contains the "stream clone v2" part of that work.
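
To make the "1 file per revlog" point concrete: a clone today materializes
one revlog per tracked path in the store (an .i file, plus a .d file for
larger ones), roughly like this (the data/ paths are illustrative):

  $ find .hg/store -name '*.i'
  .hg/store/00changelog.i
  .hg/store/00manifest.i
  .hg/store/data/readme.txt.i
  .hg/store/data/src/main.c.i
  ...

A packed revlog clone would instead write one large container file holding
the index plus the concatenated raw revlog data.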

My immediate goal is to support stream clones over the new clone bundles
feature so core has feature parity with what Mozilla implemented. Honestly,
I'd be fine with not supporting unbundling of stream clone "bundles"
through any other means. As long as a server admin can produce them and
clone bundles can consume them, I'm happy. Considering the limitations of
these "bundles" with `hg unbundle`, perhaps it makes sense to go that
route...
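
Roughly, the server-side wiring I have in mind with clone bundles is just
a manifest entry advertising the pre-generated file, something like the
following (the URL and the exact BUNDLESPEC value are placeholders):

  $ cat .hg/clonebundles.manifest
  https://s3.example.com/myrepo/packed.hg BUNDLESPEC=none-packed1

A clone-bundles-aware client would fetch and apply that file first, then
do a regular pull to catch up to the server's current heads.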