[PATCH 1 of 9 changegroup-apis] exchange: use early return

Gregory Szorc gregory.szorc at gmail.com
Mon Aug 1 22:50:12 EDT 2016


On Mon, Aug 1, 2016 at 6:44 PM, Pierre-Yves David <
pierre-yves.david at ens-lyon.org> wrote:

>
>
> On 08/01/2016 08:20 PM, Gregory Szorc wrote:
> > This is the first half of the series. The second half can be found at
> > https://hg.mozilla.org/users/gszorc_mozilla.com/hg/rev/26eeb1e66e88 if
> > someone wants to review the whole thing.
>
> I've made a first review pass on this series. As much as I like the idea
> of cleaning this up, the proposed implementation makes me uncomfortable.
> The main points that troubles me are:
>
> * the extra new class is a bit of a mix of (1) an outgoing object and (2) a cg1
> bundler object. This makes the boundary of these objects fuzzier


Not sure what you mean here. An outgoing object is basically a fancy tuple
holding the sets of common and missing nodes. I'm not sure what a "cg1
bundler" is. A non-bundle2 bundle is just a v1 changegroup with a 6-byte header.

If you are referring to returning a cg<N>unpacker instance, I'm not a huge
fan of that. I only did that for parity with existing consumers because I
didn't want to bloat the scope into a 40+ patch series. I do intend to do
some follow-up work here because our handling of changegroup generation is
just bonkers. For example, sending a bundle2 to the wire protocol with the
current code involves:

1. Construct a cg<N>packer and call generate() to obtain a generator of
changegroup chunks
2. Store a reference to the generator on a bundle2 part
3. Convert generator to a util.chunkbuffer (a file-like object) as part of
bundlepart._payloadchunks() when serializing the bundle2 part
4. Emit a generator of 4k chunks because that's what
bundlepart.getchunks() does
5. Feed this generator of chunks into a util.chunkbuffer holding the entire
bundle2 data (in exchange.getbundle())
6. Construct a zlib compressor
7. read(4096) from the util.chunkbuffer into the zlib compressor
8. Emit a generator of zlib compressed chunks to be sent to WSGI
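Schematically, the steps above can be sketched as follows. This is a simplified, hypothetical model, not the actual Mercurial code; `chunkbuffer` here eagerly joins chunks, whereas the real util.chunkbuffer buffers lazily:

```python
import io
import zlib

def chunkbuffer(gen):
    # Simplified stand-in for util.chunkbuffer: wraps a generator of
    # byte chunks in a file-like object (the real one buffers lazily).
    return io.BytesIO(b''.join(gen))

def pipeline(changegroupchunks):
    # Steps 2-3: the part payload is buffered so it can be read like a file.
    payload = chunkbuffer(changegroupchunks)

    # Step 4: re-emit fixed-size 4k chunks, as bundlepart.getchunks() does.
    def fourk():
        while True:
            data = payload.read(4096)
            if not data:
                return
            yield data

    # Step 5: the whole bundle2 stream is buffered *again*.
    bundle = chunkbuffer(fourk())

    # Steps 6-8: read(4096) into a zlib compressor and emit its output.
    compressor = zlib.compressobj()
    while True:
        data = bundle.read(4096)
        if not data:
            break
        out = compressor.compress(data)
        if out:
            yield out
    yield compressor.flush()
```

Two full in-memory buffering passes just to get compressed bytes onto the wire.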

util.chunkbuffer overhead has shown up in profiling results when cloning
Firefox. The thing is that none of these util.chunkbuffer instances is really
necessary for the wire protocol case: we could feed a generator of chunks
of various sizes into zlib and then emit whatever chunks it spits out, possibly
normalizing them to equal sizes if we really wanted to.
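A sketch of that simpler path, feeding variable-size chunks straight into zlib with no intermediate buffering (the function name is mine, not from the series):

```python
import zlib

def compresschunks(chunks):
    # Feed chunks of whatever size into the compressor and emit
    # whatever it produces; no util.chunkbuffer needed anywhere.
    compressor = zlib.compressobj()
    for chunk in chunks:
        out = compressor.compress(chunk)
        if out:
            yield out
    yield compressor.flush()
```

The compressor already does its own internal buffering, so the extra file-like wrappers buy nothing here.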

I think the end state of this series involves a "changegroup emitter" being
fed into some kind of "bundler," preferably as a generator. Using the
cg1unpacker for this purpose is kind of silly, as it necessitates a
util.chunkbuffer. The good news is that the "changegroupemitter" as it stands
can easily be represented as either a generator of chunks or a file handle.
So by introducing this object and requiring callers to use it, we can start
transitioning callers away from util.chunkbuffer without having to change
APIs in changegroup.py. It's much more flexible.
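As a rough sketch of that dual representation (names and shape are my own, not what the series actually implements), the emitter could expose both views and only pay the buffering cost when a consumer truly needs a file object:

```python
import io

class emitter(object):
    # Hypothetical changegroup emitter: built from a callable that
    # returns a fresh generator of changegroup chunks each time.
    def __init__(self, chunksfn):
        self._chunksfn = chunksfn

    def getchunks(self):
        # Streaming consumers (e.g. the wire protocol) use this and
        # never touch a util.chunkbuffer-style object.
        return self._chunksfn()

    def getfile(self):
        # Only consumers that genuinely need read() pay for buffering.
        return io.BytesIO(b''.join(self._chunksfn()))
```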



> and I'm not convinced yet that your cleanup could not happen with the
> existing objects,
>

The existing functions are horrible and not very useful. As this series
demonstrates, we had something like 6 functions producing either:

* a generator of changegroup chunks
* a cg<N>unpacker instance

from

* an outgoing instance
* a set of bases and heads
* a set of common nodes and heads

sometimes with knobs to control the changegroup version.

The API was unclear and far too low-level (e.g. sometimes you wanted to
access the number of changesets or the nodes in the changegroup afterwards
and the API hid these from you). To be honest, it wasn't clear to me that
this was how things worked until I wrote the series. That's not a good sign.

The new API makes things simpler by allowing you to construct a
stateful object representing a generated changegroup and then obtain the
changegroup representation in various ways. It is much more explicit.


> * many @classmethod usages, while our coding style usually avoids them
> along with other "class as namespace" usage (per Matt's taste),
>

That's fine. These can be pulled out into module-level functions, e.g.
emitterfromrootsandheads(repo, roots, heads) -> changegroupemitter


>
> * the
>
> superlongnameforeverything.thatcombinewithclassnamspacing.foroverninethousand
> char lines; we could probably be more compact using an abbreviation for
> common parts as we do elsewhere in the code.
>

I agree the names are a bit long. But I'm not a fan of abbreviations
because they hurt readability, especially for those who don't speak
English natively.

I think renaming "changegroupemitter" to "emitter" (because "changegroup"
is redundant with the module name) and moving the @classmethods to module
functions will make things much better. e.g.

emitter = changegroup.changegroupemitter.fromheadsandcommon(repo, heads, common)
emitter = changegroup.emitterfromheadsandcommon(repo, heads, common)

Or if we wanted to write a complex __init__ function, we could have it accept
all the arguments. But I don't like complexity and prefer separate
"constructor" functions.
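For illustration, the separate-constructor pattern might look like this. All names and the argument handling are hypothetical; the real object would compute the outgoing set, changegroup version, and so on:

```python
class emitter(object):
    # Minimal stand-in for the changegroup emitter discussed above.
    def __init__(self, repo, common, heads):
        self.repo = repo
        self.common = set(common)
        self.heads = set(heads)

def emitterfromheadsandcommon(repo, heads, common):
    # Wire-protocol style arguments: what the client has in common
    # and which heads it wants.
    return emitter(repo, common, heads)

def emitterfromrootsandheads(repo, roots, heads):
    # "hg bundle --base" style arguments: roots are the bases whose
    # descendants should be bundled (normalization to a common set
    # elided in this sketch).
    return emitter(repo, roots, heads)
```

Each constructor keeps its own normalization logic in one obvious place instead of overloading __init__ with every calling convention.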


> Let me do a full scan of the second part of the series tomorrow and then
> I either understand the purity of your way, or find some super precise
> feed back, or we might discuss it over voice.
>
> > Word of warning: if you pull that repo, you'll get tons of obsolescence
> > markers. You may want to disable obsolescence when pulling.
>
> These are legit markers that do not exist in the repository because we
> use email and somehow managed to not pull from you in the past. We
> should be okay pushing them to the main server (the marker/changeset
> ratio is a bit high but fine).
>
> Cheers,
>
> --
> Pierre-Yves David
>