[PATCH 01 of 14 V3] util: move 'readexactly' in the util module

Tue Jan 30 06:41:00 EST 2018

On Mon, 2018-01-29 at 11:37 -0800, Gregory Szorc wrote:
> On Fri, Jan 26, 2018 at 8:39 AM, Boris Feld <boris.feld at octobus.net>
> wrote:
> > On Sat, 2018-01-20 at 12:22 -0800, Gregory Szorc wrote:
> > > On Fri, Jan 19, 2018 at 3:47 PM, Boris Feld <boris.feld at octobus.n
> > > et> wrote:
> > > > # HG changeset patch
> > > > 
> > > > # User Boris Feld <boris.feld at octobus.net>
> > > > 
> > > > # Date 1516391495 -3600
> > > > 
> > > > #      Fri Jan 19 20:51:35 2018 +0100
> > > > 
> > > > # Node ID 15f7795f96a5f9acb3ed2e640fcec82f3ccd6f53
> > > > 
> > > > # Parent  de32acb24949c0e3633de373d1c6c8c814faa804
> > > > 
> > > > # EXP-Topic b2-stream
> > > > 
> > > > # Available At https://bitbucket.org/octobus/mercurial-devel/
> > > > 
> > > > #              hg pull https://bitbucket.org/octobus/mercurial-
> > > > devel/ -r 15f7795f96a5
> > > > 
> > > > util: move 'readexactly' in the util module
> > > > 
> > > 
> > > I was swamped the last few days and didn't have time to look at
> > > this series even though I really wanted to. I reviewed it this
> > > morning and am generally happy with it. However, I'd like to make
> > > some BC changes to the wire protocol so we don't ship things in
> > > the wire protocol that we'll need to support indefinitely. A
> > > number of other changes to make this a more useful feature can be
> > > done post RC, during freeze.
> > > 
> > > *High priority issues* (should be addressed before RC)
> > > 
> > > HTTP server is sending bundle2 response with compression. That's
> > > a significant reason why this format is slower than the old
> > > format. I also think bundle2 overhead is coming into play on the
> > > server. I need to profile once compression is removed from the
> > > equation. I'm pretty sure this change can be done during freeze
> > > since it requires the server to drop compression for this bundle2
> > > response. But I'm not 100% about that given how compression is
> > > negotiated between client and server. Would prefer this change
> > > make it in before RC.
> > > 
> > > The new "stream" bundle2 part has a "version" parameter. I don't
> > > like versioning bundle2 parts. I think the behavior of bundle2
> > > parts should be consistent over time. If we need to introduce a
> > > new version of something or change processing semantics, we
> > > should be introducing a new bundle2 part and associate
> > > new/different behavior with the part. Attempting content
> > > negotiation within a named bundle2 part is extra complication and
> > > will lead to increased protocol chaos down the road. This should
> > > be changed before RC since it impacts the wire protocol.
> > > 
> > > Requirements string in bundle2 part should ideally use the same
> > > pre-encoded normalization as bundlespecs for consistency with the
> > > rest of Mercurial. Since it touches the wire protocol, this needs
> > > to change before RC.
> > > 
> > > *Medium priority issues* (ideally addressed before RC)
> > > 
> > > I don't think there's an easy way to generate bundle2 bundles
> > > with 
> > > stream parts. We need this to support clonebundles. This will
> > > likely 
> > > require bundlespec changes. I would highly prefer to get this in
> > > before RC so we can do end-to-end integration testing with
> > > clonebundles.
> > 
> > What about starting by adding a --v2 flag to the debug command
> > generating stream bundle?
> 
> That /could/ work.
> 
> The better path forward is to shoehorn all of this into "bundle
> specifications" (bundlespecs). The user-facing bundlespec string
> defines a set of capabilities and contents of a bundle that any two
> supporting implementations will be able to read and write. Right now,
> v1-* means changegroup 1 (the original bundle format) and v2-* means
> a subset of bundle2. With the addition of phases, bookmarks, etc, it
> is time to create a v3-* that allows the creation of these bundle2
> parts. A "streaming clone bundle" could be a variation of v3-*. e.g.
> v3-none?stream=1.

I've started working on the stream clone bundle v2, you can find the
first iterations of the changesets here: https://bitbucket.org/octobus/
mercurial-devel/commits/all?search=48e60e649700%3A%3A
I start adding a v2 flag to generate a new stream bundle in: https://bi
tbucket.org/octobus/mercurial-
devel/commits/48e60e649700d51a7ae3b221c689ebbd2ec84371
I later introduced a new bundle spec for the v2 stream clone bundle
`none-packed2` to mimic the stream clone bundle v1 bundle spec in https
://bitbucket.org/octobus/mercurial-
devel/commits/4414466e4393e496c1cb4f066322b185a9da7bbf. One thing that
confused me is that v1 and v2 bundle spec seemd to imply that there was
a changegroup part in the bundle, but we don't need to have it in the
bundle, right?
I'm not sure to see the implications of introducing a v3 bundle spec,
what pieces of code needs to change to handle it?
>  
> > > *Lesser priority issues* (can be fixed post RC)
> > > 
> > > streamclone._emit() should be renamed to _emitv2() for
> > > consistency with other functions in file.
> > 
> > (done)
> > 
> > 
> > > I think streamclone.applybundlev2() should handle writing
> > > requirements, not the bundle2 part handler. Similar logic is
> > > already in streamclone.maybeperformlegacystreamclone().
> > 
> > Good point, done.
> > 
> > 
> > > I need to look at this in more detail, but 133a678673cb ("allow
> > > bundle'2 stream clone") is a bit suspect. It's correctness may
> > > depend on your interpretation of whether server.disablefullbundle
> > > applies only to changegroup bundles. I can't recall what existing
> > > behavior is with regards to stream clone. Need to double check.
> > 
> > 133a678673cb exist because a test failed otherwise. Since there is
> > a test for it we assumed it should work. So if we revisit this
> > logic, we should do it for both stream v1 and stream v2.
> > 
> > > test-clone-uncompressed.t wants a comment about legacy stream
> > > clone phases being buggy. We should also decide what we want to
> > > do about issue #5648 since bundle2 stream clone solves the
> > > problem. I'm tempted to mark #5648 as resolved once bundle2
> > > stream clone is enabled by default.
> > 
> > +1 for marking it as solved once the protocol is no longer
> > experimental.
> > 
> > 
> > > In 56c30b31afbe:
> > > @@ -1465,6 +1465,7 @@ def _pullbundle2(pullop):
> > >          kwargs['cg'] = False
> > >          kwargs['stream'] = True
> > >          pullop.stepsdone.add('changegroup')
> > > +        pullop.stepsdone.add('phases')
> > > 
> > > Do we know for sure that phases were processed? It depends on
> > > what the server sent, no? I think this code may assume the
> > > behavior of the official Mercurial server and may not be
> > > compatible with other implementation.
> > 
> > We are not sure what you mean here. If the repository is publishing
> > → we don't send any phases, making all transferred changeset
> > public. If server is non-publishing, we transfer the same phase
> > root as the original. This bypasses the usual transfer for phases
> > the same way we bypass the usual transfer for changeset.
> 
> What I mean is that this code is assuming that the exchanged bundle2
> payload either communicated phases or didn't need to communicate
> phases. In other words, this assumes behavior of the canonical
> Mercurial client and server. A separate server implementation may not
> behave similarly. e.g. it may not implement exchange of phases via
> bundle2.
>  
> > > The use of temporary files for non-append-only files and the
> > > context manager and copy primitive associated with that code
> > > feels overly complex. Can we just slurp file content into memory?
> > > We already store the filenames and offsets in memory. We already
> > > scale memory use linearly with number of files in the repo. I
> > > don't see a huge problem doing the same for files that are
> > > proportional to number of changesets, branches, tags, and heads.
> > > That being said, the code is already landed. Some maybe we just
> > > keep it around. If we do keep it around, I think it could use
> > > some refactoring to improve readability. The amount of functions
> > > being called is a bit much for what the code is doing, IMO.
> > 
> > We cannot keep cache file in memory, we've seen extensions where
> > cache file can become huge.
> 
> Bleh. It would be nice to avoid the overhead in the common case where
> the cache files are small. Or, we can put the temp files next to the
> final cache files so we can perform an atomic rename within the same
> directory.
>  
> > > The code around managing multiple vfs instances feels a bit
> > > complex. I don't think we need a dict of vfs instances. We only
> > > have 2. Let's inline their usage and do away with all the context
> > > manager and dictionary lookup complications.
> > > 
> > > Related to vfs, we're not using the "expectedcount" argument with
> > > vfs.backgroundclose(). Ideally, the part header defines the
> > > number of files in each vfs so we can pass the proper value. But,
> > > I think it's fine to use the total expected file count as the
> > > hint to the store vfs and don't use background close for the
> > > cache vfs (since it will almost certainly never have enough
> > > entries to warrant background closing).
> > 
> > The number of filecount is used in debug message and at the
> > protocol level. So in all cases, we'll have to introduce a new
> > parameter to carry the data per vfs. We'll do that next cycle.
> > 
> > > I'm pretty happy with the feature. It is currently behind an
> > > experimental config flag. I think we can make it enabled by
> > > default in 4.5 if we fix the more important issues.
> > 
> > You mean turning it on for 4.5? We find it too aggressive, let's
> > ship it experimental in 4.5 and turn that on early in the 4.6
> > cycle.
> 
> At this point, it is definitely too late to be enabled for 4.5.
>  
> > > *Other observations *
> > > 
> > > The new format seems to use less CPU on the client than the old
> > > format when doing HTTP exchange. 30-31s versus 25-26s for
> > > mozilla-unified. And that's with zstd in play. I haven't
> > > profiled, but this is likely due to not using readline() and
> > > transferring caches and phases data (I believe we avoid a linear
> > > changelog scan post transfer now since I believe something before
> > > needed to reconstruct the branch cache).
> > > 
> > > Here are some example numbers on my i7-6700K.
> > > 
> > > legacy http
> > > server:
> > > 8.44user 3.51system 0:51.21elapsed 23%CPU (0avgtext+0avgdata
> > > 171184maxresident)k
> > > 1144inputs+0outputs (1major+73642minor)pagefaults 0swaps
> > > 
> > > client:
> > > 30.79user 8.47system 0:42.47elapsed 92%CPU (0avgtext+0avgdata
> > > 197244maxresident)k
> > > 19288inputs+7039984outputs (2major+94838minor)pagefaults 0swaps
> > > 
> > > v2 http
> > > server:
> > > 37.69user 8.01system 1:56.61elapsed 39%CPU (0avgtext+0avgdata
> > > 236808maxresident)k
> > > 6836840inputs+21288outputs (2major+82612minor)pagefaults 0swaps
> > > 
> > > client:
> > > 25.82user 8.32system 1:47.83elapsed 31%CPU (0avgtext+0avgdata
> > > 209164maxresident)k
> > > 60848inputs+7061216outputs (2major+78863minor)pagefaults 0swaps
> > >  
> > > > 
> > > > This function is used in multiple place, having it in util
> > > > would be better.
> > > > 
> > > > (existing caller will be migrated in another series)
> > > > 
> > > > 
> > > > 
> > > > diff --git a/mercurial/changegroup.py
> > > > b/mercurial/changegroup.py
> > > > 
> > > > --- a/mercurial/changegroup.py
> > > > 
> > > > +++ b/mercurial/changegroup.py
> > > > 
> > > > @@ -32,14 +32,7 @@ from . import (
> > > > 
> > > >  _CHANGEGROUPV2_DELTA_HEADER = "20s20s20s20s20s"
> > > > 
> > > >  _CHANGEGROUPV3_DELTA_HEADER = ">20s20s20s20s20sH"
> > > > 
> > > > 
> > > > 
> > > > -def readexactly(stream, n):
> > > > 
> > > > -    '''read n bytes from stream.read and abort if less was
> > > > available'''
> > > > 
> > > > -    s = stream.read(n)
> > > > 
> > > > -    if len(s) < n:
> > > > 
> > > > -        raise error.Abort(_("stream ended unexpectedly"
> > > > 
> > > > -                           " (got %d bytes, expected %d)")
> > > > 
> > > > -                          % (len(s), n))
> > > > 
> > > > -    return s
> > > > 
> > > > +readexactly = util.readexactly
> > > > 
> > > > 
> > > > 
> > > >  def getchunk(stream):
> > > > 
> > > >      """return the next chunk from stream as a string"""
> > > > 
> > > > diff --git a/mercurial/util.py b/mercurial/util.py
> > > > 
> > > > --- a/mercurial/util.py
> > > > 
> > > > +++ b/mercurial/util.py
> > > > 
> > > > @@ -3865,3 +3865,12 @@ def safename(f, tag, ctx, others=None):
> > > > 
> > > >          fn = '%s~%s~%s' % (f, tag, n)
> > > > 
> > > >          if fn not in ctx and fn not in others:
> > > > 
> > > >              return fn
> > > > 
> > > > +
> > > > 
> > > > +def readexactly(stream, n):
> > > > 
> > > > +    '''read n bytes from stream.read and abort if less was
> > > > available'''
> > > > 
> > > > +    s = stream.read(n)
> > > > 
> > > > +    if len(s) < n:
> > > > 
> > > > +        raise error.Abort(_("stream ended unexpectedly"
> > > > 
> > > > +                           " (got %d bytes, expected %d)")
> > > > 
> > > > +                          % (len(s), n))
> > > > 
> > > > +    return s
> > > > 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20180130/8dd5395f/attachment.html>