[PATCH 01 of 14 V3] util: move 'readexactly' in the util module

Boris Feld boris.feld at octobus.net
Fri Jan 26 11:39:38 EST 2018


On Sat, 2018-01-20 at 12:22 -0800, Gregory Szorc wrote:
> On Fri, Jan 19, 2018 at 3:47 PM, Boris Feld <boris.feld at octobus.net>
> wrote:
> > # HG changeset patch
> > 
> > # User Boris Feld <boris.feld at octobus.net>
> > 
> > # Date 1516391495 -3600
> > 
> > #      Fri Jan 19 20:51:35 2018 +0100
> > 
> > # Node ID 15f7795f96a5f9acb3ed2e640fcec82f3ccd6f53
> > 
> > # Parent  de32acb24949c0e3633de373d1c6c8c814faa804
> > 
> > # EXP-Topic b2-stream
> > 
> > # Available At https://bitbucket.org/octobus/mercurial-devel/
> > 
> > #              hg pull https://bitbucket.org/octobus/mercurial-deve
> > l/ -r 15f7795f96a5
> > 
> > util: move 'readexactly' in the util module
> > 
> 
> I was swamped the last few days and didn't have time to look at this
> series even though I really wanted to. I reviewed it this morning and
> am generally happy with it. However, I'd like to make some BC changes
> to the wire protocol so we don't ship things in the wire protocol
> that we'll need to support indefinitely. A number of other changes to
> make this a more useful feature can be done post RC, during freeze.
> 
> *High priority issues* (should be addressed before RC)
> 
> HTTP server is sending bundle2 response with compression. That's a
> significant reason why this format is slower than the old format. I
> also think bundle2 overhead is coming into play on the server. I need
> to profile once compression is removed from the equation. I'm pretty
> sure this change can be done during freeze since it requires the
> server to drop compression for this bundle2 response. But I'm not
> 100% about that given how compression is negotiated between client
> and server. Would prefer this change make it in before RC.
> 
> The new "stream" bundle2 part has a "version" parameter. I don't like
> versioning bundle2 parts. I think the behavior of bundle2 parts
> should be consistent over time. If we need to introduce a new version
> of something or change processing semantics, we should be introducing
> a new bundle2 part and associate new/different behavior with the
> part. Attempting content negotiation within a named bundle2 part is
> extra complication and will lead to increased protocol chaos down the
> road. This should be changed before RC since it impacts the wire
> protocol.
> 
> Requirements string in bundle2 part should ideally use the same pre-
> encoded normalization as bundlespecs for consistency with the rest of
> Mercurial. Since it touches the wire protocol, this needs to change
> before RC.
> 
> *Medium priority issues* (ideally addressed before RC)
> 
> I don't think there's an easy way to generate bundle2 bundles with 
> stream parts. We need this to support clonebundles. This will likely 
> require bundlespec changes. I would highly prefer to get this in
> before RC so we can do end-to-end integration testing with
> clonebundles.

What about starting by adding a --v2 flag to the debug command
generating stream bundle?
> *Lesser priority issues* (can be fixed post RC)
> 
> streamclone._emit() should be renamed to _emitv2() for consistency
> with other functions in file.

(done)
> I think streamclone.applybundlev2() should handle writing
> requirements, not the bundle2 part handler. Similar logic is already
> in streamclone.maybeperformlegacystreamclone().

Good point, done.
> I need to look at this in more detail, but 133a678673cb ("allow
> bundle'2 stream clone") is a bit suspect. It's correctness may depend
> on your interpretation of whether server.disablefullbundle applies
> only to changegroup bundles. I can't recall what existing behavior is
> with regards to stream clone. Need to double check.

133a678673cb exist because a test failed otherwise. Since there is
a test for it we assumed it should work. So if we revisit this logic,
we should do it for both stream v1 and stream v2.
> test-clone-uncompressed.t wants a comment about legacy stream clone
> phases being buggy. We should also decide what we want to do about
> issue #5648 since bundle2 stream clone solves the problem. I'm
> tempted to mark #5648 as resolved once bundle2 stream clone is
> enabled by default.

+1 for marking it as solved once the protocol is no longer
experimental.
> In 56c30b31afbe:
> @@ -1465,6 +1465,7 @@ def _pullbundle2(pullop):
>          kwargs['cg'] = False
>          kwargs['stream'] = True
>          pullop.stepsdone.add('changegroup')
> +        pullop.stepsdone.add('phases')
> 
> Do we know for sure that phases were processed? It depends on what
> the server sent, no? I think this code may assume the behavior of the
> official Mercurial server and may not be compatible with other
> implementation.

We are not sure what you mean here. If the repository is publishing →
we don't send any phases, making all transferred changeset public. If
server is non-publishing, we transfer the same phase root as the
original. This bypasses the usual transfer for phases the same way we
bypass the usual transfer for changeset.
> The use of temporary files for non-append-only files and the context
> manager and copy primitive associated with that code feels overly
> complex. Can we just slurp file content into memory? We already store
> the filenames and offsets in memory. We already scale memory use
> linearly with number of files in the repo. I don't see a huge problem
> doing the same for files that are proportional to number of
> changesets, branches, tags, and heads. That being said, the code is
> already landed. Some maybe we just keep it around. If we do keep it
> around, I think it could use some refactoring to improve readability.
> The amount of functions being called is a bit much for what the code
> is doing, IMO.

We cannot keep cache file in memory, we've seen extensions where
cache file can become huge.
> The code around managing multiple vfs instances feels a bit complex.
> I don't think we need a dict of vfs instances. We only have 2. Let's
> inline their usage and do away with all the context manager and
> dictionary lookup complications.
> 
> Related to vfs, we're not using the "expectedcount" argument with
> vfs.backgroundclose(). Ideally, the part header defines the number of
> files in each vfs so we can pass the proper value. But, I think it's
> fine to use the total expected file count as the hint to the store
> vfs and don't use background close for the cache vfs (since it will
> almost certainly never have enough entries to warrant background
> closing).

The number of filecount is used in debug message and at the protocol
level. So in all cases, we'll have to introduce a new parameter to
carry the data per vfs. We'll do that next cycle.
> I'm pretty happy with the feature. It is currently behind an
> experimental config flag. I think we can make it enabled by default
> in 4.5 if we fix the more important issues.

You mean turning it on for 4.5? We find it too aggressive, let's ship
it experimental in 4.5 and turn that on early in the 4.6 cycle.
> *Other observations *
> 
> The new format seems to use less CPU on the client than the old
> format when doing HTTP exchange. 30-31s versus 25-26s for mozilla-
> unified. And that's with zstd in play. I haven't profiled, but this
> is likely due to not using readline() and transferring caches and
> phases data (I believe we avoid a linear changelog scan post transfer
> now since I believe something before needed to reconstruct the branch
> cache).
> 
> Here are some example numbers on my i7-6700K.
> 
> legacy http
> server:
> 8.44user 3.51system 0:51.21elapsed 23%CPU (0avgtext+0avgdata
> 171184maxresident)k
> 1144inputs+0outputs (1major+73642minor)pagefaults 0swaps
> 
> client:
> 30.79user 8.47system 0:42.47elapsed 92%CPU (0avgtext+0avgdata
> 197244maxresident)k
> 19288inputs+7039984outputs (2major+94838minor)pagefaults 0swaps
> 
> v2 http
> server:
> 37.69user 8.01system 1:56.61elapsed 39%CPU (0avgtext+0avgdata
> 236808maxresident)k
> 6836840inputs+21288outputs (2major+82612minor)pagefaults 0swaps
> 
> client:
> 25.82user 8.32system 1:47.83elapsed 31%CPU (0avgtext+0avgdata
> 209164maxresident)k
> 60848inputs+7061216outputs (2major+78863minor)pagefaults 0swaps
>  
> > 
> > This function is used in multiple place, having it in util would be
> > better.
> > 
> > (existing caller will be migrated in another series)
> > 
> > 
> > 
> > diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py
> > 
> > --- a/mercurial/changegroup.py
> > 
> > +++ b/mercurial/changegroup.py
> > 
> > @@ -32,14 +32,7 @@ from . import (
> > 
> >  _CHANGEGROUPV2_DELTA_HEADER = "20s20s20s20s20s"
> > 
> >  _CHANGEGROUPV3_DELTA_HEADER = ">20s20s20s20s20sH"
> > 
> > 
> > 
> > -def readexactly(stream, n):
> > 
> > -    '''read n bytes from stream.read and abort if less was
> > available'''
> > 
> > -    s = stream.read(n)
> > 
> > -    if len(s) < n:
> > 
> > -        raise error.Abort(_("stream ended unexpectedly"
> > 
> > -                           " (got %d bytes, expected %d)")
> > 
> > -                          % (len(s), n))
> > 
> > -    return s
> > 
> > +readexactly = util.readexactly
> > 
> > 
> > 
> >  def getchunk(stream):
> > 
> >      """return the next chunk from stream as a string"""
> > 
> > diff --git a/mercurial/util.py b/mercurial/util.py
> > 
> > --- a/mercurial/util.py
> > 
> > +++ b/mercurial/util.py
> > 
> > @@ -3865,3 +3865,12 @@ def safename(f, tag, ctx, others=None):
> > 
> >          fn = '%s~%s~%s' % (f, tag, n)
> > 
> >          if fn not in ctx and fn not in others:
> > 
> >              return fn
> > 
> > +
> > 
> > +def readexactly(stream, n):
> > 
> > +    '''read n bytes from stream.read and abort if less was
> > available'''
> > 
> > +    s = stream.read(n)
> > 
> > +    if len(s) < n:
> > 
> > +        raise error.Abort(_("stream ended unexpectedly"
> > 
> > +                           " (got %d bytes, expected %d)")
> > 
> > +                          % (len(s), n))
> > 
> > +    return s
> > 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20180126/1198a6d3/attachment.html>


More information about the Mercurial-devel mailing list