[PATCH 01 of 14 V3] util: move 'readexactly' in the util module

Boris Feld boris.feld at octobus.net
Wed Jan 31 11:57:04 EST 2018


On Tue, 2018-01-30 at 17:02 -0500, Augie Fackler wrote:
> > On Jan 30, 2018, at 17:00, Boris Feld <boris.feld at octobus.net>
> > wrote:
> > After chatting on IRC with Gregory about this, here is a detailed
> > list of proposals:
> > 
> > - The `X-v2` bundlespec should be stable in time. An old Mercurial
> > client shouldn't choke on a bundle 2 if there is a new unknown
> > part. Adding the phases and obsolescence parts using experimental
> > config knobs violate this guarantee.
> > => Proposal: Bundle announced with `X-v2` bundlespec will always be
> > bundle 2 without phases, bookmarks or obsolescence parts. `hg
> > bundle --type=v2` would generate such bundles.
> > 
> > - With the recent addition of new bundle 2 parts, we should
> > introduce a new bundle-spec which will imply that the bundle
> > contains these recent parts.
> > => Proposal: Introduce a new bundlespec format `X-v3`. It will be
> > for bundle 2 containing a changegroup part in addition to a
> > bookmarks, phase, and obsolescence parts. It will be generated with
> > `hg bundle --type=v3`.
> > => Proposal: As an edge-case, people may want to generate a v3
> > bundle without a specific part. A variant could be introduced, like
> > `zstd-v3;no-bookmarks`.
> > 
> > - The old stream clone bundle use the `none-packed1` bundlespec.
> > Instead of using a dedicated version, we should use the same
> > bundle2 versions; v2 and v3. Also, it uses a dedicated debug
> > command, we should instead use the bundle command which is the
> > best-suited to handle that task.
> > => Proposal: Introduce a new bundlespec format `none-
> > v2;stream=v2;requirements..` for the stream clone v2. This will
> > guarantee a stream part but the bundle will not contain the
> > bookmarks, phase and obsolescence parts. It will be generated with
> > `hg bundle --type=none-v2;stream=v2`.
> > => Proposal: Introduce a new bundlespec format `none-
> > v3;stream=v2;requirements..` for the stream clone v2. This will
> > guarantee a stream part in addition to a bookmarks, phase, and
> > obsolescence parts. It will be generated with `hg bundle --
> > type=none-v3;stream=v2`.
> > 
> > Carrying these guarantees in the bundlespec will likely require
> > some changes in the `parsebundlespec` API as it would likely need
> > to return the `contentops` based on the bundlespec instead of
> > computing the contentops in the bundle command.
> > 
> > Refactoring the bundlespec parsing for the `none-v2;stream=v2;`
> > could be done before agreeing on the level of guarantees the `X-v3` 
> > bundlespec gives, so a possible two-steps plan is:
> > 
> > - Refactor and add the possibility to generate stream v2 clone
> > bundle with the `none-v2;stream=v2;` bundlespec and the bundle
> > command.
> > - Introduce the `X-v3` bundlespecs.
> > 
> > I think we should be able to target the 4.5 release for the first
> > step, I can have a series ready for tomorrow.
> 
> Are we really proposing significant feature work for the release that
> happens in 48 hours? After the RC?
> 
> I can't say I'm enthused, but I could be convinced if Greg is on
> board.
I can understand your frustration, we tried to process each of the
comments made by Gregory on the original series. His feedback has been
precious to improve the original series, and we tried to address or
answer each comment as soon as possible. The bundlespec refactoring
surely slowed the process a bit but it is a good improvement to make
the clone bundle feature less hidden and more mainstream.
I sent the series on phabricator: https://phab.mercurial-scm.org/D1949.
It contains the fix for the last two comments Gregory made on the
stream v2 bundle series:
- Format the requirements parameter in the same way as the bundlespec
for the stream v1 bundle. It's fixed by https://phab.mercurial-scm.org/
D1949 and https://phab.mercurial-scm.org/D1950. We missed it in our
previous wave of fixes. As Gregory said, it touches the wire protocol,
so it needs to be fixed before the release.
- Introduce the new bundlespec for the stream v2. It follows the
proposed design by Gregory. The rest of the series is 7 changesets but
is very small, it's 120 lines of code mostly from the refactoring of
the contentopts. Without this feature, we cannot list a stream v2
bundle as clone bundle in a way that old clients could safely ignore
it.
If the refactoring of bundle spec is too big to be included so late in
the release process, we could start by using a dummy bundle spec like
`none-packed2;requirements...`, the code required for the change would
be smaller but that would be more hackish and would need to be cleaned
by the bundle spec refactoring.
> > What do you think?
> > On Tue, 2018-01-30 at 08:35 -0800, Gregory Szorc wrote:
> > > On Tue, Jan 30, 2018 at 3:41 AM, Boris Feld <boris.feld at octobus.n
> > > et> wrote:
> > > > On Mon, 2018-01-29 at 11:37 -0800, Gregory Szorc wrote:
> > > > > On Fri, Jan 26, 2018 at 8:39 AM, Boris Feld <boris.feld at octob
> > > > > us.net> wrote:
> > > > > > On Sat, 2018-01-20 at 12:22 -0800, Gregory Szorc wrote:
> > > > > > > On Fri, Jan 19, 2018 at 3:47 PM, Boris Feld <boris.feld at o
> > > > > > > ctobus.net> wrote:
> > > > > > > > # HG changeset patch
> > > > > > > > 
> > > > > > > > # User Boris Feld <boris.feld at octobus.net>
> > > > > > > > 
> > > > > > > > # Date 1516391495 -3600
> > > > > > > > 
> > > > > > > > #      Fri Jan 19 20:51:35 2018 +0100
> > > > > > > > 
> > > > > > > > # Node ID 15f7795f96a5f9acb3ed2e640fcec82f3ccd6f53
> > > > > > > > 
> > > > > > > > # Parent  de32acb24949c0e3633de373d1c6c8c814faa804
> > > > > > > > 
> > > > > > > > # EXP-Topic b2-stream
> > > > > > > > 
> > > > > > > > # Available At https://bitbucket.org/octobus/mercurial-
> > > > > > > > devel/
> > > > > > > > 
> > > > > > > > #              hg pull https://bitbucket.org/octobus/me
> > > > > > > > rcurial-devel/ -r 15f7795f96a5
> > > > > > > > 
> > > > > > > > util: move 'readexactly' in the util module
> > > > > > > > 
> > > > > > > 
> > > > > > > I was swamped the last few days and didn't have time to
> > > > > > > look at this series even though I really wanted to. I
> > > > > > > reviewed it this morning and am generally happy with it.
> > > > > > > However, I'd like to make some BC changes to the wire
> > > > > > > protocol so we don't ship things in the wire protocol
> > > > > > > that we'll need to support indefinitely. A number of
> > > > > > > other changes to make this a more useful feature can be
> > > > > > > done post RC, during freeze.
> > > > > > > 
> > > > > > > *High priority issues* (should be addressed before RC)
> > > > > > > 
> > > > > > > HTTP server is sending bundle2 response with compression.
> > > > > > > That's a significant reason why this format is slower
> > > > > > > than the old format. I also think bundle2 overhead is
> > > > > > > coming into play on the server. I need to profile once
> > > > > > > compression is removed from the equation. I'm pretty sure
> > > > > > > this change can be done during freeze since it requires
> > > > > > > the server to drop compression for this bundle2 response.
> > > > > > > But I'm not 100% about that given how compression is
> > > > > > > negotiated between client and server. Would prefer this
> > > > > > > change make it in before RC.
> > > > > > > 
> > > > > > > The new "stream" bundle2 part has a "version" parameter.
> > > > > > > I don't like versioning bundle2 parts. I think the
> > > > > > > behavior of bundle2 parts should be consistent over time.
> > > > > > > If we need to introduce a new version of something or
> > > > > > > change processing semantics, we should be introducing a
> > > > > > > new bundle2 part and associate new/different behavior
> > > > > > > with the part. Attempting content negotiation within a
> > > > > > > named bundle2 part is extra complication and will lead to
> > > > > > > increased protocol chaos down the road. This should be
> > > > > > > changed before RC since it impacts the wire protocol.
> > > > > > > 
> > > > > > > Requirements string in bundle2 part should ideally use
> > > > > > > the same pre-encoded normalization as bundlespecs for
> > > > > > > consistency with the rest of Mercurial. Since it touches
> > > > > > > the wire protocol, this needs to change before RC.
> > > > > > > 
> > > > > > > *Medium priority issues* (ideally addressed before RC)
> > > > > > > 
> > > > > > > I don't think there's an easy way to generate bundle2
> > > > > > > bundles with 
> > > > > > > stream parts. We need this to support clonebundles. This
> > > > > > > will likely 
> > > > > > > require bundlespec changes. I would highly prefer to get
> > > > > > > this in before RC so we can do end-to-end integration
> > > > > > > testing with clonebundles.
> > > > > > 
> > > > > > What about starting by adding a --v2 flag to the debug
> > > > > > command generating stream bundle?
> > > > > 
> > > > > That /could/ work.
> > > > > 
> > > > > The better path forward is to shoehorn all of this into
> > > > > "bundle specifications" (bundlespecs). The user-facing
> > > > > bundlespec string defines a set of capabilities and contents
> > > > > of a bundle that any two supporting implementations will be
> > > > > able to read and write. Right now, v1-* means changegroup 1
> > > > > (the original bundle format) and v2-* means a subset of
> > > > > bundle2. With the addition of phases, bookmarks, etc, it is
> > > > > time to create a v3-* that allows the creation of these
> > > > > bundle2 parts. A "streaming clone bundle" could be a
> > > > > variation of v3-*. e.g. v3-none?stream=1.
> > > > 
> > > > I've started working on the stream clone bundle v2, you can
> > > > find the first iterations of the changesets here: https://bitbu
> > > > cket.org/octobus/mercurial-
> > > > devel/commits/all?search=48e60e649700%3A%3A
> > > > 
> > > > I start adding a v2 flag to generate a new stream bundle in: ht
> > > > tps://bitbucket.org/octobus/mercurial-
> > > > devel/commits/48e60e649700d51a7ae3b221c689ebbd2ec84371
> > > > 
> > > > I later introduced a new bundle spec for the v2 stream clone
> > > > bundle `none-packed2` to mimic the stream clone bundle v1
> > > > bundle spec in https://bitbucket.org/octobus/mercurial-devel/co
> > > > mmits/4414466e4393e496c1cb4f066322b185a9da7bbf. One thing that
> > > > confused me is that v1 and v2 bundle spec seemd to imply that
> > > > there was a changegroup part in the bundle, but we don't need
> > > > to have it in the bundle, right?
> > > > 
> > > > I'm not sure to see the implications of introducing a v3 bundle
> > > > spec, what pieces of code needs to change to handle it?
> > > 
> > > A v3 bundlespec will require various changes in exchange.py.
> > > commands.py will need updates to `hg bundle`. Clonebundles code
> > > needs to be taught about it.
> > > But what is probably needed first is an overhaul of the code for
> > > conveying bundle "capabilities." IMO we want a mechanism to
> > > convert a bundlespec to set of bundle2 "client" capabilities.
> > > Then we can feed those into bundle generation functions. Local
> > > bundle2 generation would then look very similar to how bundles
> > > are generated for the wire protocol. It would be a good thing for
> > > those to share code paths.
> > >  
> > > > >  
> > > > > > > *Lesser priority issues* (can be fixed post RC)
> > > > > > > 
> > > > > > > streamclone._emit() should be renamed to _emitv2() for
> > > > > > > consistency with other functions in file.
> > > > > > 
> > > > > > (done)
> > > > > > 
> > > > > > 
> > > > > > > I think streamclone.applybundlev2() should handle writing
> > > > > > > requirements, not the bundle2 part handler. Similar logic
> > > > > > > is already in
> > > > > > > streamclone.maybeperformlegacystreamclone().
> > > > > > 
> > > > > > Good point, done.
> > > > > > 
> > > > > > 
> > > > > > > I need to look at this in more detail, but 133a678673cb
> > > > > > > ("allow bundle'2 stream clone") is a bit suspect. It's
> > > > > > > correctness may depend on your interpretation of whether
> > > > > > > server.disablefullbundle applies only to changegroup
> > > > > > > bundles. I can't recall what existing behavior is with
> > > > > > > regards to stream clone. Need to double check.
> > > > > > 
> > > > > > 133a678673cb exist because a test failed otherwise. Since
> > > > > > there is a test for it we assumed it should work. So if we
> > > > > > revisit this logic, we should do it for both stream v1 and
> > > > > > stream v2.
> > > > > > 
> > > > > > > test-clone-uncompressed.t wants a comment about legacy
> > > > > > > stream clone phases being buggy. We should also decide
> > > > > > > what we want to do about issue #5648 since bundle2 stream
> > > > > > > clone solves the problem. I'm tempted to mark #5648 as
> > > > > > > resolved once bundle2 stream clone is enabled by default.
> > > > > > 
> > > > > > +1 for marking it as solved once the protocol is no longer
> > > > > > experimental.
> > > > > > 
> > > > > > 
> > > > > > > In 56c30b31afbe:
> > > > > > > @@ -1465,6 +1465,7 @@ def _pullbundle2(pullop):
> > > > > > >          kwargs['cg'] = False
> > > > > > >          kwargs['stream'] = True
> > > > > > >          pullop.stepsdone.add('changegroup')
> > > > > > > +        pullop.stepsdone.add('phases')
> > > > > > > 
> > > > > > > Do we know for sure that phases were processed? It
> > > > > > > depends on what the server sent, no? I think this code
> > > > > > > may assume the behavior of the official Mercurial server
> > > > > > > and may not be compatible with other implementation.
> > > > > > 
> > > > > > We are not sure what you mean here. If the repository is
> > > > > > publishing → we don't send any phases, making all
> > > > > > transferred changeset public. If server is non-publishing,
> > > > > > we transfer the same phase root as the original. This
> > > > > > bypasses the usual transfer for phases the same way we
> > > > > > bypass the usual transfer for changeset.
> > > > > 
> > > > > What I mean is that this code is assuming that the exchanged
> > > > > bundle2 payload either communicated phases or didn't need to
> > > > > communicate phases. In other words, this assumes behavior of
> > > > > the canonical Mercurial client and server. A separate server
> > > > > implementation may not behave similarly. e.g. it may not
> > > > > implement exchange of phases via bundle2.
> > > > >  
> > > > > > > The use of temporary files for non-append-only files and
> > > > > > > the context manager and copy primitive associated with
> > > > > > > that code feels overly complex. Can we just slurp file
> > > > > > > content into memory? We already store the filenames and
> > > > > > > offsets in memory. We already scale memory use linearly
> > > > > > > with number of files in the repo. I don't see a huge
> > > > > > > problem doing the same for files that are proportional to
> > > > > > > number of changesets, branches, tags, and heads. That
> > > > > > > being said, the code is already landed. Some maybe we
> > > > > > > just keep it around. If we do keep it around, I think it
> > > > > > > could use some refactoring to improve readability. The
> > > > > > > amount of functions being called is a bit much for what
> > > > > > > the code is doing, IMO.
> > > > > > 
> > > > > > We cannot keep cache file in memory, we've seen extensions
> > > > > > where cache file can become huge.
> > > > > 
> > > > > Bleh. It would be nice to avoid the overhead in the common
> > > > > case where the cache files are small. Or, we can put the temp
> > > > > files next to the final cache files so we can perform an
> > > > > atomic rename within the same directory.
> > > > >  
> > > > > > > The code around managing multiple vfs instances feels a
> > > > > > > bit complex. I don't think we need a dict of vfs
> > > > > > > instances. We only have 2. Let's inline their usage and
> > > > > > > do away with all the context manager and dictionary
> > > > > > > lookup complications.
> > > > > > > 
> > > > > > > Related to vfs, we're not using the "expectedcount"
> > > > > > > argument with vfs.backgroundclose(). Ideally, the part
> > > > > > > header defines the number of files in each vfs so we can
> > > > > > > pass the proper value. But, I think it's fine to use the
> > > > > > > total expected file count as the hint to the store vfs
> > > > > > > and don't use background close for the cache vfs (since
> > > > > > > it will almost certainly never have enough entries to
> > > > > > > warrant background closing).
> > > > > > 
> > > > > > The number of filecount is used in debug message and at the
> > > > > > protocol level. So in all cases, we'll have to introduce a
> > > > > > new parameter to carry the data per vfs. We'll do that next
> > > > > > cycle.
> > > > > > 
> > > > > > > I'm pretty happy with the feature. It is currently behind
> > > > > > > an experimental config flag. I think we can make it
> > > > > > > enabled by default in 4.5 if we fix the more important
> > > > > > > issues.
> > > > > > 
> > > > > > You mean turning it on for 4.5? We find it too aggressive,
> > > > > > let's ship it experimental in 4.5 and turn that on early in
> > > > > > the 4.6 cycle.
> > > > > 
> > > > > At this point, it is definitely too late to be enabled for
> > > > > 4.5.
> > > > >  
> > > > > > > *Other observations *
> > > > > > > 
> > > > > > > The new format seems to use less CPU on the client than
> > > > > > > the old format when doing HTTP exchange. 30-31s versus
> > > > > > > 25-26s for mozilla-unified. And that's with zstd in play.
> > > > > > > I haven't profiled, but this is likely due to not using
> > > > > > > readline() and transferring caches and phases data (I
> > > > > > > believe we avoid a linear changelog scan post transfer
> > > > > > > now since I believe something before needed to
> > > > > > > reconstruct the branch cache).
> > > > > > > 
> > > > > > > Here are some example numbers on my i7-6700K.
> > > > > > > 
> > > > > > > legacy http
> > > > > > > server:
> > > > > > > 8.44user 3.51system 0:51.21elapsed 23%CPU
> > > > > > > (0avgtext+0avgdata 171184maxresident)k
> > > > > > > 1144inputs+0outputs (1major+73642minor)pagefaults 0swaps
> > > > > > > 
> > > > > > > client:
> > > > > > > 30.79user 8.47system 0:42.47elapsed 92%CPU
> > > > > > > (0avgtext+0avgdata 197244maxresident)k
> > > > > > > 19288inputs+7039984outputs (2major+94838minor)pagefaults
> > > > > > > 0swaps
> > > > > > > 
> > > > > > > v2 http
> > > > > > > server:
> > > > > > > 37.69user 8.01system 1:56.61elapsed 39%CPU
> > > > > > > (0avgtext+0avgdata 236808maxresident)k
> > > > > > > 6836840inputs+21288outputs (2major+82612minor)pagefaults
> > > > > > > 0swaps
> > > > > > > 
> > > > > > > client:
> > > > > > > 25.82user 8.32system 1:47.83elapsed 31%CPU
> > > > > > > (0avgtext+0avgdata 209164maxresident)k
> > > > > > > 60848inputs+7061216outputs (2major+78863minor)pagefaults
> > > > > > > 0swaps
> > > > > > >  
> > > > > > > > 
> > > > > > > > This function is used in multiple place, having it in
> > > > > > > > util would be better.
> > > > > > > > 
> > > > > > > > (existing caller will be migrated in another series)
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > diff --git a/mercurial/changegroup.py
> > > > > > > > b/mercurial/changegroup.py
> > > > > > > > 
> > > > > > > > --- a/mercurial/changegroup.py
> > > > > > > > 
> > > > > > > > +++ b/mercurial/changegroup.py
> > > > > > > > 
> > > > > > > > @@ -32,14 +32,7 @@ from . import (
> > > > > > > > 
> > > > > > > >  _CHANGEGROUPV2_DELTA_HEADER = "20s20s20s20s20s"
> > > > > > > > 
> > > > > > > >  _CHANGEGROUPV3_DELTA_HEADER = ">20s20s20s20s20sH"
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > -def readexactly(stream, n):
> > > > > > > > 
> > > > > > > > -    '''read n bytes from stream.read and abort if less
> > > > > > > > was available'''
> > > > > > > > 
> > > > > > > > -    s = stream.read(n)
> > > > > > > > 
> > > > > > > > -    if len(s) < n:
> > > > > > > > 
> > > > > > > > -        raise error.Abort(_("stream ended
> > > > > > > > unexpectedly"
> > > > > > > > 
> > > > > > > > -                           " (got %d bytes, expected
> > > > > > > > %d)")
> > > > > > > > 
> > > > > > > > -                          % (len(s), n))
> > > > > > > > 
> > > > > > > > -    return s
> > > > > > > > 
> > > > > > > > +readexactly = util.readexactly
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > >  def getchunk(stream):
> > > > > > > > 
> > > > > > > >      """return the next chunk from stream as a
> > > > > > > > string"""
> > > > > > > > 
> > > > > > > > diff --git a/mercurial/util.py b/mercurial/util.py
> > > > > > > > 
> > > > > > > > --- a/mercurial/util.py
> > > > > > > > 
> > > > > > > > +++ b/mercurial/util.py
> > > > > > > > 
> > > > > > > > @@ -3865,3 +3865,12 @@ def safename(f, tag, ctx,
> > > > > > > > others=None):
> > > > > > > > 
> > > > > > > >          fn = '%s~%s~%s' % (f, tag, n)
> > > > > > > > 
> > > > > > > >          if fn not in ctx and fn not in others:
> > > > > > > > 
> > > > > > > >              return fn
> > > > > > > > 
> > > > > > > > +
> > > > > > > > 
> > > > > > > > +def readexactly(stream, n):
> > > > > > > > 
> > > > > > > > +    '''read n bytes from stream.read and abort if less
> > > > > > > > was available'''
> > > > > > > > 
> > > > > > > > +    s = stream.read(n)
> > > > > > > > 
> > > > > > > > +    if len(s) < n:
> > > > > > > > 
> > > > > > > > +        raise error.Abort(_("stream ended
> > > > > > > > unexpectedly"
> > > > > > > > 
> > > > > > > > +                           " (got %d bytes, expected
> > > > > > > > %d)")
> > > > > > > > 
> > > > > > > > +                          % (len(s), n))
> > > > > > > > 
> > > > > > > > +    return s
> > > > > > > > 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20180131/fb94a846/attachment.html>


More information about the Mercurial-devel mailing list