[PATCH 3 of 3 RFC cg3] changegroup: introduce cg3, which has support for exchanging treemanifests

Fri Dec 4 12:23:17 CST 2015

On Thu, Dec 3, 2015 at 4:04 PM Augie Fackler <raf at durin42.com> wrote:

> (responses inline, tried to prune a fair amount to keep it easier to
> follow. I’ll hold off on a v2 until my questions below are resolved. Thanks
> for the review!)
>
> > On Dec 3, 2015, at 12:42 PM, Martin von Zweigbergk <
> martinvonz at google.com> wrote:
> >
> >
> >
> > On Thu, Dec 3, 2015 at 7:58 AM Augie Fackler <raf at durin42.com> wrote:
> >> # HG changeset patch
> >> # User Augie Fackler <augie at google.com>
> >> # Date 1449092255 21600
> >> #      Wed Dec 02 15:37:35 2015 -0600
> >> # Node ID 1240273d0971ff1d5e596ffea69cf34a07335d92
> >> # Parent  79ad570337631038a08b75957b2890975221e089
>
> […]
>
> >> @@ -603,6 +612,11 @@ class cg1packer(object):
> >>              size += len(chunk)
> >>              yield chunk
> >>          self._verbosenote(_('%8.i (manifests)\n') % size)
> >> +        # It looks odd to assert this here, but tmfnodes doesn't get
> >> +        # filled in until after we've called lookuplinknode for
> >> +        # sending root manifests, so the only way to tell the streams
> >> +        # got crossed is to check after we've done all the work.
> >> +        assert not tmfnodes
> >>
> > I think it would be clearer to override the entire generate() method and
> add call the cg2packer.generate() from there if not using tree manifests
> and then dedicate the rest of the method to tree manifests. It would mean
> more code duplication, but some of that could be extracted to helpers. I
> tried it and I think it removes a lot of confusion.
>
> I get where you’re coming from, but generate is a place where we’re likely
> to do performance work in the future, and I’d like to keep it to one
> (slightly more complicated) implementation, rather than two that we have to
> remember to update in parallel.
>

>
> >>
> >> @@ -676,10 +699,26 @@ class cg1packer(object):
> >>                          fclnode = fclnodes.setdefault(n, clnode)
> >>                          if clrevorder[clnode] < clrevorder[fclnode]:
> >>                              fclnodes[n] = clnode
> >> +                        # gather list of changed treemanifest nodes
> >> +                        if '/' in f and ('treemanifest' in
> >> +                                         self._repo.requirements):
> >> +                            dirs = f.split('/')[:-1]
> >> +                            submf = mdata
> >> +                            path = []
> >> +                            for sd in dirs:
> >> +                                submf = submf._dirs[sd + '/']
> >
> > Using the "private" members in treemanifest directly is a little scary.
> The class takes care to load the manifest in all public methods, but
> accessing _dirs directly does not do that. I think it works here because
> the readfast() method called into readdelta(), which called into
> _slowreaddelta(), which populates the manifest with a diff. However, if the
> delta parent is not a parent, readfast() would not call readdelta(), it
> instead calls read(), which returns an uninitialized (lazily loaded)
> manifest. Besides the problem of it not being loaded (I *think*; we'd have
> to verify), it would be bad for narrowhg, which does not want the entire
> manifest iterated over.
>
> I agree, I’d rather not use the private method, but do I have a choice?
> How should I ensure the submf is loaded?
>
> (I think the submf sure to already be fully loaded because immediately
> before we do the directory checks we looked at a file along the same path,
> so I’m not sure we need any extra paranoia today. Should I add a comment to
> that effect?)
>

Yes, you're right, the iteration does mean that they are loaded. And I was
wrong about it calling readdelta(): it actually calls read(), and if it had
called readdelta(), it would fail on submf.node(). So if you keep the call
to readfast() for now, you should replace the .node() by ._node. I think we
should include more than one changeset in the clone test to make sure
readfast() results in both read() and readdelta().

>
> >
> > I think cg3packer.generate() should *always* call readdelta() for
> treemanifests (until we implement the fast path).
>
> I don’t follow this at all. :(
>

It might not have made sense. My goal was to avoid mf.read().iteritems()
since that is bad for narrowhg (and narrowhg is the main reason for tree
manifests). Anyway, we talked about this on VC, and it can be improved
later.

>
>
> [snip]
>
> > +            if 'treemanifest' not in repo.requirements:
> > +                if not wasempty:
> > +                    raise error.Abort(_(
> > +                        "bundle contains tree manifests, but local repo
> is "
> > +                        "non-empty and does not use tree manifests"))
> > +                repo.requirements.add('treemanifest')
> > +                repo._applyopenerreqs()
> > +                repo._writerequirements()
> > +                repo.manifest._treeondisk = True
> > +                repo.manifest._treeinmem = True
> >
> > This would not be necessary if the requirement had been added to the
> repo before the manifest revlog was accessed. How do we handle this
> elsewhere? Is this check what we would want the client to run into:
> https://selenic.com/hg/file/7e1fac6c0a9c/mercurial/exchange.py#l1103
>
> It’s weird. Basically, clone creates a repo on-disk and then does a pull
> into it, so it doesn’t turn on tree manifest as a requirement because it
> doesn’t know until it’s too late. Does that make sense?
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20151204/c2e43a15/attachment.html>