[PATCH 3 of 3 RFC cg3] changegroup: introduce cg3, which has support for exchanging treemanifests

Augie Fackler raf at durin42.com
Thu Dec 3 19:16:53 CST 2015


For the benefit of folks on the list: Martin and I chatted via VC a bit ago, and I expect to mail a v2 soon, probably tomorrow.

AF

> On Dec 3, 2015, at 7:04 PM, Augie Fackler <raf at durin42.com> wrote:
> 
> (responses inline, tried to prune a fair amount to keep it easier to follow. I’ll hold off on a v2 until my questions below are resolved. Thanks for the review!)
> 
>> On Dec 3, 2015, at 12:42 PM, Martin von Zweigbergk <martinvonz at google.com> wrote:
>> 
>> 
>> 
>> On Thu, Dec 3, 2015 at 7:58 AM Augie Fackler <raf at durin42.com> wrote:
>>> # HG changeset patch
>>> # User Augie Fackler <augie at google.com>
>>> # Date 1449092255 21600
>>> #      Wed Dec 02 15:37:35 2015 -0600
>>> # Node ID 1240273d0971ff1d5e596ffea69cf34a07335d92
>>> # Parent  79ad570337631038a08b75957b2890975221e089
> 
> […]
> 
>>> @@ -603,6 +612,11 @@ class cg1packer(object):
>>>             size += len(chunk)
>>>             yield chunk
>>>         self._verbosenote(_('%8.i (manifests)\n') % size)
>>> +        # It looks odd to assert this here, but tmfnodes doesn't get
>>> +        # filled in until after we've called lookuplinknode for
>>> +        # sending root manifests, so the only way to tell the streams
>>> +        # got crossed is to check after we've done all the work.
>>> +        assert not tmfnodes
>>> 
>> I think it would be clearer to override the entire generate() method and add call the cg2packer.generate() from there if not using tree manifests and then dedicate the rest of the method to tree manifests. It would mean more code duplication, but some of that could be extracted to helpers. I tried it and I think it removes a lot of confusion.
> 
> I get where you’re coming from, but generate is a place where we’re likely to do performance work in the future, and I’d like to keep it to one (slightly more complicated) implementation, rather than two that we have to remember to update in parallel.
> 
>>> 
>>> @@ -676,10 +699,26 @@ class cg1packer(object):
>>>                         fclnode = fclnodes.setdefault(n, clnode)
>>>                         if clrevorder[clnode] < clrevorder[fclnode]:
>>>                             fclnodes[n] = clnode
>>> +                        # gather list of changed treemanifest nodes
>>> +                        if '/' in f and ('treemanifest' in
>>> +                                         self._repo.requirements):
>>> +                            dirs = f.split('/')[:-1]
>>> +                            submf = mdata
>>> +                            path = []
>>> +                            for sd in dirs:
>>> +                                submf = submf._dirs[sd + '/']
>> 
>> Using the "private" members in treemanifest directly is a little scary. The class takes care to load the manifest in all public methods, but accessing _dirs directly does not do that. I think it works here because the readfast() method called into readdelta(), which called into _slowreaddelta(), which populates the manifest with a diff. However, if the delta parent is not a parent, readfast() would not call readdelta(), it instead calls read(), which returns an uninitialized (lazily loaded) manifest. Besides the problem of it not being loaded (I *think*; we'd have to verify), it would be bad for narrowhg, which does not want the entire manifest iterated over.
> 
> I agree, I’d rather not use the private method, but do I have a choice? How should I ensure the submf is loaded?
> 
> (I think the submf sure to already be fully loaded because immediately before we do the directory checks we looked at a file along the same path, so I’m not sure we need any extra paranoia today. Should I add a comment to that effect?)
> 
>> 
>> I think cg3packer.generate() should *always* call readdelta() for treemanifests (until we implement the fast path).
> 
> I don’t follow this at all. :(
> 
> 
> [snip]
> 
>> +            if 'treemanifest' not in repo.requirements:
>> +                if not wasempty:
>> +                    raise error.Abort(_(
>> +                        "bundle contains tree manifests, but local repo is "
>> +                        "non-empty and does not use tree manifests"))
>> +                repo.requirements.add('treemanifest')
>> +                repo._applyopenerreqs()
>> +                repo._writerequirements()
>> +                repo.manifest._treeondisk = True
>> +                repo.manifest._treeinmem = True
>> 
>> This would not be necessary if the requirement had been added to the repo before the manifest revlog was accessed. How do we handle this elsewhere? Is this check what we would want the client to run into: https://selenic.com/hg/file/7e1fac6c0a9c/mercurial/exchange.py#l1103
> 
> It’s weird. Basically, clone creates a repo on-disk and then does a pull into it, so it doesn’t turn on tree manifest as a requirement because it doesn’t know until it’s too late. Does that make sense?
> 
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20151203/88d98b81/attachment.pgp>


More information about the Mercurial-devel mailing list