[PATCH 1 of 2] bundle: don't send too many changesets (Issue1704)

Peter Arrenbrecht peter.arrenbrecht at gmail.com
Sat Nov 7 07:55:13 CST 2009


On Saturday, November 7, 2009, Peter Arrenbrecht
<peter.arrenbrecht at gmail.com> wrote:
> On Sat, Nov 7, 2009 at 12:08 PM, Benoit Boissinot <bboissin at gmail.com> wrote:
>> On Sat, Nov 07, 2009 at 09:59:13AM +0100, Peter Arrenbrecht wrote:
>>> # HG changeset patch
>>> # User Peter Arrenbrecht <peter.arrenbrecht at gmail.com>
>>> # Date 1257583794 -3600
>>> bundle: don't send too many changesets (Issue1704)
>>>
>>> The fast path in changegroupsubset can send too many csets. This happens
>>> because it uses the parents of all bases as common nodes and then goes
>>> forward from this again. If a base has a parent that has another child,
>>> which is -not- a base, then this other child will nevertheless end up in
>>> the changegroup.
>>>
>>> The fix is to not use findmissing(), but use nodesbetween() instead, as
>>> do the slow path and incoming/outgoing.
>>
>> If that's the case, then findmissing() should be fixed, since it does
>> exactly what you are describing (and since prepush() might be affected
>> too).
>> From your description it's unclear if it's a bug in the "common" nodes
>> computation or a bug in findmissing() (ie. how does common look like
>> when calling _changegroup() for the fast-path).
>
> Hmm. Then we need to be more clever when computing the set of common
> nodes. Just taking all the parents of the desired base nodes is not
> good enough. Example:
>
>   a -> (b, c -> d -> e) -> f
>
> Now if we want the changegroup with bases [b] and heads [f], we'd have
> to be smart enough to realize that common should be [a, e]. Meaning we
> have to scan children of the common nodes we added (here: [b,c]) and
> for all such children not in the original bases (here: c), we have to
> add their local heads (here: e) with within the subgraph formed by the
> original bases and heads to the set of common nodes.
>
> This seems rather intricate when nodesbetween already exists and does
> the job nicely.
>

In the end it seemed to me we would be very nearly reimplementing
nodesbetween to get the common set right.
-parren

>>>
>>> The change to test-notify.out is correct, because it actually hits this
>>> bug, as can be seen by glog'ing the two repos:
>>>
>>> @    22c88
>>> |\
>>> | o  0a184
>>> | |
>>> o |  0647d
>>> |/
>>> o  cb9a9
>>>
>>> and
>>>
>>> o  0647d
>>> |
>>> @  cb9a9
>>>
>>> It used to pull 0647d again, which is unnecessary.
>>
>> Indeed, so the question is: was common composed of 0647d, or cb9a9 or
>> both?
>>
>> According to the docstring in findmissing() if common includes 0647d,
>> then findmissing() should return only [0a184, 22c88].
>>
>> I'd actually like to have our internals based on "common" nodes instead
>> of "bases" nodes since it's a lot easier to reason about them (that's
>> why I introduced findmissing()). Ideally a new version of the discovery
>> algorithm should compute the "common", and send it to a new
>> remote changegroupcommon().
>>
>>>
>>> diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
>>> --- a/mercurial/localrepo.py
>>> +++ b/mercurial/localrepo.py
>>> @@ -1622,19 +1623,22 @@
>>>          the linkrev.
>>>          """
>>>
>>> +        # Set up some initial variables
>>> +        # Make it easy to refer to self.changelog
>>> +        cl = self.changelog
>>> +        # msng is short for missing - compute the list of changesets in this
>>> +        # changegroup.
>>
>>> +        if not bases:
>>> +            bases = [nullid]
>>
>> not needed
>
> Not so. Without it, I get a failure with test-clone-cgi (which
> admittedly calls changegroup in a slightly special way).
>
>>> +        msng_cl_lst, bases, heads = cl.nodesbetween(bases, heads)
>>> +
>>>          if extranodes is None:
>>>              # can we go through the fast path ?
>>>              heads.sort()
>>>              allheads = self.heads()
>>>              allheads.sort()
>>>              if heads == allheads:
>>> -                common = []
>>> -                # parents of bases are known from both sides
>>> -                for n in bases:
>>> -                    for p in self.changelog.parents(n):
>>> -                        if p != nullid:
>>> -                            common.append(p)
>>> -                return self._changegroup(common, source)
>>> +                return self._changegroup(msng_cl_lst, source)
>>>
>>>          self.hook('preoutgoing', throw=True, source=source)
>>>
>>
>> Can't you remove the nodesbetween() call that's right after this, since
>> you already computed it?
>
> Doh. I meant to move this bit of code, not duplicate it. Shall fix if
> you give me the OK for the general approach.
>
> -parren
>



More information about the Mercurial-devel mailing list