[PATCH 1 of 2] bundle: don't send too many changesets (Issue1704)

Peter Arrenbrecht peter.arrenbrecht at gmail.com
Sat Nov 7 05:25:49 CST 2009


On Sat, Nov 7, 2009 at 12:08 PM, Benoit Boissinot <bboissin at gmail.com> wrote:
> On Sat, Nov 07, 2009 at 09:59:13AM +0100, Peter Arrenbrecht wrote:
>> # HG changeset patch
>> # User Peter Arrenbrecht <peter.arrenbrecht at gmail.com>
>> # Date 1257583794 -3600
>> bundle: don't send too many changesets (Issue1704)
>>
>> The fast path in changegroupsubset can send too many csets. This happens
>> because it uses the parents of all bases as common nodes and then goes
>> forward from this again. If a base has a parent that has another child,
>> which is -not- a base, then this other child will nevertheless end up in
>> the changegroup.
>>
>> The fix is to not use findmissing(), but use nodesbetween() instead, as
>> do the slow path and incoming/outgoing.
>
> If that's the case, then findmissing() should be fixed, since it does
> exactly what you are describing (and since prepush() might be affected
> too).
> From your description it's unclear if it's a bug in the "common" nodes
> computation or a bug in findmissing() (ie. how does common look like
> when calling _changegroup() for the fast-path).

Hmm. Then we need to be more clever when computing the set of common
nodes. Just taking all the parents of the desired base nodes is not
good enough. Example:

  a -> (b, c -> d -> e) -> f

Now if we want the changegroup with bases [b] and heads [f], we'd have
to be smart enough to realize that common should be [a, e]. Meaning we
have to scan children of the common nodes we added (here: [b,c]) and
for all such children not in the original bases (here: c), we have to
add their local heads (here: e) with within the subgraph formed by the
original bases and heads to the set of common nodes.

This seems rather intricate when nodesbetween already exists and does
the job nicely.

>>
>> The change to test-notify.out is correct, because it actually hits this
>> bug, as can be seen by glog'ing the two repos:
>>
>> @    22c88
>> |\
>> | o  0a184
>> | |
>> o |  0647d
>> |/
>> o  cb9a9
>>
>> and
>>
>> o  0647d
>> |
>> @  cb9a9
>>
>> It used to pull 0647d again, which is unnecessary.
>
> Indeed, so the question is: was common composed of 0647d, or cb9a9 or
> both?
>
> According to the docstring in findmissing() if common includes 0647d,
> then findmissing() should return only [0a184, 22c88].
>
> I'd actually like to have our internals based on "common" nodes instead
> of "bases" nodes since it's a lot easier to reason about them (that's
> why I introduced findmissing()). Ideally a new version of the discovery
> algorithm should compute the "common", and send it to a new
> remote changegroupcommon().
>
>>
>> diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
>> --- a/mercurial/localrepo.py
>> +++ b/mercurial/localrepo.py
>> @@ -1622,19 +1623,22 @@
>>          the linkrev.
>>          """
>>
>> +        # Set up some initial variables
>> +        # Make it easy to refer to self.changelog
>> +        cl = self.changelog
>> +        # msng is short for missing - compute the list of changesets in this
>> +        # changegroup.
>
>> +        if not bases:
>> +            bases = [nullid]
>
> not needed

Not so. Without it, I get a failure with test-clone-cgi (which
admittedly calls changegroup in a slightly special way).

>> +        msng_cl_lst, bases, heads = cl.nodesbetween(bases, heads)
>> +
>>          if extranodes is None:
>>              # can we go through the fast path ?
>>              heads.sort()
>>              allheads = self.heads()
>>              allheads.sort()
>>              if heads == allheads:
>> -                common = []
>> -                # parents of bases are known from both sides
>> -                for n in bases:
>> -                    for p in self.changelog.parents(n):
>> -                        if p != nullid:
>> -                            common.append(p)
>> -                return self._changegroup(common, source)
>> +                return self._changegroup(msng_cl_lst, source)
>>
>>          self.hook('preoutgoing', throw=True, source=source)
>>
>
> Can't you remove the nodesbetween() call that's right after this, since
> you already computed it?

Doh. I meant to move this bit of code, not duplicate it. Shall fix if
you give me the OK for the general approach.

-parren



More information about the Mercurial-devel mailing list