[PATCH] discovery: slowly increase sampling size

Martin von Zweigbergk martinvonz at google.com
Tue May 21 16:00:26 EDT 2019


On Tue, May 21, 2019 at 11:10 AM Martin von Zweigbergk <
martinvonz at google.com> wrote:

>
>
> On Tue, May 21, 2019 at 10:09 AM Pierre-Yves David <
> pierre-yves.david at ens-lyon.org> wrote:
>
>>
>>
>> On 5/21/19 6:05 PM, Martin von Zweigbergk wrote:
>> >
>> >
>> > On Tue, May 21, 2019 at 4:22 AM Pierre-Yves David
>> > <pierre-yves.david at ens-lyon.org <mailto:pierre-yves.david at ens-lyon.org>>
>>
>> > wrote:
>> >
>> >     # HG changeset patch
>> >     # User Pierre-Yves David <pierre-yves.david at octobus.net
>> >     <mailto:pierre-yves.david at octobus.net>>
>> >     # Date 1558436902 -7200
>> >     #      Tue May 21 13:08:22 2019 +0200
>> >     # Node ID 123267b121ea268b3bc488c6dde68a8d93ee4f4c
>> >     # Parent  86f17fc31aa8a0c26c11db6a532293463ee6428a
>> >     # EXP-Topic discovery
>> >     # Available At https://bitbucket.org/octobus/mercurial-devel/
>> >     #              hg pull
>> >     https://bitbucket.org/octobus/mercurial-devel/ -r 123267b121ea
>> >     discovery: slowly increase sampling size
>> >
>> >     Some pathological discovery runs can require many roundtrips, and when
>> >     that happens things can get very slow.
>> >
>> >     To make the algorithm more resilient against such pathological cases, we
>> >     slowly increase the sample size with each roundtrip (+5%). This has a
>> >     negligible impact on "normal" discovery with few roundtrips, but a large
>> >     positive impact on cases with many roundtrips. Asking more questions per
>> >     roundtrip helps reduce the undecided set faster: instead of shrinking the
>> >     undecided set at a linear speed (in the worst case), we shrink it at a
>> >     guaranteed (small) exponential rate. The data below show this slow ramp-up
>> >     in sample size:
>> >
>> >     round trip    |   1 |     5 |    10 |    20 |     50 |     100 |       130 |
>> >     sample size   | 200 |   254 |   321 |   517 |  2 199 |  25 123 |   108 549 |
>> >     covered nodes | 200 | 1 357 | 2 821 | 7 031 | 42 658 | 524 530 | 2 276 755 |
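>> >
>> >     For illustration, here is a minimal standalone Python sketch of that
>> >     growth model (a hypothetical simulation, not the actual setdiscovery
>> >     code; depending on where in the loop the growth is applied, the values
>> >     can land one growth step away from the table above):
>> >
>> >       def sample_growth(samplegrowth=1.05, fullsamplesize=200, rounds=130):
>> >           """Yield (roundtrip, sample size, cumulative nodes asked about)."""
>> >           size = fullsamplesize
>> >           covered = size  # the first roundtrip uses the initial full sample
>> >           yield 1, size, covered
>> >           for roundtrip in range(2, rounds + 1):
>> >               # grow the full sample a little on every additional roundtrip
>> >               size = int(size * samplegrowth)
>> >               covered += size
>> >               yield roundtrip, size, covered
>> >
>> >       for roundtrip, size, covered in sample_growth():
>> >           if roundtrip in (1, 5, 10, 20, 50, 100, 130):
>> >               print(roundtrip, size, covered)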
>> >
>> >     To be a bit more concrete, let's take a very pathological case as an
>> >     example. We are doing discovery from a copy of mozilla-try to a more
>> >     recent version of mozilla-unified. The mozilla-unified heads are unknown
>> >     to the mozilla-try repo and there are over 1 million "missing"
>> >     changesets. (The discovery is "local" to avoid network interference.)
>> >
>> >
>> > What makes it slow in that case? If the 1 million changesets are mostly
>> > linear, it should still be fast, right? So I assume they're spread out
>> > across branches. Is it slow because the remote side has many branches
>> > that the local side does not, or the other way around?
>>
>> The million changesets are on many different branches. (The discovery is
>> run from mozilla-try (many changesets, tree-like) toward mozilla-unified
>> (mostly linear).)
>>
>
> I think that means that there are many local heads. Maybe it's the case I
> tried to address with https://phab.mercurial-scm.org/D2647. I just
> re-opened that. I'll see if I can get timing numbers for the repos you
> mentioned here.
>

That's now done. I've added you as a reviewer. I feel like that's a better
solution for this particular case. It results in significantly more bytes
being transferred (+25%), but it's also much faster.


>
>>
>> (Note that the point of this test is not so much to look into how to make
>> that scenario faster, but to show a bad scenario and how a generic safety
>> net makes it more bearable.)
>>
>> >
>> >
>> >     Without this change, the discovery:
>> >     - lasts 1858 seconds (31 minutes),
>> >     - makes 1700 roundtrips,
>> >     - asks about 340 000 nodes.
>> >
>> >     With this change, the discovery:
>> >     - lasts 218 seconds (3 minutes 38 seconds, an 88% improvement),
>> >     - makes 94 roundtrips (-94%),
>> >     - asks about 344 211 nodes (+1%).
>> >
>> >     Of course, this is an extreme case (and 3 minutes is still slow). However,
>> >     it gives a good example of how this sample size increase acts as a safety
>> >     net, catching bad situations.
>> >
>> >     We could imagine a steeper increase than 5%. For example, 10% would give
>> >     the following numbers:
>> >
>> >     round trip    |   1 |     5 |    10 |     20 |      50 |        75 |        100 |
>> >     sample size   | 200 |   321 |   514 |  1 326 |  23 060 |   249 812 |  2 706 594 |
>> >     covered nodes | 200 | 1 541 | 3 690 | 12 671 | 251 871 | 2 746 254 | 29 770 966 |
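>> >
>> >     With the hypothetical sample_growth() sketch shown after the first
>> >     table, this steeper curve corresponds to running the same loop with
>> >     samplegrowth=1.10:
>> >
>> >       for roundtrip, size, covered in sample_growth(samplegrowth=1.10, rounds=100):
>> >           if roundtrip in (1, 5, 10, 20, 50, 75, 100):
>> >               print(roundtrip, size, covered)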
>> >
>> >     In parallel, it is useful to understand these pathological cases and
>> >     improve them. However, the current change provides a general-purpose
>> >     safety net that smooths the impact of pathological cases.
>> >
>> >     To avoid issues with older HTTP servers, the increase in sample size only
>> >     occurs if the protocol has no limit on command argument size.
>> >
>> >     diff --git a/mercurial/setdiscovery.py b/mercurial/setdiscovery.py
>> >     --- a/mercurial/setdiscovery.py
>> >     +++ b/mercurial/setdiscovery.py
>> >     @@ -256,7 +256,8 @@ def findcommonheads(ui, local, remote,
>> >                           initialsamplesize=100,
>> >                           fullsamplesize=200,
>> >                           abortwhenunrelated=True,
>> >     -                    ancestorsof=None):
>> >     +                    ancestorsof=None,
>> >     +                    samplegrowth=1.05):
>> >           '''Return a tuple (common, anyincoming, remoteheads) used to
>> >     identify
>> >           missing nodes from or in remote.
>> >           '''
>> >     @@ -389,6 +390,8 @@ def findcommonheads(ui, local, remote,
>> >                       ui.debug("taking initial sample\n")
>> >                   samplefunc = disco.takefullsample
>> >                   targetsize = fullsamplesize
>> >     +            if not remote.limitedarguments:
>> >     +                fullsamplesize = int(fullsamplesize * samplegrowth)
>> >               else:
>> >                   # use even cheaper initial sample
>> >                   ui.debug("taking quick initial sample\n")
>> >     diff --git a/tests/test-setdiscovery.t b/tests/test-setdiscovery.t
>> >     --- a/tests/test-setdiscovery.t
>> >     +++ b/tests/test-setdiscovery.t
>> >     @@ -980,10 +980,10 @@ One with >200 heads. We now switch to se
>> >         query 3; still undecided: 980, sample size is: 200
>> >         sampling from both directions
>> >         searching: 4 queries
>> >     -  query 4; still undecided: \d+, sample size is: 200 (re)
>> >     +  query 4; still undecided: 435, sample size is: 210
>> >         sampling from both directions
>> >         searching: 5 queries
>> >     -  query 5; still undecided: 195, sample size is: 195
>> >     +  query 5; still undecided: 185, sample size is: 185
>> >         5 total queries in *.????s (glob)
>> >         elapsed time:  * seconds (glob)
>> >         heads summary:
>> >
>>
>> --
>> Pierre-Yves David
>>
>