[PATCH] discovery: slowly increase sampling size
Pierre-Yves David
pierre-yves.david at ens-lyon.org
Tue May 21 13:09:47 EDT 2019
On 5/21/19 6:05 PM, Martin von Zweigbergk wrote:
>
>
> On Tue, May 21, 2019 at 4:22 AM Pierre-Yves David
> <pierre-yves.david at ens-lyon.org <mailto:pierre-yves.david at ens-lyon.org>>
> wrote:
>
> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david at octobus.net
> <mailto:pierre-yves.david at octobus.net>>
> # Date 1558436902 -7200
> # Tue May 21 13:08:22 2019 +0200
> # Node ID 123267b121ea268b3bc488c6dde68a8d93ee4f4c
> # Parent 86f17fc31aa8a0c26c11db6a532293463ee6428a
> # EXP-Topic discovery
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> # hg pull
> https://bitbucket.org/octobus/mercurial-devel/ -r 123267b121ea
> discovery: slowly increase sampling size
>
> Some pathological discovery runs can requires many roundtrip. When
> this happens
> things can get very slow.
>
> To make the algorithm more resilience again such pathological case.
> We slowly
> increase the sample size with each roundtrip (+5%). This will have a
> negligible
> impact on "normal" discovery with few roundtrips, but a large
> positive impact of
> case with many roundtrips. Asking more question per roundtrip helps
> to reduce
> the undecided set faster. Instead of reducing the undecided set a
> linear speed
> (in the worst case), we reduce it as a guaranteed (small)
> exponential rate. The
> data below show this slow ramp up in sample size:
>
> round trip | 1 | 5 | 10 | 20 | 50 | 100 |
> 130 |
> sample size | 200 | 254 | 321 | 517 | 2 199 | 25 123 |
> 108 549 |
> covered nodes | 200 | 1 357 | 2 821 | 7 031 | 42 658 | 524 530 | 2
> 276 755 |
>
> To be a bit more concrete, lets take a very pathological case as an
> example. We
> are doing discovery from a copy of Mozilla-try to a more recent
> version of
> mozilla-unified. Mozilla-unified heads are unknown to the
> mozilla-try repo and
> there are over 1 million "missing" changesets. (the discovery is
> "local" to
> avoid network interference)
>
>
> What makes it slow in that case? If the 1 million changesets are mostly
> linear, it should still be fast, right? So I assume they're spread out
> across branches. Is it slow because the remote side has many branches
> that the local side does not, or the other way around?
The million changeset are on many different branches. (the discovery is
run from mozilla-try (many changesets, tree like) toward mozilla-unified
(mostly linear).
(Note that the point of this test is not so much of looking into how to
make that scenario faster, but to show a bad scenario and show how a
generic safety net makes its more bearable)
>
>
> Without this change, the discovery:
> - last 1858 seconds (31 minutes),
> - does 1700 round trip,
> - asking about 340 000 nodes.
>
> With this change, the discovery:
> - last 218 seconds (3 minutes, 38 seconds a -88% improvement),
> - does 94 round trip (-94%),
> - asking about 344 211 nodes (+1%).
>
> Of course, this is an extreme case (and 3 minutes is still slow).
> However this
> give a good example of how this sample size increase act as a safety net
> catching any bad situations.
>
> We could image a steeper increase than 5%. For example 10% would
> give the
> following number:
>
> round trip | 1 | 5 | 10 | 20 | 50 | 75
> | 100 |
> sample size | 200 | 321 | 514 | 1 326 | 23 060 | 249 812
> | 2 706 594 |
> covered nodes | 200 | 1 541 | 3 690 | 12 671 | 251 871 | 2 746 254
> | 29 770 966 |
>
> In parallel, it is useful to understand these pathological cases and
> improve
> them. However the current change provides a general purpose safety
> net to smooth
> the impact of pathological cases.
>
> To avoid issue with older http server, the increase in sample size
> only occurs
> if the protocol has not limit on command argument size.
>
> diff --git a/mercurial/setdiscovery.py b/mercurial/setdiscovery.py
> --- a/mercurial/setdiscovery.py
> +++ b/mercurial/setdiscovery.py
> @@ -256,7 +256,8 @@ def findcommonheads(ui, local, remote,
> initialsamplesize=100,
> fullsamplesize=200,
> abortwhenunrelated=True,
> - ancestorsof=None):
> + ancestorsof=None,
> + samplegrowth=1.05):
> '''Return a tuple (common, anyincoming, remoteheads) used to
> identify
> missing nodes from or in remote.
> '''
> @@ -389,6 +390,8 @@ def findcommonheads(ui, local, remote,
> ui.debug("taking initial sample\n")
> samplefunc = disco.takefullsample
> targetsize = fullsamplesize
> + if not remote.limitedarguments:
> + fullsamplesize = int(fullsamplesize * samplegrowth)
> else:
> # use even cheaper initial sample
> ui.debug("taking quick initial sample\n")
> diff --git a/tests/test-setdiscovery.t b/tests/test-setdiscovery.t
> --- a/tests/test-setdiscovery.t
> +++ b/tests/test-setdiscovery.t
> @@ -980,10 +980,10 @@ One with >200 heads. We now switch to se
> query 3; still undecided: 980, sample size is: 200
> sampling from both directions
> searching: 4 queries
> - query 4; still undecided: \d+, sample size is: 200 (re)
> + query 4; still undecided: 435, sample size is: 210
> sampling from both directions
> searching: 5 queries
> - query 5; still undecided: 195, sample size is: 195
> + query 5; still undecided: 185, sample size is: 185
> 5 total queries in *.????s (glob)
> elapsed time: * seconds (glob)
> heads summary:
>
--
Pierre-Yves David
More information about the Mercurial-devel
mailing list