[PATCH] discovery: slowly increase sampling size
Martin von Zweigbergk
martinvonz at google.com
Tue May 21 14:10:18 EDT 2019
On Tue, May 21, 2019 at 10:09 AM Pierre-Yves David
<pierre-yves.david at ens-lyon.org> wrote:
>
>
> On 5/21/19 6:05 PM, Martin von Zweigbergk wrote:
> >
> >
> > On Tue, May 21, 2019 at 4:22 AM Pierre-Yves David
> > <pierre-yves.david at ens-lyon.org> wrote:
> >
> > # HG changeset patch
> > # User Pierre-Yves David <pierre-yves.david at octobus.net>
> > # Date 1558436902 -7200
> > # Tue May 21 13:08:22 2019 +0200
> > # Node ID 123267b121ea268b3bc488c6dde68a8d93ee4f4c
> > # Parent 86f17fc31aa8a0c26c11db6a532293463ee6428a
> > # EXP-Topic discovery
> > # Available At https://bitbucket.org/octobus/mercurial-devel/
> > #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 123267b121ea
> > discovery: slowly increase sampling size
> >
> > Some pathological discovery runs can require many roundtrips. When this
> > happens, things can get very slow.
> >
> > To make the algorithm more resilient to such pathological cases, we
> > slowly increase the sample size with each roundtrip (+5%). This has a
> > negligible impact on "normal" discovery with few roundtrips, but a large
> > positive impact on cases with many roundtrips. Asking more questions per
> > roundtrip helps reduce the undecided set faster. Instead of reducing the
> > undecided set at a linear rate (in the worst case), we reduce it at a
> > guaranteed (small) exponential rate. The data below show this slow
> > ramp-up in sample size:
> >
> > round trip    |   1 |     5 |    10 |    20 |     50 |     100 |       130 |
> > sample size   | 200 |   254 |   321 |   517 |  2 199 |  25 123 |   108 549 |
> > covered nodes | 200 | 1 357 | 2 821 | 7 031 | 42 658 | 524 530 | 2 276 755 |
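[Editor's note: the ramp-up in the table can be reproduced with a short simulation. This is a minimal sketch, not the Mercurial discovery code itself; it only models the `fullsamplesize = int(fullsamplesize * samplegrowth)` step from the patch below, applied once per roundtrip.]

```python
def simulate_sample_sizes(initial=200, growth=1.05, rounds=20):
    """Return the sample size after each of `rounds` growth steps,
    starting from `initial` (index 0 is the size before any growth).

    Models the patch's update: fullsamplesize = int(fullsamplesize * growth),
    where int() truncates toward zero, so growth compounds on the
    already-truncated value.
    """
    sizes = [initial]
    size = initial
    for _ in range(rounds):
        size = int(size * growth)  # truncating multiply, as in the patch
        sizes.append(size)
    return sizes

# With 5% growth the sequence starts 200, 210, 220, 231, 242, 254, ...
# matching the table above (254 after five growth steps, 321 after ten,
# 517 after twenty); the "covered nodes" column is the running total of
# the sample sizes. Passing growth=1.10 reproduces the steeper 10%
# variant discussed later in this message (321, 514, 1 326, ...).
```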
> >
> > To be a bit more concrete, let's take a very pathological case as an
> > example. We are doing discovery from a copy of mozilla-try to a more
> > recent version of mozilla-unified. The mozilla-unified heads are unknown
> > to the mozilla-try repo and there are over 1 million "missing"
> > changesets. (The discovery is run "locally" to avoid network
> > interference.)
> >
> >
> > What makes it slow in that case? If the 1 million changesets are mostly
> > linear, it should still be fast, right? So I assume they're spread out
> > across branches. Is it slow because the remote side has many branches
> > that the local side does not, or the other way around?
>
> The million changesets are on many different branches. (The discovery is
> run from mozilla-try (many changesets, tree-like) toward mozilla-unified
> (mostly linear).)
>
I think that means that there are many local heads. Maybe it's the case I
tried to address with https://phab.mercurial-scm.org/D2647. I just
re-opened that. I'll see if I can get timing numbers for the repos you
mentioned here.
>
> (Note that the point of this test is not so much to look into how to make
> that scenario faster, but to show a bad scenario and how a generic safety
> net makes it more bearable.)
>
> >
> >
> > Without this change, the discovery:
> > - lasts 1858 seconds (31 minutes),
> > - makes 1700 roundtrips,
> > - asks about 340 000 nodes.
> >
> > With this change, the discovery:
> > - lasts 218 seconds (3 minutes 38 seconds, an 88% improvement),
> > - makes 94 roundtrips (-94%),
> > - asks about 344 211 nodes (+1%).
> >
> > Of course, this is an extreme case (and 3 minutes is still slow).
> > However, it gives a good example of how this sample size increase acts
> > as a safety net catching bad situations.
> >
> > We could imagine a steeper increase than 5%. For example, 10% would give
> > the following numbers:
> >
> > round trip    |   1 |     5 |    10 |     20 |      50 |        75 |        100 |
> > sample size   | 200 |   321 |   514 |  1 326 |  23 060 |   249 812 |  2 706 594 |
> > covered nodes | 200 | 1 541 | 3 690 | 12 671 | 251 871 | 2 746 254 | 29 770 966 |
> >
> > In parallel, it is useful to understand these pathological cases and
> > improve them. However, the current change provides a general-purpose
> > safety net that smooths the impact of pathological cases.
> >
> > To avoid issues with older HTTP servers, the increase in sample size
> > only occurs if the protocol has no limit on command argument size.
> >
> > diff --git a/mercurial/setdiscovery.py b/mercurial/setdiscovery.py
> > --- a/mercurial/setdiscovery.py
> > +++ b/mercurial/setdiscovery.py
> > @@ -256,7 +256,8 @@ def findcommonheads(ui, local, remote,
> > initialsamplesize=100,
> > fullsamplesize=200,
> > abortwhenunrelated=True,
> > - ancestorsof=None):
> > + ancestorsof=None,
> > + samplegrowth=1.05):
> > '''Return a tuple (common, anyincoming, remoteheads) used to identify
> > missing nodes from or in remote.
> > '''
> > @@ -389,6 +390,8 @@ def findcommonheads(ui, local, remote,
> > ui.debug("taking initial sample\n")
> > samplefunc = disco.takefullsample
> > targetsize = fullsamplesize
> > + if not remote.limitedarguments:
> > + fullsamplesize = int(fullsamplesize * samplegrowth)
> > else:
> > # use even cheaper initial sample
> > ui.debug("taking quick initial sample\n")
> > diff --git a/tests/test-setdiscovery.t b/tests/test-setdiscovery.t
> > --- a/tests/test-setdiscovery.t
> > +++ b/tests/test-setdiscovery.t
> > @@ -980,10 +980,10 @@ One with >200 heads. We now switch to se
> > query 3; still undecided: 980, sample size is: 200
> > sampling from both directions
> > searching: 4 queries
> > - query 4; still undecided: \d+, sample size is: 200 (re)
> > + query 4; still undecided: 435, sample size is: 210
> > sampling from both directions
> > searching: 5 queries
> > - query 5; still undecided: 195, sample size is: 195
> > + query 5; still undecided: 185, sample size is: 185
> > 5 total queries in *.????s (glob)
> > elapsed time: * seconds (glob)
> > heads summary:
> >
>
> --
> Pierre-Yves David
>