[PATCH] discovery: slowly increase sampling size
Martin von Zweigbergk
martinvonz at google.com
Tue May 21 16:40:38 EDT 2019
On Tue, May 21, 2019 at 1:24 PM Pierre-Yves David <
pierre-yves.david at ens-lyon.org> wrote:
>
>
> On 5/21/19 10:00 PM, Martin von Zweigbergk wrote:
> >
> >
> > On Tue, May 21, 2019 at 11:10 AM Martin von Zweigbergk
> > <martinvonz at google.com <mailto:martinvonz at google.com>> wrote:
> >
> >
> >
> > On Tue, May 21, 2019 at 10:09 AM Pierre-Yves David
> > <pierre-yves.david at ens-lyon.org
> > <mailto:pierre-yves.david at ens-lyon.org>> wrote:
> >
> >
> >
> > On 5/21/19 6:05 PM, Martin von Zweigbergk wrote:
> > >
> > >
> > > On Tue, May 21, 2019 at 4:22 AM Pierre-Yves David
> > > <pierre-yves.david at ens-lyon.org
> > <mailto:pierre-yves.david at ens-lyon.org>
> > <mailto:pierre-yves.david at ens-lyon.org
> > <mailto:pierre-yves.david at ens-lyon.org>>>
> > > wrote:
> > >
> > > # HG changeset patch
> > > # User Pierre-Yves David <pierre-yves.david at octobus.net
> > <mailto:pierre-yves.david at octobus.net>
> > > <mailto:pierre-yves.david at octobus.net
> > <mailto:pierre-yves.david at octobus.net>>>
> > > # Date 1558436902 -7200
> > > # Tue May 21 13:08:22 2019 +0200
> > > # Node ID 123267b121ea268b3bc488c6dde68a8d93ee4f4c
> > > # Parent 86f17fc31aa8a0c26c11db6a532293463ee6428a
> > > # EXP-Topic discovery
> > > # Available At
> https://bitbucket.org/octobus/mercurial-devel/
> > > # hg pull
> > > https://bitbucket.org/octobus/mercurial-devel/ -r
> 123267b121ea
> > > discovery: slowly increase sampling size
> > >
> > > Some pathological discovery runs can requires many
> > roundtrip. When
> > > this happens
> > > things can get very slow.
> > >
> > > To make the algorithm more resilience again such
> > pathological case.
> > > We slowly
> > > increase the sample size with each roundtrip (+5%). This
> > will have a
> > > negligible
> > > impact on "normal" discovery with few roundtrips, but a
> large
> > > positive impact of
> > > case with many roundtrips. Asking more question per
> > roundtrip helps
> > > to reduce
> > > the undecided set faster. Instead of reducing the
> > undecided set a
> > > linear speed
> > > (in the worst case), we reduce it as a guaranteed (small)
> > > exponential rate. The
> > > data below show this slow ramp up in sample size:
> > >
> > > round trip | 1 | 5 | 10 | 20 | 50 |
> > 100 |
> > > 130 |
> > > sample size | 200 | 254 | 321 | 517 | 2 199 |
> > 25 123 |
> > > 108 549 |
> > > covered nodes | 200 | 1 357 | 2 821 | 7 031 | 42 658 |
> > 524 530 | 2
> > > 276 755 |
> > >
> > > To be a bit more concrete, lets take a very pathological
> > case as an
> > > example. We
> > > are doing discovery from a copy of Mozilla-try to a more
> > recent
> > > version of
> > > mozilla-unified. Mozilla-unified heads are unknown to the
> > > mozilla-try repo and
> > > there are over 1 million "missing" changesets. (the
> > discovery is
> > > "local" to
> > > avoid network interference)
> > >
> > >
> > > What makes it slow in that case? If the 1 million changesets
> > are mostly
> > > linear, it should still be fast, right? So I assume they're
> > spread out
> > > across branches. Is it slow because the remote side has many
> > branches
> > > that the local side does not, or the other way around?
> >
> > The million changeset are on many different branches. (the
> > discovery is
> > run from mozilla-try (many changesets, tree like) toward
> > mozilla-unified
> > (mostly linear).
> >
> >
> > I think that means that there are many local heads. Maybe it's the
> > case I tried to address with https://phab.mercurial-scm.org/D2647. I
> > just re-opened that. I'll see if can get timing numbers for the
> > repos you mentioned here.
>
> I had a look at this commit. The idea in your commit message is
> interesting, but I see a different one implemented in the code. See my
> comment for details.
>
> > That's now done. I've added you as reviewer. I feel like that's a better
> > solution for this particular case. It results in significantly more
> > bytes transferred (+25%), but it's also much faster.
>
> As said before in the scope of this series, I am not interest in any
> better solution any particular case. I am working on a general purpose
> safety net that improves the upper bound round trip complexity for every
> pathological cases.
>
> The good point of this safety net is that it is not mutually exclusive
> with improving for particular cases. For example, you change and my
> change could peacefully coexist.
>
> Can we move forward with this patch before discussing more specific change?
>
I'm not sure what other pathological cases remain after that one is fixed,
but sure, I'll queue this anyway.
>
> Cheers,
>
> --
> Pierre-Yves David
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20190521/fdaf53e5/attachment.html>
More information about the Mercurial-devel
mailing list