[PATCH] discovery: slowly increase sampling size

Martin von Zweigbergk martinvonz at google.com
Tue May 21 16:40:38 EDT 2019


On Tue, May 21, 2019 at 1:24 PM Pierre-Yves David
<pierre-yves.david at ens-lyon.org> wrote:

>
>
> On 5/21/19 10:00 PM, Martin von Zweigbergk wrote:
> >
> >
> > On Tue, May 21, 2019 at 11:10 AM Martin von Zweigbergk
> > <martinvonz at google.com> wrote:
> >
> >
> >
> >     On Tue, May 21, 2019 at 10:09 AM Pierre-Yves David
> >     <pierre-yves.david at ens-lyon.org> wrote:
> >
> >
> >
> >         On 5/21/19 6:05 PM, Martin von Zweigbergk wrote:
> >          >
> >          >
> >          > On Tue, May 21, 2019 at 4:22 AM Pierre-Yves David
> >          > <pierre-yves.david at ens-lyon.org> wrote:
> >          >
> >          >     # HG changeset patch
> >          >     # User Pierre-Yves David <pierre-yves.david at octobus.net>
> >          >     # Date 1558436902 -7200
> >          >     #      Tue May 21 13:08:22 2019 +0200
> >          >     # Node ID 123267b121ea268b3bc488c6dde68a8d93ee4f4c
> >          >     # Parent  86f17fc31aa8a0c26c11db6a532293463ee6428a
> >          >     # EXP-Topic discovery
> >          >     # Available At https://bitbucket.org/octobus/mercurial-devel/
> >          >     #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 123267b121ea
> >          >     discovery: slowly increase sampling size
> >          >
> >          >     Some pathological discovery runs can require many
> >          >     roundtrips. When this happens, things can get very slow.
> >          >
> >          >     To make the algorithm more resilient against such
> >          >     pathological cases, we slowly increase the sample size
> >          >     with each roundtrip (+5%). This has a negligible impact
> >          >     on "normal" discovery with few roundtrips, but a large
> >          >     positive impact on cases with many roundtrips. Asking
> >          >     more questions per roundtrip helps reduce the undecided
> >          >     set faster. Instead of shrinking the undecided set at a
> >          >     linear rate (in the worst case), we shrink it at a
> >          >     guaranteed (small) exponential rate. The data below show
> >          >     this slow ramp-up in sample size:
> >          >
> >          >     round trip    |    1 |     5 |    10 |    20 |     50 |     100 |       130 |
> >          >     sample size   |  200 |   254 |   321 |   517 |  2 199 |  25 123 |   108 549 |
> >          >     covered nodes |  200 | 1 357 | 2 821 | 7 031 | 42 658 | 524 530 | 2 276 755 |
> >          >
> >          >     To be a bit more concrete, let's take a very pathological
> >          >     case as an example: discovery from a copy of mozilla-try
> >          >     to a more recent version of mozilla-unified. The
> >          >     mozilla-unified heads are unknown to the mozilla-try repo,
> >          >     and there are over 1 million "missing" changesets. (The
> >          >     discovery is run "locally" to avoid network interference.)
> >          >
> >          >
> >          > What makes it slow in that case? If the 1 million changesets
> >          > are mostly linear, it should still be fast, right? So I assume
> >          > they're spread out across branches. Is it slow because the
> >          > remote side has many branches that the local side does not, or
> >          > the other way around?
> >
> >         The million changesets are on many different branches. (The
> >         discovery is run from mozilla-try (many changesets, tree-like)
> >         toward mozilla-unified (mostly linear).)
> >
> >
> >     I think that means that there are many local heads. Maybe it's the
> >     case I tried to address with https://phab.mercurial-scm.org/D2647. I
> >     just re-opened that. I'll see if I can get timing numbers for the
> >     repos you mentioned here.
>
> I had a look at this commit. The idea in your commit message is
> interesting, but I see a different one implemented in the code. See my
> comment for details.
>
> > That's now done. I've added you as a reviewer. I feel like that's a
> > better solution for this particular case. It results in significantly
> > more bytes transferred (+25%), but it's also much faster.
>
> As said before in the scope of this series, I am not interested in a
> better solution for any particular case. I am working on a general-purpose
> safety net that improves the upper-bound roundtrip complexity for all
> pathological cases.
>
> The good thing about this safety net is that it is not mutually exclusive
> with improvements for particular cases. For example, your change and my
> change could peacefully coexist.
>
> Can we move forward with this patch before discussing more specific
> changes?
>

I'm not sure what other pathological cases remain after that one is fixed,
but sure, I'll queue this anyway.
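
For reference, the +5% ramp-up described in the commit message can be
modeled with a few lines of Python. This is only a sketch of the growth
curve, not the actual setdiscovery code: the initial size of 200 comes
from the table and the 1.05 factor from the "+5%" in the message, and
rounding details make the values drift a little from the exact figures.

    # Model the +5% per-roundtrip sample size growth described in the
    # patch. Assumes the sample size on roundtrip N is 200 * 1.05**(N - 1);
    # the real implementation rounds differently, so values drift a bit.
    covered = 0
    for roundtrip in range(1, 131):
        size = int(200 * 1.05 ** (roundtrip - 1))
        covered += size
        if roundtrip in (1, 5, 10, 20, 50, 100, 130):
            print("round trip %3d: sample size %7d, covered nodes %9d"
                  % (roundtrip, size, covered))

At roundtrip 130 this gives a sample size around 108k and about 2.27
million covered nodes, matching the table above to within a few percent.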


>
> Cheers,
>
> --
> Pierre-Yves David
>