[PATCH] discovery: slowly increase sampling size

Pierre-Yves David pierre-yves.david at ens-lyon.org
Tue May 21 16:24:28 EDT 2019



On 5/21/19 10:00 PM, Martin von Zweigbergk wrote:
> 
> 
> On Tue, May 21, 2019 at 11:10 AM Martin von Zweigbergk 
> <martinvonz at google.com <mailto:martinvonz at google.com>> wrote:
> 
> 
> 
>     On Tue, May 21, 2019 at 10:09 AM Pierre-Yves David
>     <pierre-yves.david at ens-lyon.org
>     <mailto:pierre-yves.david at ens-lyon.org>> wrote:
> 
> 
> 
>         On 5/21/19 6:05 PM, Martin von Zweigbergk wrote:
>          >
>          >
>          > On Tue, May 21, 2019 at 4:22 AM Pierre-Yves David
>          > <pierre-yves.david at ens-lyon.org
>         <mailto:pierre-yves.david at ens-lyon.org>
>         <mailto:pierre-yves.david at ens-lyon.org
>         <mailto:pierre-yves.david at ens-lyon.org>>>
>          > wrote:
>          >
>          >     # HG changeset patch
>          >     # User Pierre-Yves David <pierre-yves.david at octobus.net
>         <mailto:pierre-yves.david at octobus.net>
>          >     <mailto:pierre-yves.david at octobus.net
>         <mailto:pierre-yves.david at octobus.net>>>
>          >     # Date 1558436902 -7200
>          >     #      Tue May 21 13:08:22 2019 +0200
>          >     # Node ID 123267b121ea268b3bc488c6dde68a8d93ee4f4c
>          >     # Parent  86f17fc31aa8a0c26c11db6a532293463ee6428a
>          >     # EXP-Topic discovery
>          >     # Available At https://bitbucket.org/octobus/mercurial-devel/
>          >     #              hg pull
>          > https://bitbucket.org/octobus/mercurial-devel/ -r 123267b121ea
>          >     discovery: slowly increase sampling size
>          >
>          >     Some pathological discovery runs can requires many
>         roundtrip. When
>          >     this happens
>          >     things can get very slow.
>          >
>          >     To make the algorithm more resilience again such
>         pathological case.
>          >     We slowly
>          >     increase the sample size with each roundtrip (+5%). This
>         will have a
>          >     negligible
>          >     impact on "normal" discovery with few roundtrips, but a large
>          >     positive impact of
>          >     case with many roundtrips. Asking more question per
>         roundtrip helps
>          >     to reduce
>          >     the undecided set faster. Instead of reducing the
>         undecided set a
>          >     linear speed
>          >     (in the worst case), we reduce it as a guaranteed (small)
>          >     exponential rate. The
>          >     data below show this slow ramp up in sample size:
>          >
>          >     round trip    |    1 |     5 |    10 |    20 |     50 | 
>             100 |
>          >         130 |
>          >     sample size   |  200 |   254 |   321 |   517 |  2 199 | 
>         25 123 |
>          >       108 549 |
>          >     covered nodes |  200 | 1 357 | 2 821 | 7 031 | 42 658 |
>         524 530 | 2
>          >     276 755 |
>          >
>          >     To be a bit more concrete, lets take a very pathological
>         case as an
>          >     example. We
>          >     are doing discovery from a copy of Mozilla-try to a more
>         recent
>          >     version of
>          >     mozilla-unified. Mozilla-unified heads are unknown to the
>          >     mozilla-try repo and
>          >     there are over 1 million "missing" changesets. (the
>         discovery is
>          >     "local" to
>          >     avoid network interference)
>          >
>          >
>          > What makes it slow in that case? If the 1 million changesets
>         are mostly
>          > linear, it should still be fast, right? So I assume they're
>         spread out
>          > across branches. Is it slow because the remote side has many
>         branches
>          > that the local side does not, or the other way around?
> 
>         The million changeset are on many different branches. (the
>         discovery is
>         run from mozilla-try (many changesets, tree like) toward
>         mozilla-unified
>         (mostly linear).
> 
> 
>     I think that means that there are many local heads. Maybe it's the
>     case I tried to address with https://phab.mercurial-scm.org/D2647. I
>     just re-opened that. I'll see if can get timing numbers for the
>     repos you mentioned here.

I had a look at this commit. The idea in your commit message is 
interesting, but I see a different one implemented in the code. See my 
comment for details.

> That's now done. I've added you as reviewer. I feel like that's a better 
> solution for this particular case. It results in significantly more 
> bytes transferred (+25%), but it's also much faster.

As said before in the scope of this series, I am not interest in any 
better solution any particular case. I am working on a general purpose 
safety net that improves the upper bound round trip complexity for every 
pathological cases.

The good point of this safety net is that it is not mutually exclusive 
with improving for particular cases. For example, you change and my 
change could peacefully coexist.

Can we move forward with this patch before discussing more specific change?

Cheers,

-- 
Pierre-Yves David


More information about the Mercurial-devel mailing list