[PATCH 2 of 2] setdiscovery: stop limiting the number of local head we initially send

Tue Apr 16 18:57:45 EDT 2019

(email sent at Pulkit Goyal's suggestion)

It would help us (Octobus) if this patch can make it into 5.0.

This is a small code change that removes an embarrassing slowdown in
cases where users would expect Mercurial to be snappy. That kind of
discovery performance boost makes a significant difference on our side.
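
For anyone skimming the patch quoted below, the change to the first query can be sketched roughly as follows. This is only an illustration, not the actual Mercurial code: `limit_sample` is a hypothetical stand-in for `_limitsample`, the value 100 for `initialsamplesize` is assumed from the test output, and the head set is made up.

```python
import random


def limit_sample(sample, desiredlen):
    """Hypothetical stand-in for _limitsample: cap a set at desiredlen items."""
    if len(sample) > desiredlen:
        # random.sample needs a sequence, so sort the set first
        return set(random.sample(sorted(sample), desiredlen))
    return set(sample)


def initial_sample_before(ownheads, initialsamplesize=100):
    # Old behaviour: the first query carries at most `initialsamplesize` heads.
    return limit_sample(ownheads, initialsamplesize)


def initial_sample_after(ownheads):
    # New behaviour: the first query carries every local head.
    return set(ownheads)


# Made-up repository with 3000 local heads (revision numbers as plain ints).
ownheads = set(range(3000))
before = initial_sample_before(ownheads)
after = initial_sample_after(ownheads)

# Heads left out of the first query stay "undecided" and have to be
# clarified by later sampling rounds.
skipped_before = ownheads - before
skipped_after = ownheads - after
print(len(before), len(skipped_before))
print(len(after), len(skipped_after))
```

With the old limit, most heads of a many-headed repository are left out of the first query, which is where the large "undecided" set described in the patch comes from.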

Cheers,

On 4/16/19 7:24 PM, Pierre-Yves David wrote:
> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david at octobus.net>
> # Date 1555428398 -7200
> #      Tue Apr 16 17:26:38 2019 +0200
> # Node ID ffaa98def33a903f132ec4177d36823a741b6ef6
> # Parent  017778a4463a8e6ecb4b17cacf46a3ab27bdb239
> # EXP-Topic discovery-speedup
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r ffaa98def33a
> setdiscovery: stop limiting the number of local head we initially send
> 
> In our testing this limitation provides no real gain and instead triggers
> pathological discovery timing for some repositories with many heads.
> 
> See inline documentation for details.
> 
> Some timing below:
> 
> Mozilla try repository (~1M revs, ~35K heads), discovery between 2 clones with
> 100 heads missing on each side
> 
> before:
> ! wall 1.492111 comb 1.490000 user 1.450000 sys 0.040000 (best of 20)
> ! wall 1.813992 comb 1.820000 user 1.700000 sys 0.120000 (max of 20)
> ! wall 1.574326 comb 1.573500 user 1.522000 sys 0.051500 (avg of 20)
> ! wall 1.572583 comb 1.570000 user 1.520000 sys 0.050000 (median of 20)
> 
> after:
> ! wall 1.147834 comb 1.150000 user 1.090000 sys 0.060000 (best of 20)
> ! wall 1.449144 comb 1.450000 user 1.330000 sys 0.120000 (max of 20)
> ! wall 1.204618 comb 1.202500 user 1.146500 sys 0.056000 (avg of 20)
> ! wall 1.194407 comb 1.190000 user 1.140000 sys 0.050000 (median of 20)
> 
> 
> pypy (~100 heads, 317 heads) discovery between clones with only 42 common heads
> 
> before:
> ! wall 0.031653 comb 0.030000 user 0.030000 sys 0.000000 (best of 25)
> ! wall 0.055719 comb 0.050000 user 0.040000 sys 0.010000 (max of 25)
> ! wall 0.038939 comb 0.039600 user 0.038400 sys 0.001200 (avg of 25)
> ! wall 0.038660 comb 0.050000 user 0.040000 sys 0.010000 (median of 25)
> 
> after:
> ! wall 0.018754 comb 0.020000 user 0.020000 sys 0.000000 (best of 49)
> ! wall 0.034505 comb 0.040000 user 0.030000 sys 0.010000 (max of 49)
> ! wall 0.019631 comb 0.019796 user 0.018367 sys 0.001429 (avg of 49)
> ! wall 0.019132 comb 0.020000 user 0.020000 sys 0.000000 (median of 49)
> 
> 
> Private repository (~1M revs, ~3K heads), discovery from a stripped subset,
> with about 100 changesets to be pulled.
> 
> before:
> ! wall 1.837729 comb 1.840000 user 1.790000 sys 0.050000 (best of 20)
> ! wall 2.203468 comb 2.200000 user 2.100000 sys 0.100000 (max of 20)
> ! wall 2.049355 comb 2.048500 user 2.002500 sys 0.046000 (avg of 20)
> ! wall 2.035315 comb 2.040000 user 2.000000 sys 0.040000 (median of 20)
> 
> after:
> ! wall 0.136598 comb 0.130000 user 0.110000 sys 0.020000 (best of 20)
> ! wall 0.330519 comb 0.330000 user 0.260000 sys 0.070000 (max of 20)
> ! wall 0.157254 comb 0.155500 user 0.123000 sys 0.032500 (avg of 20)
> ! wall 0.149870 comb 0.140000 user 0.110000 sys 0.030000 (median of 20)
> 
> 
> Same private repo, discovery between two clones with 500 different heads on
> each side:
> 
> before:
> ! wall 2.372919 comb 2.370000 user 2.320000 sys 0.050000 (best of 20)
> ! wall 2.622422 comb 2.610000 user 2.510000 sys 0.100000 (max of 20)
> ! wall 2.450135 comb 2.450000 user 2.402000 sys 0.048000 (avg of 20)
> ! wall 2.443896 comb 2.450000 user 2.410000 sys 0.040000 (median of 20)
> 
> after:
> ! wall 0.625497 comb 0.620000 user 0.570000 sys 0.050000 (best of 20)
> ! wall 0.834723 comb 0.820000 user 0.730000 sys 0.090000 (max of 20)
> ! wall 0.675725 comb 0.675500 user 0.628000 sys 0.047500 (avg of 20)
> ! wall 0.671614 comb 0.680000 user 0.640000 sys 0.040000 (median of 20)
> 
> diff --git a/mercurial/setdiscovery.py b/mercurial/setdiscovery.py
> --- a/mercurial/setdiscovery.py
> +++ b/mercurial/setdiscovery.py
> @@ -275,9 +275,58 @@ def findcommonheads(ui, local, remote,
>       # early exit if we know all the specified remote heads already
>       ui.debug("query 1; heads\n")
>       roundtrips += 1
> -    sample = _limitsample(ownheads, initialsamplesize)
> -    # indices between sample and externalized version must match
> -    sample = list(sample)
> +    # We also ask remote about all the local heads. That set can be arbitrarily
> +    # large, so we used to limit its size to `initialsamplesize`. We no longer
> +    # do so, as it proved counterproductive. The skipped heads could lead to a
> +    # large "undecided" set, slower to be clarified than if we asked the
> +    # question for all heads right away.
> +    #
> +    # We are already fetching all server heads using the `heads` command,
> +    # sending an equivalent number of heads the other way should not have a
> +    # significant impact.  In addition, it is very likely that we are going to
> +    # have to issue "known" requests for an equivalent amount of revisions in
> +    # order to decide if these heads are common or missing.
> +    #
> +    # Find a detailed analysis below.
> +    #
> +    # Case A: local and server both have few heads
> +    #
> +    #     Ownheads is below initialsamplesize, limit would not have any effect.
> +    #
> +    # Case B: local has few heads and server has many
> +    #
> +    #     Ownheads is below initialsamplesize, limit would not have any effect.
> +    #
> +    # Case C: local and server both have many heads
> +    #
> +    #     We now transfer some more data, but not significantly more than is
> +    #     already transferred to carry the server heads.
> +    #
> +    # Case D: local has many heads, server has few
> +    #
> +    #   D.1 local heads are mostly known remotely
> +    #
> +    #     All the known heads will have to be part of a `known` request at some
> +    #     point for the discovery to finish. Sending them all earlier is
> +    #     actually helping.
> +    #
> +    #     (This case is fairly unlikely, it requires the numerous heads to all
> +    #     be merged server side into only a few heads)
> +    #
> +    #   D.2 local heads are mostly missing remotely
> +    #
> +    #     To determine that the heads are missing, we'll have to issue `known`
> +    #     requests for them or one of their ancestors. The number of `known`
> +    #     requests will likely be of the same order of magnitude as the number
> +    #     of local heads.
> +    #
> +    #     The only case where we can be more efficient using `known` requests on
> +    #     ancestors is when all the "missing" local heads are based on a few
> +    #     changesets that are also "missing".  This means we would have a
> +    #     "complex" graph (with many heads) attached to, but very independent
> +    #     from, the "simple" graph on the server. This is a fairly unusual case
> +    #     and has not been met in the wild so far.
> +    sample = ownheads
>   
>       with remote.commandexecutor() as e:
>           fheads = e.callcommand('heads', {})
> diff --git a/tests/test-setdiscovery.t b/tests/test-setdiscovery.t
> --- a/tests/test-setdiscovery.t
> +++ b/tests/test-setdiscovery.t
> @@ -926,7 +926,7 @@ Both many new on top of long history:
>     common heads: 7ead0cba2838
>   
>   
> -One with >200 heads, which used to use up all of the sample:
> +One with >200 heads. We now send them all in the initial roundtrip, but still do sampling for the later requests.
>   
>     $ hg init manyheads
>     $ cd manyheads
> @@ -974,20 +974,17 @@ One with >200 heads, which used to use u
>     searching for changes
>     taking quick initial sample
>     searching: 2 queries
> -  query 2; still undecided: 1240, sample size is: 100
> +  query 2; still undecided: 1080, sample size is: 100
>     sampling from both directions
>     searching: 3 queries
> -  query 3; still undecided: 1140, sample size is: 200
> +  query 3; still undecided: 980, sample size is: 200
>     sampling from both directions
>     searching: 4 queries
>     query 4; still undecided: \d+, sample size is: 200 (re)
>     sampling from both directions
>     searching: 5 queries
> -  query 5; still undecided: \d+, sample size is: 200 (re)
> -  sampling from both directions
> -  searching: 6 queries
> -  query 6; still undecided: \d+, sample size is: \d+ (re)
> -  6 total queries in *.????s (glob)
> +  query 5; still undecided: 195, sample size is: 195
> +  5 total queries in *.????s (glob)
>     elapsed time:  * seconds (glob)
>     heads summary:
>       total common heads:          1
> @@ -1116,6 +1113,6 @@ fixed in 86c35b7ae300:
>     $ hg -R r1 --config extensions.blackbox= blackbox --config blackbox.track=
>     * @5d0b986a083e0d91f116de4691e2aaa54d5bbec0 (*)> serve --cmdserver chgunix * (glob) (chg !)
>     * @5d0b986a083e0d91f116de4691e2aaa54d5bbec0 (*)> -R r1 outgoing r2 *-T{rev} * --config *extensions.blackbox=* (glob)
> -  * @5d0b986a083e0d91f116de4691e2aaa54d5bbec0 (*)> found 101 common and 1 unknown server heads, 2 roundtrips in *.????s (glob)
> +  * @5d0b986a083e0d91f116de4691e2aaa54d5bbec0 (*)> found 101 common and 1 unknown server heads, 1 roundtrips in *.????s (glob)
>     * @5d0b986a083e0d91f116de4691e2aaa54d5bbec0 (*)> -R r1 outgoing r2 *-T{rev} * --config *extensions.blackbox=* exited 0 after *.?? seconds (glob)
>     $ cd ..

-- 
Pierre-Yves David