[PATCH 2 of 2] setdiscovery: stop limiting the number of local head we initially send
Pierre-Yves David
pierre-yves.david at ens-lyon.org
Tue Apr 16 18:57:45 EDT 2019
(email sent at Pulkit Goyal's suggestion)
It would help us (Octobus) if this patch could make it into 5.0.
This is a small code change that removes an embarrassing slowdown in
cases where users would expect Mercurial to be snappy. This kind of
discovery performance boost makes a significant difference on our side.
Cheers,
On 4/16/19 7:24 PM, Pierre-Yves David wrote:
> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david at octobus.net>
> # Date 1555428398 -7200
> # Tue Apr 16 17:26:38 2019 +0200
> # Node ID ffaa98def33a903f132ec4177d36823a741b6ef6
> # Parent 017778a4463a8e6ecb4b17cacf46a3ab27bdb239
> # EXP-Topic discovery-speedup
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> # hg pull https://bitbucket.org/octobus/mercurial-devel/ -r ffaa98def33a
> setdiscovery: stop limiting the number of local head we initially send
>
> In our testing this limitation provides no real gain and instead triggers
> pathological discovery timing for some repositories with many heads.
>
> See inline documentation for details.
>
> Some timing below:
>
> Mozilla try repository (~1M revs, ~35K heads), discovery between 2 clones with
> 100 heads missing on each side
>
> before:
> ! wall 1.492111 comb 1.490000 user 1.450000 sys 0.040000 (best of 20)
> ! wall 1.813992 comb 1.820000 user 1.700000 sys 0.120000 (max of 20)
> ! wall 1.574326 comb 1.573500 user 1.522000 sys 0.051500 (avg of 20)
> ! wall 1.572583 comb 1.570000 user 1.520000 sys 0.050000 (median of 20)
>
> after:
> ! wall 1.147834 comb 1.150000 user 1.090000 sys 0.060000 (best of 20)
> ! wall 1.449144 comb 1.450000 user 1.330000 sys 0.120000 (max of 20)
> ! wall 1.204618 comb 1.202500 user 1.146500 sys 0.056000 (avg of 20)
> ! wall 1.194407 comb 1.190000 user 1.140000 sys 0.050000 (median of 20)
>
>
> pypy (~100 heads, 317 heads) discovery between clones with only 42 common heads
>
> before:
> ! wall 0.031653 comb 0.030000 user 0.030000 sys 0.000000 (best of 25)
> ! wall 0.055719 comb 0.050000 user 0.040000 sys 0.010000 (max of 25)
> ! wall 0.038939 comb 0.039600 user 0.038400 sys 0.001200 (avg of 25)
> ! wall 0.038660 comb 0.050000 user 0.040000 sys 0.010000 (median of 25)
>
> after:
> ! wall 0.018754 comb 0.020000 user 0.020000 sys 0.000000 (best of 49)
> ! wall 0.034505 comb 0.040000 user 0.030000 sys 0.010000 (max of 49)
> ! wall 0.019631 comb 0.019796 user 0.018367 sys 0.001429 (avg of 49)
> ! wall 0.019132 comb 0.020000 user 0.020000 sys 0.000000 (median of 49)
>
>
> Private repository (~1M revs, ~3K heads), discovery from a stripped subset, about
> 100 changesets to be pulled.
>
> before:
> ! wall 1.837729 comb 1.840000 user 1.790000 sys 0.050000 (best of 20)
> ! wall 2.203468 comb 2.200000 user 2.100000 sys 0.100000 (max of 20)
> ! wall 2.049355 comb 2.048500 user 2.002500 sys 0.046000 (avg of 20)
> ! wall 2.035315 comb 2.040000 user 2.000000 sys 0.040000 (median of 20)
>
> after:
> ! wall 0.136598 comb 0.130000 user 0.110000 sys 0.020000 (best of 20)
> ! wall 0.330519 comb 0.330000 user 0.260000 sys 0.070000 (max of 20)
> ! wall 0.157254 comb 0.155500 user 0.123000 sys 0.032500 (avg of 20)
> ! wall 0.149870 comb 0.140000 user 0.110000 sys 0.030000 (median of 20)
>
>
> Same private repo, discovery between two clones with 500 different heads on each
> side:
>
> before:
> ! wall 2.372919 comb 2.370000 user 2.320000 sys 0.050000 (best of 20)
> ! wall 2.622422 comb 2.610000 user 2.510000 sys 0.100000 (max of 20)
> ! wall 2.450135 comb 2.450000 user 2.402000 sys 0.048000 (avg of 20)
> ! wall 2.443896 comb 2.450000 user 2.410000 sys 0.040000 (median of 20)
>
> after:
> ! wall 0.625497 comb 0.620000 user 0.570000 sys 0.050000 (best of 20)
> ! wall 0.834723 comb 0.820000 user 0.730000 sys 0.090000 (max of 20)
> ! wall 0.675725 comb 0.675500 user 0.628000 sys 0.047500 (avg of 20)
> ! wall 0.671614 comb 0.680000 user 0.640000 sys 0.040000 (median of 20)
>
> diff --git a/mercurial/setdiscovery.py b/mercurial/setdiscovery.py
> --- a/mercurial/setdiscovery.py
> +++ b/mercurial/setdiscovery.py
> @@ -275,9 +275,58 @@ def findcommonheads(ui, local, remote,
> # early exit if we know all the specified remote heads already
> ui.debug("query 1; heads\n")
> roundtrips += 1
> - sample = _limitsample(ownheads, initialsamplesize)
> - # indices between sample and externalized version must match
> - sample = list(sample)
> + # We also ask remote about all the local heads. That set can be arbitrarily
> + # large, so we used to limit its size to `initialsamplesize`. We no longer
> + # do so, as it proved counterproductive. The skipped heads could lead to a
> + # large "undecided" set, slower to be clarified than if we asked the
> + # question for all heads right away.
> + #
> + # We are already fetching all server heads using the `heads` command;
> + # sending an equivalent number of heads the other way should not have a
> + # significant impact. In addition, it is very likely that we are going to
> + # have to issue "known" requests for an equivalent amount of revisions in
> + # order to decide if these heads are common or missing.
> + #
> + # Find a detailed analysis below.
> + #
> + # Case A: local and server both have few heads
> + #
> + # Ownheads is below initialsamplesize, limit would not have any effect.
> + #
> + # Case B: local has few heads and server has many
> + #
> + # Ownheads is below initialsamplesize, limit would not have any effect.
> + #
> + # Case C: local and server both have many heads
> + #
> + # We now transfer some more data, but not significantly more than is
> + # already transferred to carry the server heads.
> + #
> + # Case D: local has many heads, server has few
> + #
> + # D.1 local heads are mostly known remotely
> + #
> + # All the known heads will have to be part of a `known` request at some
> + # point for the discovery to finish. Sending them all earlier is
> + # actually helping.
> + #
> + # (This case is fairly unlikely, it requires the numerous heads to all
> + # be merged server side in only a few heads)
> + #
> + # D.2 local heads are mostly missing remotely
> + #
> + # To determine that the heads are missing, we'll have to issue `known`
> + # requests for them or one of their ancestors. The number of `known`
> + # requests will likely be of the same order of magnitude as the number
> + # of local heads.
> + #
> + # The only case where we can be more efficient using `known` requests on
> + # ancestors is the case where all the "missing" local heads are based on a
> + # few changesets, also "missing". This means we would have a "complex"
> + # graph (with many heads) attached to, but very independent of, the
> + # "simple" graph on the server. This is a fairly unusual case and has
> + # not been met in the wild so far.
> + sample = ownheads
>
> with remote.commandexecutor() as e:
> fheads = e.callcommand('heads', {})
> diff --git a/tests/test-setdiscovery.t b/tests/test-setdiscovery.t
> --- a/tests/test-setdiscovery.t
> +++ b/tests/test-setdiscovery.t
> @@ -926,7 +926,7 @@ Both many new on top of long history:
> common heads: 7ead0cba2838
>
>
> -One with >200 heads, which used to use up all of the sample:
> +One with >200 heads. We now switch to send them all in the initial roundtrip, but still do sampling for the later request.
>
> $ hg init manyheads
> $ cd manyheads
> @@ -974,20 +974,17 @@ One with >200 heads, which used to use u
> searching for changes
> taking quick initial sample
> searching: 2 queries
> - query 2; still undecided: 1240, sample size is: 100
> + query 2; still undecided: 1080, sample size is: 100
> sampling from both directions
> searching: 3 queries
> - query 3; still undecided: 1140, sample size is: 200
> + query 3; still undecided: 980, sample size is: 200
> sampling from both directions
> searching: 4 queries
> query 4; still undecided: \d+, sample size is: 200 (re)
> sampling from both directions
> searching: 5 queries
> - query 5; still undecided: \d+, sample size is: 200 (re)
> - sampling from both directions
> - searching: 6 queries
> - query 6; still undecided: \d+, sample size is: \d+ (re)
> - 6 total queries in *.????s (glob)
> + query 5; still undecided: 195, sample size is: 195
> + 5 total queries in *.????s (glob)
> elapsed time: * seconds (glob)
> heads summary:
> total common heads: 1
> @@ -1116,6 +1113,6 @@ fixed in 86c35b7ae300:
> $ hg -R r1 --config extensions.blackbox= blackbox --config blackbox.track=
> * @5d0b986a083e0d91f116de4691e2aaa54d5bbec0 (*)> serve --cmdserver chgunix * (glob) (chg !)
> * @5d0b986a083e0d91f116de4691e2aaa54d5bbec0 (*)> -R r1 outgoing r2 *-T{rev} * --config *extensions.blackbox=* (glob)
> - * @5d0b986a083e0d91f116de4691e2aaa54d5bbec0 (*)> found 101 common and 1 unknown server heads, 2 roundtrips in *.????s (glob)
> + * @5d0b986a083e0d91f116de4691e2aaa54d5bbec0 (*)> found 101 common and 1 unknown server heads, 1 roundtrips in *.????s (glob)
> * @5d0b986a083e0d91f116de4691e2aaa54d5bbec0 (*)> -R r1 outgoing r2 *-T{rev} * --config *extensions.blackbox=* exited 0 after *.?? seconds (glob)
> $ cd ..
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
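To make the reasoning in the inline comment concrete, here is a toy model
of the first query. It is only a sketch and not the code from
setdiscovery.py: `limit_sample` and `first_round` are hypothetical helpers,
heads are plain integers, the cap of 100 is an assumed value, and the model
ignores everything that can be inferred from ancestors. It only counts how
many local heads are still unasked after the first roundtrip.

```python
import random


def limit_sample(heads, limit, rng):
    """Hypothetical stand-in for the old capped initial sample."""
    heads = sorted(heads)
    if len(heads) <= limit:
        return set(heads)
    return set(rng.sample(heads, limit))


def first_round(local_heads, remote_known, limit=None, seed=0):
    """Return (sent, common, missing, unasked) head sets after query 1."""
    rng = random.Random(seed)
    if limit is None:
        # New behaviour in this toy model: ask about every local head.
        sent = set(local_heads)
    else:
        # Old behaviour in this toy model: ask about a capped random subset.
        sent = limit_sample(local_heads, limit, rng)
    common = sent & remote_known
    missing = sent - remote_known
    unasked = set(local_heads) - sent
    return sent, common, missing, unasked


# Toy numbers loosely modelled on the "~3K heads" case; purely illustrative.
local_heads = set(range(3000))
remote_known = set(range(2900))  # pretend 100 local heads are unknown remotely

for label, limit in (("capped at 100", 100), ("all heads", None)):
    sent, common, missing, unasked = first_round(
        local_heads, remote_known, limit
    )
    print(
        "%s: sent=%d classified=%d unasked=%d"
        % (label, len(sent), len(common) + len(missing), len(unasked))
    )
```

With the cap, 2900 of the 3000 heads are left for later sampling rounds;
without it, every head is classified in the first roundtrip at the cost of
a larger request.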
--
Pierre-Yves David
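
For readers who want the timings in the quoted patch as ratios, the short
script below divides the "before" median by the "after" median for each of
the four scenarios. The numbers are copied from the benchmark output above;
the scenario labels and the `speedup` helper are only illustrative names.
The ratios range from roughly 1.3x (Mozilla try) to roughly 13.6x (private
repository, stripped subset).

```python
# Median wall-clock times in seconds, copied from the quoted benchmark output
# as (before, after) pairs. The labels are shortened descriptions.
MEDIANS = {
    "Mozilla try, 100 heads missing on each side": (1.572583, 1.194407),
    "pypy, 42 common heads": (0.038660, 0.019132),
    "private repo, stripped subset": (2.035315, 0.149870),
    "private repo, 500 different heads": (2.443896, 0.671614),
}


def speedup(before, after):
    """Return how many times faster the 'after' timing is."""
    return before / after


for name, (before, after) in MEDIANS.items():
    print("%-45s %.2fx faster" % (name, speedup(before, after)))
```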