Our automated tooling has detected a regression on the netbeans repository that impacts:

- bundle: http://perf.octobus.net/#basic_commands.BundleTimeTestSuite.time_bundle?branch=default&commits=4971c972-98995b68&commits=bce1c1af-41fcdfe3&p-repo='netbeans-2018-08-01'&p-changesets=1000
- incoming over http or ssh: http://perf.octobus.net/#basic_commands.ExchangeTimeSuite.time_incoming?branch=default&commits=bce1c1af-41fcdfe3&p-repo='netbeans-2018-08-01'&p-repo_type='ssh'&p-strip='last-thousand'&p-revset=None
- push: http://perf.octobus.net/#basic_commands.PushPullTimeSuite.time_push?branch=default&commits=bce1c1af-41fcdfe3&p-repo='netbeans-2018-08-01'&p-repo_type='local'&p-strip='last-thousand'&p-revset=None

Bisection has identified https://www.mercurial-scm.org/repo/hg/rev/db5501d9 as the main cause of the regression. Other repositories don't seem to be impacted. The regression has been reproduced locally.
The commit message of https://www.mercurial-scm.org/repo/hg/rev/db5501d9 is very detailed about the expected performance implications, including regressions in some cases.

What concerns me most is the degree of the perf regression. For example, on http://perf.octobus.net/#basic_commands.PushPullTimeSuite.time_push?branch=default&commits=bce1c1af-41fcdfe3&p-repo='netbeans-2018-08-01'&p-repo_type='local'&p-strip='last-thousand'&p-revset=None we go from ~1.65s to ~12.0s, over a 6x increase! I would *really* like to know where those extra ~10s of CPU are being spent. I'm assuming it is on the producer end, as my measurements showed that the DAG ordered changegroups were *faster* to apply, not slower. But since that test measures push performance as a whole, it would be really nice to confirm that.

I know the Netbeans repo has a lot of merges, and it is possible its non-linear "shape" is causing problems for various algorithms, possibly even the DAG sorting itself. I'd love to see `hg --profile` output from before and after showing which functions the regression is in...
If this is pinned on DAG sorting being the slowdown, we should definitely get a `hg perfdagsort` or something to isolate that operation.
The repository contains everything needed to reproduce those issues: https://bitbucket.org/octobus/bighgperf/src/default/

You can get the exact same repository that we use for the tests via the makefile:
https://bitbucket.org/octobus/bighgperf/src/b8cb829cb993f2081983127d01e705b1733847f2/repos.make#lines-27

You can also download the repository using this command line:

    curl https://static.octobus.net/asv/netbeans-2018-08-01-reference.tar | tar x; hg -R netbeans-2018-08-01-reference update tip

The easiest way to reproduce is to run a bundle operation:

    HGRCPATH= hg bundle --config profiling.time_track=real --base ":-1000" /tmp/bundle.bundle --profile

I took before and after profiles of the bundle command; I will attach them to the issue.

I've seen the performance numbers in the commit message, so I ran a full bundle overnight (with `hg bundle -t none-v2 -a`). It took 2.31h to finish. I unbundled it into a new repository and the performance is back to baseline. Here are the numbers for reference:

- 7.18s before
- 37.77s after
- 7.53s after conversion

We're still checking the impact on some of our customers' clients. What would be the possible options if we don't want to ship this performance regression in 4.8?
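For anyone who wants to repeat the measurement against several hg builds, the bundle command above can be wrapped in a small timing helper. This is only a sketch: `bundle_command` and `time_command` are hypothetical names, the `-R` argument is added so the script can run from any directory, and the default paths simply mirror the ones quoted in this comment.

```python
import os
import subprocess
import time


def bundle_command(repo, base=":-1000", out="/tmp/bundle.bundle", hg="hg"):
    """Build the `hg bundle` invocation quoted in the comment above.

    `hg` is the executable to test, so a "before" and an "after" build
    can be compared by passing different paths (assumption: both builds
    are installed side by side).
    """
    return [
        hg, "-R", repo, "bundle",
        "--config", "profiling.time_track=real",
        "--base", base,
        out,
        "--profile",
    ]


def time_command(cmd):
    """Run cmd with an empty HGRCPATH and return elapsed wall-clock seconds."""
    env = dict(os.environ, HGRCPATH="")  # same effect as `HGRCPATH= hg ...`
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True)
    return time.perf_counter() - start


# Example use, assuming the reference repository has been downloaded:
#   elapsed = time_command(bundle_command("netbeans-2018-08-01-reference"))
#   print("%.2fs" % elapsed)
```

Running it once per hg build and comparing the returned times gives the same kind of before/after pair as the numbers listed above.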
Created attachment 2022: Profile before the regression
Created attachment 2023: Profile after the regression
Created attachment 2024: Profile after the regression and after the conversion
The regression impacts both push and pull on the netbeans repository:

- Pushing 100 revisions is ~20% slower (+~200ms).
- Pulling 100 revisions is ~20% slower (+~500ms).
- Pushing 1000 revisions is ~600% slower (+~10s).
- Pulling 1000 revisions is ~640% slower (+~13s).

We have identified two issues:

- The base rev selection algorithm performs worse (~40% of the regression).
- We no longer linearize the emitted revisions by default (~40% of the regression).

Due to the high impact of the regression, I would like to propose tagging this issue as a release blocker.
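As a sanity check on how "N% slower" relates to the raw timings quoted earlier in this thread (~1.65s to ~12.0s for push, and 7.18s / 37.77s / 7.53s for the bundle test), here is a minimal arithmetic sketch. The `slowdown` helper is a hypothetical name introduced for illustration; the input timings are the only values taken from the thread.

```python
def slowdown(before, after):
    """Return (ratio, percent_slower, delta_seconds) for two timings."""
    ratio = after / before
    percent_slower = (ratio - 1.0) * 100.0
    return ratio, percent_slower, after - before


# Timings quoted in this thread, in seconds.
cases = {
    "push 1000 revisions": (1.65, 12.0),
    "bundle, after regression": (7.18, 37.77),
    "bundle, after conversion": (7.18, 7.53),
}

for label, (before, after) in cases.items():
    ratio, pct, delta = slowdown(before, after)
    print(f"{label}: {ratio:.2f}x, {pct:.0f}% slower, +{delta:.2f}s")
```

Note that "600% slower" and "about 7x the original time" describe the same measurement: the percentage counts only the extra time on top of the baseline.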
(marking as confirmed because I have no reason to distrust this detailed analysis - I have not done my own investigation)
Fixed by https://mercurial-scm.org/repo/hg/rev/634b45317459
Boris Feld <boris.feld@octobus.net>
changegroup: restore default node ordering (issue6001)

Changeset db5501d9 changed the default node ordering from "storage" to "linearize". While the new API is more explicit and cleaner, the "linearize" order is problematic on certain repositories like netbeans where it makes bundling slower the more nodes we bundle. Pushing and pulling 100 changesets was ~20% slower and pushing and pulling 1000 changesets was ~600% slower.

A very quick analysis of profile traces showed that the pull operation was taking more time creating the delta.

Putting back the old default order seems to be the safe option. With more time during the next cycle, we can understand better the impact of sorting with the DAG order by default, the source of the regression and how to mitigate it.

/!\ We are still waiting for the full performance impact but with this patch, bundling and pulling locally (not on the performance workstation) 1000 changesets on the netbeans repository is as fast as before the regression.

Differential Revision: https://phab.mercurial-scm.org/D5196

(please test the fix)
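To make the difference between the two orderings named in the commit message concrete, here is a toy illustration on a six-revision DAG with two interleaved branches and a merge. It is a sketch under loud assumptions: `storage_order` and `dag_order` are hypothetical helpers, `dag_order` is just one simple parents-before-children walk and not Mercurial's linearization code, and the example does not reproduce or explain the slowdown itself.

```python
def storage_order(parents):
    """Storage order: revisions in increasing revision number."""
    return sorted(parents)


def dag_order(parents):
    """One possible DAG ordering that keeps each branch contiguous.

    Depth-first walk from the roots; a revision is emitted only once
    all of its parents have been emitted.
    """
    children = {rev: [] for rev in parents}
    for rev, ps in parents.items():
        for p in ps:
            children[p].append(rev)

    emitted, order = set(), []
    # Start from the roots, lowest revision number first.
    stack = sorted((r for r, ps in parents.items() if not ps), reverse=True)
    while stack:
        rev = stack.pop()
        if rev in emitted or any(p not in emitted for p in parents[rev]):
            continue  # already done, or a parent is still pending
        emitted.add(rev)
        order.append(rev)
        stack.extend(sorted(children[rev], reverse=True))
    return order


# Toy history: branch A is 0-1-3, branch B is 0-2-4, and 5 merges 3 and 4.
toy = {0: [], 1: [0], 2: [0], 3: [1], 4: [2], 5: [3, 4]}
print(storage_order(toy))  # interleaves the two branches
print(dag_order(toy))      # emits branch A fully, then branch B, then the merge
```

On this toy history the storage order alternates between the two branches, while the DAG walk emits each branch as a contiguous run, which is the kind of reordering the "linearize" default introduced.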
Bug was set to TESTING for 7 days, resolving