Development of a performance tracking tool for Mercurial

Matt Mackall mpm at selenic.com
Tue Apr 12 19:18:52 EDT 2016


On Tue, 2016-04-12 at 12:03 +0200, Philippe Pepiot wrote:
> Hello,
> 
> I published a new demo with parametrized benchmarks: 
> https://hg.logilab.org/review/hgperf/raw-file/685dfc2bbe87/html/index.html

This regression looks interesting:

https://hg.logilab.org/review/hgperf/raw-file/685dfc2bbe87/html/index.html#others.time_tags?branch=default&x=28265&idx=2

Am I right in thinking this is the time it takes to run "hg tags" against the
Mozilla repo?

> The code used to run benchmarks was: 
> https://hg.logilab.org/review/hgperf/file/da745dae4dd1 (See README.rst)
> 
> All benchmarks were run against three reference repositories (hg, pypy 
> and mozilla-central), and revsets are parametrized with variants 
> (sort(), first(), last(), etc.).
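> 
> To give an idea of how such a parametrized benchmark looks, here is a 
> rough sketch (class name, repository paths and the chosen revset are 
> illustrative only, not the actual code linked above; ASV runs the 
> time_* method once per combination of parameters):
> 
>     # hypothetical benchmarks/revsets.py
>     from mercurial import hg, ui as uimod
> 
>     class TimeRevsetVariants(object):
>         params = [
>             ['hg', 'pypy', 'mozilla-central'],           # reference repo
>             ['%s', 'sort(%s)', 'first(%s)', 'last(%s)'], # variant
>         ]
>         param_names = ['repo', 'variant']
> 
>         def setup(self, repo_name, variant):
>             # assumed location of the local reference repositories
>             self.repo = hg.repository(uimod.ui(),
>                                       '/srv/repos/%s' % repo_name)
> 
>         def time_revset(self, repo_name, variant):
>             # consume the revset to force its evaluation
>             list(self.repo.revs(variant % 'all()'))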
> 
> Our remarks after analyzing this demo:
> 
> - As expected, having multiple reference repositories gives more 
> information about regressions and improvements (for example the 
> improvement from d2ac8b57a for the revset last(roots((0::) - (0::tip))) 
> is only visible on the mozilla repo).
> - Colors change when (de)selecting parameters, which is very 
> disturbing; we need to fix that.
> - Maybe we could select log scale by default?
> - Handle all parameters in the URL to make links shareable.
> 
> After seeing this demo, do you find having revset variants on the same 
> page relevant?
> 
> Now I have a question about writing and maintaining benchmark code, as 
> we have multiple choices here:
> 
> 1) Use the Mercurial internal API (benefits: unlimited possibilities 
> without modifying Mercurial, we can write backward compatible benchmarks 
> with some 'if' statements and benchmark older versions, and we profit 
> from all ASV features (profiling, memory benchmarks, etc.). Drawbacks: 
> duplicates code with contrib/perf.py, will break on internal API 
> changes, needs more maintenance and more code to write and keep 
> benchmarks backward compatible). A rough sketch of the 'if' guard idea 
> follows the list below.
> 
> 2) Use the contrib/perf.py extension from the benchmarked version of 
> Mercurial (benefits: de facto backward compatible; drawbacks: limited to 
> what the tool can do in previous versions).
> 
> 3) Use the contrib/perf.py extension from the latest version of 
> Mercurial (benefits: no duplicate code, easier maintenance, new 
> benchmarks profit both tools. Drawbacks: not backward compatible for 
> now (it only works for versions >= 3.7)). We could also implement some 
> glue code, either in the tracking tool or in contrib/perf.py, to list 
> the available benchmarks and their parameters.
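> 
> To make the backward compatibility point of 1) concrete, here is a 
> minimal sketch (the repository path, the class and the guard are 
> hypothetical; the idea is just to branch with an 'if' wherever the 
> internal API differs between the Mercurial versions being benchmarked):
> 
>     # hypothetical benchmarks/tags.py, written against the internal API
>     from mercurial import hg, ui as uimod
> 
>     class TimeTags(object):
>         def setup(self):
>             # assumed local clone of one of the reference repositories
>             self.path = '/srv/repos/mozilla-central'
> 
>         def time_tags(self):
>             # reopen the repository so cached tags are not reused
>             repo = hg.repository(uimod.ui(), self.path)
>             # hypothetical guard for internal API changes across versions
>             if hasattr(repo, 'tags'):
>                 repo.tags()
>             else:
>                 raise NotImplementedError('internal API too old')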
> 
> At this stage of the project my advice is to use 1), but we could also 
> have a mix of 1) and 3). It depends on how fast the internal API 
> changes and on your short/mid/long term objectives for the level of 
> integration of the tracking tool.
> 
> What do you think ?
> 
> On 04/04/2016 01:41 PM, Philippe Pepiot wrote:
> > 
> > Hello,
> > 
> > We (people at Logilab and Pierre-Yves) had a discussion last Friday 
> > about the performance tracking tool; here is the summary:
> > 
> > - The choice has been made to use ASV http://asv.readthedocs.org/ as 
> > it appears to us to be the most complete tool. ASV will be enhanced to 
> > fit our needs: at least fixing hg branch handling, using revision 
> > instead of date for the X axis, adding a notification system and 
> > having a better home page.
> > - We will set up a new buildbot job in the existing Mercurial 
> > infrastructure and provide an online version of the performance 
> > tracking tool that is continuously updated when changes are pushed to 
> > the Mercurial repository.
> > - We discussed parametrized benchmarks that could be displayed on 
> > the same graph (multiple reference repositories and revset variants). 
> > ASV has this feature 
> > (http://asv.readthedocs.org/en/latest/writing_benchmarks.html#parameterized-benchmarks), 
> > and we will experiment with it.
> > - We also discussed tracking improvements: a change can have a 
> > positive or negative impact on multiple benchmarks (especially on 
> > revset benchmarks), and having a global view of this information 
> > could be a good feature.
> > - We planned a possible sprint on the topic in May 2016, either in 
> > Paris or London.
> > - The wiki page 
> > https://www.mercurial-scm.org/wiki/PerformanceTrackingSuitePlan needs 
> > to be updated and completed to reflect the current state of the topic.
> > 
> > 
> > On 03/31/2016 10:24 AM, Philippe Pepiot wrote:
> > > 
> > > Hello,
> > > 
> > > Besides my replies below, a new demo of ASV with more benchmark 
> > > values is available at 
> > > https://hg.logilab.org/review/hgperf/raw-file/454c2bd71fa4/index.html#/regressions 
> > > (this was tested against the pypy repository located at 
> > > https://bitbucket.org/pypy/pypy)
> > > 
> > > The results database can be seen at 
> > > https://hg.logilab.org/review/hgperf/file/454c2bd71fa4/results
> > > 
> > > On 03/30/2016 12:21 AM, Pierre-Yves David wrote:
> > > > 
> > > > > 
> > > > > - How do we manage the non-linear structure of a Mercurial history?
> > > > That's a fun question. The Mercurial repository is mostly linear as 
> > > > long as only one branch is concerned. However:
> > > > 
> > > >  - We don't (and have no reason to) enforce it,
> > > >  - the big picture with multiple branches is still non-linear.
> > > > 
> > > The solution proposed in ASV is to have a graph per branch and to 
> > > only follow the first parent of merges (to avoid unwanted ups and 
> > > downs that disturb regression detection); this is what I've done in 
> > > the demo. The command used to build the default branch graph is 
> > > hg log --follow-first -r 'sort(ancestors(default), -rev)'
> > > 
> > > The drawback is that we cannot always precisely detect the 
> > > particular changeset which introduced a regression if it occurred 
> > > on a merge changeset (but we can give a range here).
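> > > 
> > > For reference, a minimal sketch of how that per-branch history 
> > > could be collected from a script (the function name and the 
> > > template are illustrative; it just wraps the command above):
> > > 
> > >     import subprocess
> > > 
> > >     def first_parent_history(repo_path, branch='default'):
> > >         """Changesets of the first-parent history of a branch,
> > >         newest first (same command as used for the demo)."""
> > >         cmd = ['hg', 'log', '-R', repo_path, '--follow-first',
> > >                '-r', 'sort(ancestors(%s), -rev)' % branch,
> > >                '--template', '{node}\n']
> > >         return subprocess.check_output(cmd).decode().splitlines()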
> > > 
> > > 
> > > > 
> > > > 
> > > > > 
> > > > > 
> > > > > Airspeed velocity
> > > > > ~~~~~~~~~~~~~~~~~
> > > > > 
> > > > > - http://asv.readthedocs.org/
> > > > > - used by the http://www.astropy.org/ project and inspired by 
> > > > > https://github.com/pydata/vbench
> > > > > - Code: https://github.com/spacetelescope/asv
> > > > > - Presentation (2014): https://www.youtube.com/watch?v=OsxJ5O6h8s0
> > > > > - Python, Javascript (http://www.flotcharts.org/)
> > > > > 
> > > > > 
> > > > > This tool aims at benchmarking Python packages over their lifetime.
> > > > > It is mainly a command line tool, ``asv``, that runs a series of 
> > > > > benchmarks (described in a JSON configuration file) and produces 
> > > > > a static HTML/JS report.
> > > > > 
> > > > > When running a benchmark suite, ``asv`` takes care of 
> > > > > cloning/pulling the source repository into a virtualenv and 
> > > > > running the configured tasks in this virtualenv.
> > > > > 
> > > > > Results of each benchmark execution are stored in a "database" 
> > > > > (consisting of JSON files). This database is used to produce 
> > > > > evolution plots of the time required to run a test (or any other 
> > > > > metric; out of the box, asv supports 4 types of benchmarks: 
> > > > > timing, memory, peak memory and tracking), and to run the 
> > > > > regression detection algorithms.
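> > > > > 
> > > > > For instance, the four built-in benchmark types are selected by 
> > > > > the prefix of the benchmark name (the bodies below are only 
> > > > > placeholders):
> > > > > 
> > > > >     def time_example():
> > > > >         # timing benchmark: asv reports elapsed wall-clock time
> > > > >         [None] * 1000
> > > > > 
> > > > >     def mem_example():
> > > > >         # memory benchmark: asv reports the size of the
> > > > >         # returned object
> > > > >         return [None] * 1000
> > > > > 
> > > > >     def peakmem_example():
> > > > >         # peak memory benchmark: asv reports the peak resident
> > > > >         # memory reached while the function runs
> > > > >         [None] * 100000
> > > > > 
> > > > >     def track_example():
> > > > >         # tracking benchmark: asv plots the returned number
> > > > >         return 42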
> > > > > 
> > > > > One key feature of this tool is that it's very easy for every 
> > > > > developer to use it in their own development environment. For 
> > > > > example, it provides an ``asv compare`` command that allows 
> > > > > comparing the results of any 2 revisions.
> > > > > 
> > > > > However, asv will require some work to fit our needs:
> > > > > 
> > > > > - The main drawback with asv is that it's designed with the 
> > > > > commit date as X axis. We must adapt the code of asv to properly 
> > > > > handle the "non-linearity" related to dates 
> > > > > (see https://github.com/spacetelescope/asv/issues/390)
> > > > > - Tags are displayed in the graphs as secondary x axis labels and 
> > > > > are tied to the commit date of the tag; they should be displayed 
> > > > > as annotations of the dots instead.
> > > > > 
> > > > > 
> > > > > :Pros:
> > > > > 
> > > > > - Complete and covers most of our needs (and more)
> > > > > - Handles Mercurial repositories
> > > > > - Generates a static website with dashboard and interactive graphs
> > > > > - Detects regressions, implements step detection algorithms: 
> > > > > http://asv.readthedocs.org/en/latest/dev.html#module-asv.step_detect
> > > > > - Parametrized benchmarks
> > > > > - Can collect metrics from multiple machines
> > > > > - Shows tags on the graph, links to commits
> > > > > - Framework to write time, memory or custom benchmarks
> > > > > - Facilities to run benchmarks (run against a revset, compute only 
> > > > > missing values, etc.)
> > > > > - Can be used easily on the developer side as well (before 
> > > > > submitting patches)
> > > > > - Seems easily extensible through a plugin system
> > > > > 
> > > > > :Cons:
> > > > > 
> > > > > - No email notifications
> > > > > - Needs to plot the graph by revision number instead of commit date
> > > > > - The per-branch graph needs to be fixed for Mercurial
> > > > This one seems pretty solid and I like the idea of being able to run 
> > > > it locally.
> > > > 
> > > > The dashboard seems a bit too simple to me, and I'm a bit worried 
> > > > here. The branch part is another unknown.
> > > > 
> > > > How hard would it be to implement a notification system on top of that?
> > > I agree the home page with summary graphs seems useless; the 
> > > regressions page could be a better entry point.
> > > 
> > > To implement a notification system we could track modifications of 
> > > the "regression.json" file, which is generated when the static site 
> > > is built (asv publish). At this point my idea is to keep the history 
> > > of the static site in a dedicated repository and to generate an 
> > > rss/atom feed by looking at the history of the regression.json file; 
> > > then we can plug in any external tool that produces notifications 
> > > from the feed (irc, mail, etc.). Another idea could be to have a 
> > > Mercurial hook that does the same thing.
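> > > 
> > > As a very rough sketch of the feed idea (the site repository path, 
> > > the file layout and the feed fields are all assumptions), something 
> > > along these lines could turn the history of regression.json into an 
> > > atom feed:
> > > 
> > >     import subprocess
> > >     import xml.etree.ElementTree as ET
> > > 
> > >     # hypothetical clone holding the history of the published site
> > >     SITE_REPO = '/srv/hgperf-site'
> > > 
> > >     def regression_changes():
> > >         """Changesets of the site repository touching regression.json
> > >         (assumed to live at the repository root)."""
> > >         out = subprocess.check_output(
> > >             ['hg', 'log', '-R', SITE_REPO, 'path:regression.json',
> > >              '--template',
> > >              '{node|short}|{date|rfc3339date}|{desc|firstline}\n'])
> > >         for line in out.decode().splitlines():
> > >             yield line.split('|', 2)
> > > 
> > >     def build_feed():
> > >         # build a minimal atom document; one entry per changeset
> > >         feed = ET.Element('feed', xmlns='http://www.w3.org/2005/Atom')
> > >         ET.SubElement(feed, 'title').text = 'hgperf regressions'
> > >         for node, date, desc in regression_changes():
> > >             entry = ET.SubElement(feed, 'entry')
> > >             ET.SubElement(entry, 'title').text = desc
> > >             ET.SubElement(entry, 'id').text = node
> > >             ET.SubElement(entry, 'updated').text = date
> > >         return ET.tostring(feed)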
> > > 
> > > 
> > > > 
> > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > EzBench
> > > > > ~~~~~~~
> > > > > 
> > > > > - Code: https://cgit.freedesktop.org/ezbench
> > > > > - Used to benchmark graphics-related patches on the Linux kernel.
> > > > > - Slides: 
> > > > > https://fosdem.org/2016/schedule/event/ezbench/attachments/slides/1168/export/events/attachments/ezbench/slides/1168/fosdem16_martin_peres_ezbench.pdf
> > > > > - Shell scripts
> > > > > 
> > > > > EzBench (https://cgit.freedesktop.org/ezbench) is a collection of 
> > > > > tools to benchmark graphics-related patchsets on the Linux kernel. 
> > > > > It runs the benchmark suite on a particular commit and stores the 
> > > > > results as CSV files. It has tools to read the results and generate 
> > > > > static HTML reports. It can also automate the bisect process to find 
> > > > > the commit that introduced a regression. It's written in shell and 
> > > > > Python and is highly coupled to its purpose.
> > > > > 
> > > > > :Pros:
> > > > > 
> > > > > - Generates reports
> > > > > - Bisects performance changes automatically and confirms a detected 
> > > > > regression by reproducing it
> > > > > - Tips for reducing variance; captures all benchmark data (hardware, 
> > > > > libraries, versions)
> > > > > 
> > > > > :Cons:
> > > > > 
> > > > > - Not usable as is
> > > > > - Doesn't handle Mercurial repositories
> > > > It is unclear to me what makes it not usable as is (besides the lack 
> > > > of Mercurial support?)
> > > Well, it seems we have to write a bunch of shell code to create a 
> > > "profile" and "tests"; there is a kind of common library to write 
> > > these files but it's only about graphics stuff. All I was able to 
> > > get was a temporary black screen :)
> > > 
> > > 
-- 
Mathematics is the supreme nostalgia of our time.


