Development of a performance tracking tool for Mercurial

Philippe Pepiot philippe.pepiot at logilab.fr
Wed Apr 13 03:38:06 EDT 2016


On 04/13/2016 01:18 AM, Matt Mackall wrote:
> On Tue, 2016-04-12 at 12:03 +0200, Philippe Pepiot wrote:
>> Hello,
>>
>> I published a new demo with parametrized benchmarks:
>> https://hg.logilab.org/review/hgperf/raw-file/685dfc2bbe87/html/index.html
> This regression looks interesting:
>
> https://hg.logilab.org/review/hgperf/raw-file/685dfc2bbe87/html/index.html#others.time_tags?branch=default&x=28265&idx=2
>
> Am I right in thinking this is the time it takes to run "hg tags" against the
> Mozilla repo?

Part of this time, yes. The benchmarked code is 
https://hg.logilab.org/review/hgperf/file/da745dae4dd1/benchmarks/others.py#l8 
(the same code as perftags in contrib/perf.py).
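
For reference, here is roughly what that benchmark looks like as an ASV
timing benchmark (a simplified sketch; REPO_PATH is a placeholder for the
reference repository prepared by the runner, the real code is in
benchmarks/others.py):

    from mercurial import changelog, hg, manifest, ui as uimod

    REPO_PATH = '/path/to/reference/repo'  # placeholder set by the runner

    class TimeTags(object):
        def setup(self):
            self.repo = hg.repository(uimod.ui(), REPO_PATH)

        def time_tags(self):
            repo = self.repo
            # throw away the cached changelog/manifest/tags so every run
            # recomputes the tags from scratch, as perftags does
            repo.changelog = changelog.changelog(repo.svfs)
            repo.manifest = manifest.manifest(repo.svfs)
            repo._tags = None
            len(repo.tags())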

>
>> The code used to run benchmarks was:
>> https://hg.logilab.org/review/hgperf/file/da745dae4dd1 (See README.rst)
>>
>> All benchmarks were run against three reference repositories (hg, pypy
>> and mozilla-central), and the revsets are parametrized with variants
>> (sort(), first(), last(), etc.).
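>>
>> For reference, the variants are expressed with asv's parametrized
>> benchmarks; a hypothetical sketch (REPO_PATH and the revset are
>> illustrative only):
>>
>>     from mercurial import hg, ui as uimod
>>
>>     REPO_PATH = '/path/to/reference/repo'  # placeholder
>>
>>     class TimeRevsetVariants(object):
>>         # asv runs the benchmark once per listed value and can plot
>>         # the variants together on the same graph
>>         params = ['{}', 'sort({})', 'first({})', 'last({})']
>>         param_names = ['variant']
>>
>>         def setup(self, variant):
>>             self.repo = hg.repository(uimod.ui(), REPO_PATH)
>>
>>         def time_revset(self, variant):
>>             for rev in self.repo.revs(variant.format('0::tip')):
>>                 pass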
>>
>> Our remarks after analyzing this demo:
>>
>> - As expected, having multiple reference repositories gives more
>> information about regressions and improvements (for example, the
>> improvement from d2ac8b57a for the revset last(roots((0::) - (0::tip)))
>> is only visible on the mozilla repo).
>> - Colors that change when (de)selecting parameters are very
>> disturbing; we need to fix that.
>> - Maybe we could select the log scale by default?
>> - Handle all parameters in the URL to make them shareable.
>>
>> After seeing this demo, do you think having the revset variants on the
>> same page is relevant?
>>
>> Now I have a question about writing and maintaining the benchmark code,
>> as we have multiple options here:
>>
>> 1) Use Mercurial's internal API (benefits: unlimited possibilities
>> without modifying Mercurial, we can write backward compatible benchmarks
>> with some 'if' statements and benchmark older versions, and we profit
>> from all ASV features (profiling, memory benchmarks, etc.). Drawbacks:
>> duplicates code with contrib/perf.py, will break on internal API
>> changes, and needs more maintenance and more code to write and keep
>> backward compatible benchmarks). A rough sketch of this approach follows
>> the list below.
>>
>> 2) Use the contrib/perf.py extension from the benchmarked version of
>> Mercurial (benefits: de facto backward compatible; drawbacks: limited to
>> what the tool can do in previous versions).
>>
>> 3) Use the contrib/perf.py extension from the latest version of
>> Mercurial (benefits: no duplicate code, easier maintenance, and new
>> benchmarks benefit both tools. Drawbacks: not backward compatible for
>> now (it only works for versions >= 3.7)). We could also implement some
>> glue code, either in the tracking tool or in contrib/perf.py, to list
>> the available benchmarks and their parameters.
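>>
>> To make 1) a bit more concrete, here is a hypothetical sketch of the
>> kind of version guards I have in mind (REPO_PATH, the cutoff version and
>> the revset are placeholders):
>>
>>     from mercurial import hg, ui as uimod, util
>>
>>     REPO_PATH = '/path/to/reference/repo'  # placeholder
>>
>>     def hgversion():
>>         # util.version() returns e.g. "3.7.3" or "3.8+20-abcdef"
>>         base = util.version().split('+')[0]
>>         return tuple(int(x) for x in base.split('.')[:2] if x.isdigit())
>>
>>     class TimeRevsets(object):
>>         def setup(self):
>>             if hgversion() < (3, 0):
>>                 # asv skips a benchmark when setup raises
>>                 # NotImplementedError, so versions predating the needed
>>                 # internal API are simply ignored
>>                 raise NotImplementedError
>>             self.repo = hg.repository(uimod.ui(), REPO_PATH)
>>
>>         def time_roots(self):
>>             # repo.revs() is internal API; an older release may need an
>>             # alternate code path behind another hgversion() check
>>             for rev in self.repo.revs('roots(all())'):
>>                 pass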
>>
>> At this stage of the project my advice is to use 1), but we could also
>> have a mix of 1) and 3). It depends on how fast the internal API changes
>> and on your short/mid/long term objectives for the level of integration
>> of the tracking tool.
>>
>> What do you think?
>>
>> On 04/04/2016 01:41 PM, Philippe Pepiot wrote:
>>> Hello,
>>>
>>> We (people at Logilab and Pierre-Yves) had a discussion last Friday
>>> about the performance tracking tool; here is the summary:
>>>
>>> - The choice has been made to use ASV http://asv.readthedocs.org/ as
>>> it appears to us to be the most complete tool. ASV will be enhanced to
>>> fit our needs: at least fixing hg branch handling, using revision
>>> instead of date for the X axis, adding a notification system, and
>>> providing a better home page.
>>> - We will set up a new buildbot job in the existing Mercurial
>>> infrastructure and provide an online version of the performance
>>> tracking tool that is continuously updated when changes are pushed to
>>> the Mercurial repository.
>>> - We discussed parametrized benchmarks that could be displayed on the
>>> same graph (multiple reference repositories and revset variants). ASV
>>> has this feature
>>> (http://asv.readthedocs.org/en/latest/writing_benchmarks.html#parameterized-benchmarks),
>>> and we will experiment with it.
>>> - We also discussed tracking improvements: a change can have a
>>> positive or negative impact on multiple benchmarks (especially revset
>>> benchmarks), so having a global view of this information could be a
>>> good feature.
>>> - We planned a possible sprint on the topic in May 2016, either in
>>> Paris or London.
>>> - The wiki page
>>> https://www.mercurial-scm.org/wiki/PerformanceTrackingSuitePlan needs
>>> to be updated and completed to reflect the current state of the topic.
>>>
>>>
>>> On 03/31/2016 10:24 AM, Philippe Pepiot wrote:
>>>> Hello,
>>>>
>>>> Besides my replies below, a new demo of ASV with more benchmark values
>>>> is available at
>>>> https://hg.logilab.org/review/hgperf/raw-file/454c2bd71fa4/index.html#/regressions
>>>> (this was tested against the pypy repository located at
>>>> https://bitbucket.org/pypy/pypy).
>>>>
>>>> The results database can be seen at
>>>> https://hg.logilab.org/review/hgperf/file/454c2bd71fa4/results
>>>>
>>>> On 03/30/2016 12:21 AM, Pierre-Yves David wrote:
>>>>>> - How do we manage the non-linear structure of a Mercurial history?
>>>>> That's a fun question. The Mercurial repository is mostly linear as
>>>>> long as only one branch is considered. However:
>>>>>
>>>>>   - We don't (and have no reason to) enforce it,
>>>>>   - the big picture, with multiple branches, is still non-linear.
>>>>>
>>>> The solution proposed in ASV is to have a graph per branch and to only
>>>> follow the first parent of merges (to avoid unwanted ups and downs
>>>> that disturb the regression detection); this is what I've done in the
>>>> demo. The revset used to build the default branch graph is: hg log
>>>> --follow-first -r 'sort(ancestors(default), -rev)'
>>>>
>>>> The drawback is that we cannot always precisely detect the particular
>>>> changeset which introduced a regression if it occurs on a merge
>>>> changeset (but we can give a range in that case).
>>>>
>>>>
>>>>>
>>>>>>
>>>>>> Airspeed velocity
>>>>>> ~~~~~~~~~~~~~~~~~
>>>>>>
>>>>>> - http://asv.readthedocs.org/
>>>>>> - used by the http://www.astropy.org/ project and inspired by
>>>>>> https://github.com/pydata/vbench
>>>>>> - Code: https://github.com/spacetelescope/asv
>>>>>> - Presentation (2014): https://www.youtube.com/watch?v=OsxJ5O6h8s0
>>>>>> - Python, Javascript (http://www.flotcharts.org/)
>>>>>>
>>>>>>
>>>>>> This tool aims at benchmarking Python packages over their lifetime.
>>>>>> It is mainly a command line tool, ``asv``, that runs a series of
>>>>>> benchmarks (described in a JSON configuration file) and produces a
>>>>>> static HTML/JS report.
>>>>>>
>>>>>> When running a benchmark suite, ``asv`` takes care of cloning or
>>>>>> pulling the source repository into a virtualenv and running the
>>>>>> configured tasks in that virtualenv.
>>>>>>
>>>>>> Results of each benchmark execution are stored in a "database"
>>>>>> (consisting of JSON files). This database is used to produce
>>>>>> evolution plots of the time required to run a test (or of any other
>>>>>> metric; out of the box, asv supports 4 types of benchmarks: timing,
>>>>>> memory, peak memory and tracking), and to run the regression
>>>>>> detection algorithms.
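>>>>>>
>>>>>> (For reference, the benchmark type is selected from the function
>>>>>> name prefix; roughly, see the asv docs for details:)
>>>>>>
>>>>>>     class Suite(object):
>>>>>>         def time_example(self):     # timing benchmark
>>>>>>             sum(range(1000))
>>>>>>
>>>>>>         def mem_example(self):      # size of the returned object
>>>>>>             return [0] * 1000
>>>>>>
>>>>>>         def peakmem_example(self):  # peak process memory during the call
>>>>>>             [0] * 1000000
>>>>>>
>>>>>>         def track_example(self):    # arbitrary value tracked over time
>>>>>>             return 42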
>>>>>>
>>>>>> One key feature of this tool is that it is very easy for every
>>>>>> developer to use it in their own development environment. For
>>>>>> example, it provides an ``asv compare`` command that compares the
>>>>>> results of any two revisions.
>>>>>>
>>>>>> However, asv will require some work to fit our needs:
>>>>>>
>>>>>> - The main drawback of asv is that it is designed with commit date
>>>>>> as the X axis. We must adapt the code of asv to properly handle the
>>>>>> "non-linearity" related to dates (see
>>>>>> https://github.com/spacetelescope/asv/issues/390).
>>>>>> - Tags are displayed in the graphs as secondary X axis labels and
>>>>>> are tied to the commit date of the tag; they should be displayed as
>>>>>> annotations on the dots instead.
>>>>>>
>>>>>>
>>>>>> :Pros:
>>>>>>
>>>>>> - Complete and cover most of our needs (and more)
>>>>>> - Handle mercurial repositories
>>>>>> - Generate static website with dashboard, interactive graphs
>>>>>> - Detect regressions, implement step detection algorithms:
>>>>>> http://asv.readthedocs.org/en/latest/dev.html#module-asv.step_detect
>>>>>> - Parametrized benchmarks
>>>>>> - Can collect metrics from multiple machines
>>>>>> - Show tags on the graph, link to commits
>>>>>> - Framework to write time, memory or custom benchmarks
>>>>>> - Facilities to run benchmarks (run against a revset, compute only
>>>>>> missing values etc)
>>>>>> - Can be used easily on the developer side as well (before
>>>>>> submitting patches)
>>>>>> - Seems easily extensible through a plugin system
>>>>>>
>>>>>> :Cons:
>>>>>>
>>>>>> - No email notifications
>>>>>> - Need to plot the graph by revision number instead of commit date
>>>>>> - The per-branch graph needs to be fixed for Mercurial
>>>>> This one seems pretty solid and I like the idea of being able to run
>>>>> it locally.
>>>>>
>>>>> The dashboard seems a bit too simple to me, and I'm a bit worried
>>>>> here; the branch part is another unknown.
>>>>>
>>>>> How hard would it be to implement a notification system on top of that?
>>>> I agree the home page with summary graphs seems useless; the
>>>> regressions page could be a better entry point.
>>>>
>>>> To implement a notification system, we could track modifications of
>>>> the file "regression.json", which is generated when the static site is
>>>> built (asv publish). At this point my idea is to keep the history of
>>>> the static site in a dedicated repository, generate an RSS/Atom feed
>>>> by looking at the history of the regression.json file, and then plug
>>>> in any external tool that produces notifications from the feed (IRC,
>>>> mail, etc.). Another idea could be to have a Mercurial hook that does
>>>> the same thing.
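>>>>
>>>> A very rough sketch of such a helper (hypothetical; it only assumes
>>>> that the site history lives in a Mercurial repository and that
>>>> regression.json decodes to a list of entries, the actual layout may
>>>> differ):
>>>>
>>>>     import json
>>>>     import subprocess
>>>>
>>>>     def regressions_at(rev, path='html/regression.json'):
>>>>         # read the file as it was at `rev` in the repository keeping
>>>>         # the history of the published site (path is hypothetical)
>>>>         data = subprocess.check_output(['hg', 'cat', '-r', rev, path])
>>>>         return json.loads(data)
>>>>
>>>>     def new_regressions(old_rev, new_rev):
>>>>         old = set(json.dumps(e, sort_keys=True)
>>>>                   for e in regressions_at(old_rev))
>>>>         return [e for e in regressions_at(new_rev)
>>>>                 if json.dumps(e, sort_keys=True) not in old]
>>>>
>>>>     # an irc bot, mail script or rss generator can then consume this
>>>>     for entry in new_regressions('.^', '.'):
>>>>         print(entry)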
>>>>
>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> EzBench
>>>>>> ~~~~~~~
>>>>>>
>>>>>> - Code: https://cgit.freedesktop.org/ezbench
>>>>>> - Used to benchmark graphics-related patches on the Linux kernel.
>>>>>> - Slides:
>>>>>> https://fosdem.org/2016/schedule/event/ezbench/attachments/slides/1168/export/events/attachments/ezbench/slides/1168/fosdem16_martin_peres_ezbench.pdf
>>>>>> - Shell scripts
>>>>>>
>>>>>> EzBench (https://cgit.freedesktop.org/ezbench) is a collection of
>>>>>> tools to benchmark graphics-related patchsets on the Linux kernel.
>>>>>> It runs the benchmark suite on a particular commit and stores the
>>>>>> results as CSV files. It has tools to read the results and generate
>>>>>> static HTML reports. It can also automate the bisect process to find
>>>>>> the commit that introduced a regression. It is written in shell and
>>>>>> Python and is tightly coupled to its purpose.
>>>>>>
>>>>>> :Pros:
>>>>>>
>>>>>> - Generate reports
>>>>>> - Bisects performance changes automatically and confirms a detected
>>>>>> regression by reproducing it
>>>>>> - Tips for reducing variance; captures all benchmark data (hardware,
>>>>>> libraries, versions)
>>>>>>
>>>>>> :Cons:
>>>>>>
>>>>>> - Not usable as is
>>>>>> - Doesn't handle mercurial repositories
>>>>> It is unclear to me what makes it not usable as is (besides the lack
>>>>> of Mercurial support?)
>>>> Well, it seems we have to write a bunch of shell code to create a
>>>> "profile" and "tests"; there is a kind of common library to help write
>>>> these files, but it is only about graphics stuff. All I was able to
>>>> get was a temporary black screen :)
>>>>
>>>>

-- 
Philippe Pepiot

