[PATCH 0 of 1] diffstat implementation in python
Alexander Solovyov
piranha at piranha.org.ua
Sun Dec 21 03:54:29 CST 2008
This is a Python implementation of diffstat. It runs much faster on small
patches (no overhead of calling an external program) and removes the
dependency on the 'diffstat' program. Additionally, the collected
information can be retrieved through attributes of a diffstat object, so
anyone who needs the numbers does not have to parse diffstat output.
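To illustrate what attribute access could look like, here is a minimal
sketch. It is not the code from the patch: the class name DiffStatSketch and
the attributes files, added and removed are made up for this example, and the
output format only approximates what diffstat prints (no histogram scaling).

```python
class DiffStatSketch:
    """Hypothetical stand-in for a diffstat object; not the patch's class."""

    def __init__(self, lines):
        self.files = []    # file names, in order of first appearance
        self.added = {}    # file name -> number of added lines
        self.removed = {}  # file name -> number of removed lines
        current = None
        for line in lines:
            if line.startswith('diff '):
                # Assumption: the file name is the last token of the
                # 'diff ...' header line (file names without spaces).
                current = line.split()[-1]
                if line.startswith('diff --git') and current.startswith('b/'):
                    current = current[2:]
                if current not in self.added:
                    self.files.append(current)
                    self.added[current] = 0
                    self.removed[current] = 0
            elif current is None:
                continue  # text before the first file header
            elif line.startswith('+') and not line.startswith('+++'):
                self.added[current] += 1
            elif line.startswith('-') and not line.startswith('---'):
                # Known limitation: a removed line whose content starts
                # with '--' looks like a header and is not counted.
                self.removed[current] += 1

    def __str__(self):
        if not self.files:
            return ''
        width = max(len(name) for name in self.files)
        out = []
        for name in self.files:
            a, r = self.added[name], self.removed[name]
            out.append(' %-*s | %d %s%s' % (width, name, a + r,
                                            '+' * a, '-' * r))
        out.append(' %d files changed, %d insertions(+), %d deletions(-)'
                   % (len(self.files), sum(self.added.values()),
                      sum(self.removed.values())))
        return '\n'.join(out)


sample = [
    'diff --git a/foo.py b/foo.py\n',
    '--- a/foo.py\n',
    '+++ b/foo.py\n',
    '@@ -1,2 +1,3 @@\n',
    ' context\n',
    '-old\n',
    '+new\n',
    '+more\n',
]
stat = DiffStatSketch(sample)
print(stat.added['foo.py'], stat.removed['foo.py'])
print(stat)
```

With an object like this, a caller reads the counts directly from the
dictionaries and only calls str() when the textual summary is wanted.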
I have compared its performance with that of the old patch.diffstat:
===================================
>>> w = file('smalldiff').readlines()
>>> len(w)
77
>>> %timeit patch.diffstat(w)
10 loops, best of 3: 75.6 ms per loop
>>> %timeit str(diffstat(w))
10000 loops, best of 3: 97.4 µs per loop
>>> %timeit diffstat(w)
10000 loops, best of 3: 86.4 µs per loop
>>> q = file('diff').readlines()
>>> len(q)
119468
>>> %timeit patch.diffstat(q)
10 loops, best of 3: 128 ms per loop
>>> %timeit str(diffstat(q))
10 loops, best of 3: 119 ms per loop
>>> a = readfiles('broken-out')
>>> len(a)
307687
>>> from mercurial import patch
>>> %timeit patch.diffstat(a)
10 loops, best of 3: 215 ms per loop
>>> from diffstat import diffstat
>>> %timeit str(diffstat(a))
10 loops, best of 3: 321 ms per loop
===================================
The system diffstat is faster than the Python version on large inputs
(215 ms against 321 ms for 307687 lines), so it may be worth checking the
length of the input and switching to the 'diffstat' command when the input
is large enough. Maybe this limit should be configurable.
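A rough sketch of such a length check follows. Everything in it is an
assumption made for illustration: the names LARGE_INPUT_LINES, choose_backend
and run_external are hypothetical, and the default limit is just a value
between the two larger inputs measured above, not a tuned number.

```python
import shutil
import subprocess

# Hypothetical default: between the 119468-line input (roughly break-even)
# and the 307687-line input (external program faster) measured above.
LARGE_INPUT_LINES = 150000


def choose_backend(lines, limit=LARGE_INPUT_LINES, have_external=None):
    """Return 'external' for large inputs if 'diffstat' exists, else 'python'."""
    if have_external is None:
        # shutil.which returns None when the program is not on PATH.
        have_external = shutil.which('diffstat') is not None
    if have_external and len(lines) >= limit:
        return 'external'
    return 'python'


def run_external(lines):
    """Feed the diff to the system 'diffstat' on stdin and return its output."""
    proc = subprocess.run(['diffstat'], input=''.join(lines),
                          capture_output=True, text=True, check=True)
    return proc.stdout


small = ['+line\n'] * 77
print(choose_backend(small))
```

The have_external parameter exists only so the decision can be exercised
without the program installed; a real limit would come from configuration.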
Personally, I think inputs that large are unusual, so it is safe to keep
only the Python version.