New command: hg debugrevlog
Sune Foldager
cryo at cyanite.org
Thu May 12 13:02:46 CDT 2011
On 2011-05-12 12:13, Matt Mackall wrote:
>On Thu, 2011-05-12 at 18:44 +0200, Sune Foldager wrote:
>> I added a debugrevlog command which shows some data about a revlog (or filelog),
>> and some statistics. My original use case was to debug generaldelta performance.
>>
>> The command shows extensive statistics about the deltas if it's a generaldelta
>> revlog. Example output:
>>
>> cryo@serene:~/test$ hg debugrevlog .hg/store/00manifest.i
>> format : 1
>> flags : generaldelta
>> revisions : 10194
>> merges : 3374
>> chains : 16
>
>Does chains implicitly tell us the number of full revisions?
>Generaldelta allows 'chains' that are tree-shaped, so this is perhaps
>not the ideal name.
Yes, that is what it counts; the name is misleading and should be "chain
bases" (or "snapshots") instead. Counting chains when they are branchy isn't
really well-defined :-p. I'll change it.
>I'd like to see how much of a revlog is taken up by full revisions vs
>deltas.
Yes. So in this example, 16 out of 10194 revisions are full revisions.
I'll perhaps throw in some more percentages.
>
>> data size (min/max/avg) : 392850 / 611167 / 491670
>> compressed snapshot size (min/max/avg) : 113171 / 151388 / 135161
>> compressed delta size (min/max/avg) : 0 / 112440 / 1300
>
>>>> a = 16 * 135161
>>>> b = (10194 - 16) * 1300
>>>> 1.0 * a / (a + b)
>0.14048196515312222
>
>So approximately 14% of your revlog is full revisions, modulo rounding
>error.
Yeah, good idea. I'll throw those stats in as well. This is gonna be the
mother of all stat dumps.
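
A minimal sketch of the estimate quoted above, using only the averages the
command already prints. The function name and its parameters are made up for
illustration; this is not code from the patch, and the result inherits the
rounding error of the printed averages.

```python
def snapshot_share(revisions, snapshots, avg_snapshot, avg_delta):
    """Estimate stored bytes and the fraction taken up by full revisions.

    All inputs are the rounded figures from the debugrevlog output, so the
    result is an approximation, not an exact byte count.
    """
    snapshot_bytes = snapshots * avg_snapshot
    delta_bytes = (revisions - snapshots) * avg_delta
    total = snapshot_bytes + delta_bytes
    return snapshot_bytes / total, total


# Figures from the manifest example: 16 snapshots out of 10194 revisions.
share, total = snapshot_share(10194, 16, 135161, 1300)
print("stored bytes (approx): %d" % total)
print("snapshot share: %.1f%%" % (share * 100))
```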
>I'd also like to see total size and effective compression ratio.
Me too; how do we do that efficiently? I can't access the uncompressed size
of deltas, since that field holds the uncompressed size of the revision
instead.
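
One approximation that avoids needing the uncompressed delta sizes: take the
per-revision uncompressed size (the field that is available) summed over all
revisions, and divide by the estimated stored bytes. A rough sketch with
hypothetical names, fed with the printed averages rather than real index data:

```python
def effective_ratio(revisions, avg_data_size, stored_bytes):
    """Ratio of the total uncompressed revision text to the stored bytes.

    avg_data_size is the average uncompressed size of a full revision, so
    revisions * avg_data_size approximates storing every revision in full.
    """
    uncompressed = revisions * avg_data_size
    return uncompressed, uncompressed / stored_bytes


# Stored bytes estimated as snapshots plus deltas, from the printed averages.
stored = 16 * 135161 + (10194 - 16) * 1300
uncompressed, ratio = effective_ratio(10194, 491670, stored)
print("uncompressed (approx): %d bytes" % uncompressed)
print("ratio: %.0f to 1" % ratio)
```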
>
>>>> a + b
>15393976 # 15MB compressed
>
>>>> c = 491670 * 10194
>>>> c
>5012083980 # 5GB uncompressed
>>>> 1.0 * c / (a + b)
>325.58735832769912 # 325 to 1
>
>And finally, min/max/avg chain length would also be interesting.
>
>>>> 10194 / 16.0
>637.125 # average chain length
It's a pseudo-average with generaldelta, but still useful I guess. Not sure
I can efficiently compute min and max chain lengths, and not sure they make
complete sense in the context of gd.
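
For illustration, one possible definition is "number of revisions that must
be applied to reconstruct a revision, snapshot included". A sketch under two
assumptions: delta bases are available as a plain list (hypothetical, not the
revlog API), and a delta base always has a lower revision number than the
revision built on it, so a single forward pass suffices.

```python
def chain_lengths(bases):
    """bases[rev] is the delta base of rev; bases[rev] == rev marks a snapshot.

    Returns the per-revision chain length, counting the snapshot as 1.
    Assumes every delta base precedes the revision that uses it.
    """
    lengths = []
    for rev, base in enumerate(bases):
        if base == rev:
            lengths.append(1)                  # full revision starts a chain
        else:
            lengths.append(lengths[base] + 1)  # one more delta on top of base
    return lengths


# Toy tree-shaped example: revisions 2 and 3 both delta against revision 1.
lengths = chain_lengths([0, 0, 1, 1, 4, 4])
print(min(lengths), max(lengths), sum(lengths) / len(lengths))
```

With tree-shaped chains this measures depth per revision, which is not the
same thing as revisions divided by chain bases.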
>If we see wild differences between min/max/avg, that could point to
>interesting issues.
Yeah
>
>> deltas against prev : 3204
>> ..where prev = p1 : 3073
>> ..where prev = p2 : 33
>> ..other : 98
>> deltas against p1 : 6950
>> deltas against p2 : 24
>> deltas against other : 0
>>
>> The command can also be started with a filename directly, similar to
>> debugindex and debugdata:
>>
>> cryo@serene:~/crew$ hg debugrevlog mercurial/lock.py
>> format : 1
>> flags : inline
>> revisions : 32
>> merges : 2
>> chains : 1
>>
>> data size (min/max/avg) : 1052 / 4373 / 2830
>> compressed snapshot size (min/max/avg) : 454 / 454 / 454
>> compressed delta size (min/max/avg) : 0 / 767 / 130
>>
>> Notice how this non-generaldelta revlog displays less information.
>
>We could probably still use stats like these here:
>
>> deltas against prev : 3204
>> ..where prev = p1 : 3073
>> ..where prev = p2 : 33
>> ..other : 98
>
>A high "other" count would be a good diagnostic.
Yes; I'll add those back in for non-gd logs.
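
For reference, a sketch of how the buckets could be counted. It assumes prev
is checked first, then p1, then p2, which is consistent with the manifest
numbers above (3204 + 6950 + 24 + 0 = 10194 - 16). The tuple layout and the
names are hypothetical, not taken from the patch.

```python
from collections import Counter

NULLREV = -1  # illustrative marker for "no parent"


def classify_deltas(entries):
    """entries[rev] = (base, p1, p2); base == rev marks a full snapshot."""
    counts = Counter()
    for rev, (base, p1, p2) in enumerate(entries):
        prev = rev - 1
        if base == rev:
            counts["snapshot"] += 1
        elif base == prev:
            counts["prev"] += 1
            # Break down what the previous revision is relative to this one.
            if prev == p1:
                counts["prev = p1"] += 1
            elif prev == p2:
                counts["prev = p2"] += 1
            else:
                counts["prev other"] += 1
        elif base == p1:
            counts["p1"] += 1
        elif base == p2:
            counts["p2"] += 1
        else:
            counts["other"] += 1
    return counts


# Toy data covering every bucket once, plus one extra delta against prev.
counts = classify_deltas([
    (0, NULLREV, NULLREV),  # snapshot
    (0, 0, NULLREV),        # against prev, prev = p1
    (0, 0, NULLREV),        # against p1
    (2, 1, 2),              # against prev, prev = p2
    (3, 1, 2),              # against prev, prev is neither parent
    (2, 4, 2),              # against p2
    (1, 5, NULLREV),        # against other
])
for name, n in sorted(counts.items()):
    print("%-12s: %d" % (name, n))
```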
/Sune