New command: hg debugrevlog

Sune Foldager cryo at cyanite.org
Thu May 12 13:02:46 CDT 2011


On 2011-05-12 12:13, Matt Mackall wrote:
>On Thu, 2011-05-12 at 18:44 +0200, Sune Foldager wrote:
>> I added a debugrevlog command which shows some data about a revlog (or filelog),
>> and some statistics. My original use case was to debug generaldelta performance.
>>
>> The command shows extensive statistics about the deltas if it's a generaldelta
>> revlog. Example output:
>>
>> cryo@serene:~/test$ hg debugrevlog .hg/store/00manifest.i
>> format    : 1
>> flags     : generaldelta
>> revisions : 10194
>> merges    : 3374
>> chains    : 16
>
>Does chains implicitly tell us the number of full revisions?
>Generaldelta allows 'chains' that are tree-shaped, so this is perhaps
>not the ideal name.

Yeah, that is exactly what it is; the name is misleading. It should be chain
bases (or snapshots) instead. Counting chains when they are branchy isn't
really well-defined :-p. I'll change it.

>I'd like to see how much of a revlog is taken up by full revisions vs
>deltas.

Yes. So in this example, 16 out of 10194 revisions are full revisions.
I'll perhaps throw in some more percentages.

>
>> data size (min/max/avg)                : 392850 / 611167 / 491670
>> compressed snapshot size (min/max/avg) : 113171 / 151388 / 135161
>> compressed delta size (min/max/avg)    : 0 / 112440 / 1300
>
>>>> a = 16 * 135161
>>>> b = (10194 - 16) * 1300
>>>> 1.0 * a / (a + b)
>0.14048196515312222
>
>So approximately 14% of your revlog is full revisions, modulo rounding
>error.

Yeah, good idea. I'll throw those stats in as well. This is gonna be the
mother of all stat dumps.
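
A minimal sketch of the estimate quoted above, assuming only the averages
printed by the command are available. The function name and parameters are
hypothetical and are not part of debugrevlog itself; since it works from
averages, the result is approximate.

```python
def full_revision_share(revisions, snapshots, avg_snapshot_size, avg_delta_size):
    """Estimate the fraction of compressed data held in full revisions.

    Every revision that is not a snapshot is assumed to be stored as a delta.
    Only averages are used, so the result is subject to rounding error.
    """
    snapshot_bytes = snapshots * avg_snapshot_size
    delta_bytes = (revisions - snapshots) * avg_delta_size
    return snapshot_bytes / (snapshot_bytes + delta_bytes)


# Numbers taken from the 00manifest.i example output in this thread.
share = full_revision_share(
    revisions=10194,
    snapshots=16,
    avg_snapshot_size=135161,
    avg_delta_size=1300,
)
print("full revisions: %.1f%% of compressed data" % (share * 100))
```

Summing the exact per-revision sizes while walking the index, instead of
multiplying averages, would remove the rounding error.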

>I'd also like to see total size and effective compression ratio.

Me too; how do we do that efficiently? I can't access the uncompressed size
of deltas, since that field holds the uncompressed size of the revision
instead.

>
>>>> a + b
>15393976  # 15MB compressed
>
>>>> c = 491670 * 10194
>>>> c
>5012083980  # 5GB uncompressed
>>>> 1.0 * c / (a + b)
>325.58735832769912  # 325 to 1
>
>And finally, min/max/avg chain length would also be interesting.
>
>>>> 10194 / 16.0
>637.125  # average chain length

It's a pseudo-average with generaldelta, but still useful I guess. I'm not
sure I can efficiently compute min and max chain lengths, and not sure they
make complete sense in the context of gd.
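
One possible way to get per-revision chain lengths in a single pass is
sketched below. It assumes the delta base of every revision is available as a
plain list, and that a base always has a lower revision number than the
revision stored against it. The list layout and the function name are
illustrative assumptions, not the revlog API.

```python
def chain_lengths(delta_base):
    """Return the delta chain length of every revision.

    delta_base[rev] is the revision that rev is stored as a delta against.
    A full revision (snapshot) is marked by delta_base[rev] == rev and has
    chain length 1. Bases are assumed to precede the revisions that use them,
    so each length can be derived from one already computed.
    """
    lengths = []
    for rev, base in enumerate(delta_base):
        if base == rev:
            lengths.append(1)
        else:
            lengths.append(lengths[base] + 1)
    return lengths


# Hypothetical tree-shaped layout: revisions 0 and 4 are snapshots,
# and revisions 2 and 3 both delta against revision 1.
delta_base = [0, 0, 1, 1, 4, 4, 2]
lengths = chain_lengths(delta_base)
print(min(lengths), max(lengths), sum(lengths) / len(lengths))
```

With tree-shaped chains this measures the depth of each revision below its
snapshot, which is what the cost of reconstructing that revision depends on.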

>If we see wild differences between min/max/avg, that could point to
>interesting issues.

Yeah

>
>> deltas against prev  : 3204
>>    ..where prev = p1  : 3073
>>    ..where prev = p2  : 33
>>    ..other            : 98
>> deltas against p1    : 6950
>> deltas against p2    : 24
>> deltas against other : 0
>>
>> The command can also be started with a filename directly, similar to
>> debugindex and debugdata:
>>
>> cryo@serene:~/crew$ hg debugrevlog mercurial/lock.py
>> format    : 1
>> flags     : inline
>> revisions : 32
>> merges    : 2
>> chains    : 1
>>
>> data size (min/max/avg)                : 1052 / 4373 / 2830
>> compressed snapshot size (min/max/avg) : 454 / 454 / 454
>> compressed delta size (min/max/avg)    : 0 / 767 / 130
>>
>> Notice how this non-generaldelta revlog displays less information.
>
>We could probably still use stats like these here:
>
>>  deltas against prev  : 3204
>>    ..where prev = p1  : 3073
>>    ..where prev = p2  : 33
>>    ..other            : 98
>
>A high "other" count would be a good diagnostic.

Yes; I'll add those back in for non-gd logs.
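
For illustration, the classification behind those counters could look roughly
like the sketch below. The (delta_base, p1, p2) tuple layout and the function
name are hypothetical. The order of the checks (prev first, then p1, p2,
other) is inferred from the example output, where the categories add up to
the number of non-snapshot revisions (3204 + 6950 + 24 + 0 = 10194 - 16).

```python
from collections import Counter

NULLREV = -1  # marks a missing parent in this sketch


def classify_deltas(revs):
    """Count which revision each delta is stored against.

    revs is a list of (delta_base, p1, p2) tuples indexed by revision number.
    A revision whose delta_base equals its own number is a snapshot.
    """
    counts = Counter()
    for rev, (base, p1, p2) in enumerate(revs):
        if base == rev:
            counts["snapshot"] += 1
        elif base == rev - 1:
            counts["prev"] += 1
            if base == p1:
                counts["prev = p1"] += 1
            elif base == p2:
                counts["prev = p2"] += 1
            else:
                counts["prev = other"] += 1
        elif base == p1:
            counts["p1"] += 1
        elif base == p2:
            counts["p2"] += 1
        else:
            counts["other"] += 1
    return counts


# Hypothetical eight-revision history exercising every category.
revs = [
    (0, NULLREV, NULLREV),  # snapshot
    (0, 0, NULLREV),        # prev, and prev is p1
    (1, 1, NULLREV),        # prev, and prev is p1
    (1, 1, NULLREV),        # p1, which is not prev
    (3, 2, 3),              # prev, and prev is p2
    (4, 1, NULLREV),        # prev, which is neither parent
    (2, 5, 2),              # p2
    (0, 6, NULLREV),        # neither prev nor a parent
]
counts = classify_deltas(revs)
for name in ("prev", "prev = p1", "prev = p2", "prev = other", "p1", "p2", "other"):
    print("%-13s: %d" % (name, counts[name]))
```

In a non-generaldelta revlog every delta is against prev, so only the
"prev" sub-counters would be non-zero there.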

/Sune

