[PATCH 2 of 4] revlog: calculate base revisions iteratively

Sune Foldager cryo at cyanite.org
Fri May 6 03:15:11 CDT 2011


On Fri, May 06, 2011 at 12:58:40 -0500, Matt Mackall wrote:
>On Thu, 2011-05-05 at 18:27 +0200, Sune Foldager wrote:
>> # HG changeset patch
>> # User Sune Foldager <cryo at cyanite.org>
>> # Date 1304611530 -7200
>> # Node ID e94b057be38fc9362d4ea3f50d22d0470502b617
>> # Parent  1b9d53b3eb06728afc9aae4198a53fd0e05e04e9
>> revlog: calculate base revisions iteratively
>>
>> This is in preparation for generaldelta, where the revlog entry base field
>> is reinterpreted as the deltaparent. For that reason we also rename the base
>> function to chainbase.
>
>Hmm, I don't think this is ideal.
>
>a) in linear delta, base is just e[3], let's just return that

But since we're presumably going to switch the default format to generaldelta,
"all" new revlogs will be generaldelta. Why optimize specifically for older
revlogs, especially since base/chainbase isn't really used much? We would
have to check a flag (which isn't introduced in this changeset). If speed
is that important, people shouldn't even call chainbase (and they don't).

>b) in generaldelta, we need to walk the chain to find base.. but it's
>conceptually still base

Of course, but we only need that in one case, see below.

>c) looks like we're walking the chain twice in revision/addrevision

No, revision doesn't use chainbase, and is probably as fast as before since
it doesn't call the base function anymore. It walks the chain once, and detects
its base at the same time.

_addrevision does call chainbase, but it does a lot of other somewhat
expensive things as well. It's not highly optimized in the way revision is.

>And it turns out that this chain-walking is actually fairly visible for
>performance of things like verify (I spent a bit tuning the inner loop
>in the parentdelta version). We don't want to do it more than once, and
>we don't want to do it at all for old revlogs.

Remember the walking for old revlogs is now:

a = index[rev][3]
# new steps:
x = index[a][3]
<while a != x which is always false for old logs>
return a

It's not a big difference, but we can check performance, I guess. As for other
users of chainbase, they can be optimized, but there aren't many:

- two uses in debugindex; not important.
- one use in revlog in _addrevision; can maybe be optimized slightly, maybe not.

That's it. revision is as optimized as before :).

Sune


More information about the Mercurial-devel mailing list