Tag caching, at last

Greg Ward greg at gerg.ca
Sun Jul 12 21:18:16 CDT 2009

On Sun, Jul 12, 2009 at 8:34 PM, Matt Mackall<mpm at selenic.com> wrote:
> On Fri, 2009-07-10 at 11:34 -0400, Greg Ward wrote:
>> I finally have a working tag cache implementation.  The "ah-ha" moment
>> came when I realized we don't *have* to cache .hgtags contents, since
>> that's not the most expensive part of reading tags.
> I think you're still making that bit too hard. Simply write out the tags
> as calculated and use them if and only if the cache is determined to be
> fully up to date. No need to be clever. Remember: no change is the
> common case.

OK, sure, my *previous* implementations may have made life too hard by
trying to be clever with .hgtags content.  But giving up on caching
.hgtags content makes things as simple as I can make them.
Algorithmically, I can't think of how to make things simpler than in
the patch I posted tonight.  (Which is essentially the same as the
code you reviewed.)

Here's the rub: strip can modify tag info without changing tip.  Sure,
the rev num of tip changes, but then I have to detect that, which
starts getting complicated again.  So it's not enough simply to
compare tips; we'd have to compare the whole head list, and there goes
the "skip repo.heads()" optimization.  Argh: round and round in
circles we go.

> If we want to be slightly clever we can notice when tip moves without
> changing .hgtags (the second most common case) and make that fast too.
> But that can wait till later.

Yes.  And I argue that *any* caching of .hgtags content can wait until
later.  Caching the head->filenode mapping is the big win, and I
actually got that working (unlike all my other attempts, which
foundered on strip, rollback, and tag rank).  I would just like to get
the current cache design into the mainline, and then we can see what
other optimizations are worth the trouble.

>>     __slots__ = ['ui',
>>                  'repo',
>>                  'shouldwrite',
>>                 ]
> We don't generally use __slots__ around here unless there's significant
> performance or memory at stake.

OK, sure.  Reflex coding on my part.

>>     cacheversion = 0                    # format version
> I'm not much of a fan of version numbers in file formats. Here we don't
> need one, because it's just a cache. If we decide to change the format,
> it's actually simplest to just change the filename at the same time.

Hey, good idea.  Well, I was just toying with the version idea to see
what people would think.  Guess you answered that question.  I'll
remove it.

> (ignoring for the moment that I think a class is overkill here)

Not gonna fight that one either.  Like I said before, having it in a
class was a godsend when I was juggling 3 or 4 implementations.
Having settled on one that I like, though, readcache() and
writecache() could just become more methods of localrepository.
(Although having a tags module definitely appeals.  This is a big
chunk of complexity that I'm about to make more complex, and localrepo
is big and hairy enough already.)

>>         # Determine which heads have been destroyed by strip or
>>         # rollback.  That'll ensure that writecache() writes accurate
>>         # data, and it makes life easier for
>>         # localrepo._findglobaltags().
>>         goodheads = []
>>         for head in cacheheads:
>>             try:
>>                 repo.changelog.rev(head)
>>                 goodheads.append(head)
>>             except error.LookupError:
>>                 pass
> It'd be nice to find a better pattern for that. Perhaps:
> goodheads = [h for h in cacheheads if h in repo]
> which means adding a __contains__ method to repo that doesn't raise
> LookupError.

I don't know the code well enough to say if the above is a common
pattern.  I wouldn't add __contains__() just for this case, though.

Please ignore the patch series I sent tonight.  (Although any comments
you may have on the runup refactorings are welcome.)  I'll resend


More information about the Mercurial-devel mailing list