[PATCH] Implement persistent tag caching

Tue May 12 12:09:15 CDT 2009

On Tue, May 12, 2009 at 11:05 AM, Matt Mackall <mpm at selenic.com> wrote:
> As much as we all love Python, you're not allowed to use pickle as a
> format.

Figured as much.  Hey, it's a proof-of-concept patch.

> To determine whether tags have appeared, we need to know if the
> revisions of the repository heads have changed and if their
> corresponding .hgtags have changed. Most of the time spent finding
> tags is spent opening the manifest to look this data up.
>
> This suggests the following layout for tags:
>
> <head 1 hash> <head 1 rev> <.hgtags hash>
> <head 2 hash> <head 2 rev> <.hgtags hash>
> ...
> [blank line]
> <tag 1 hash> <tag 1 name>
> <tag 2 hash> <tag 2 name>
> ...
>
> This lets us (a) quickly check whether the tags are valid and (b)
> quickly find all the relevant .hgtags revisions without visiting the
> manifest.

Ahh, I see: this lets us detect an invalid cache late, when we need
the tags.  So there's no need to hook into commit/push/pull/whatever.
That makes the tag cache more self-contained, so I'm all for it.  But
is there any performance benefit to your approach?  As I understand
it, it will invalidate the cache as frequently as my plan would, and
the cost of invalidation will be the same: spend 9 seconds looping
over 140 heads.  (Or whatever: that's the scenario I'm facing that
prompted this patch.)
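
To check that I am reading the proposed layout correctly, here is a rough
sketch of how such a file might be parsed and validated.  This is only an
illustration, not code from the patch: the names parse_tag_cache and
cache_is_valid are hypothetical, and I am assuming that every head line
carries exactly three whitespace-separated fields and that the cache counts
as valid when the current set of (head hash, head rev) pairs matches the
cached one.

```python
def parse_tag_cache(text):
    """Parse the proposed cache layout (hypothetical helper).

    Returns (heads, tags) where
      heads is a list of (head_hash, head_rev, hgtags_hash) tuples and
      tags  is a dict mapping tag name -> tag hash.
    """
    # The first blank line separates the head section from the tag section.
    head_part, _, tag_part = text.partition("\n\n")

    heads = []
    for line in head_part.splitlines():
        # Assumption: exactly three fields per head line.
        head_hash, head_rev, hgtags_hash = line.split()
        heads.append((head_hash, int(head_rev), hgtags_hash))

    tags = {}
    for line in tag_part.splitlines():
        if not line:
            continue
        # Split only once so that tag names containing spaces survive.
        tag_hash, name = line.split(" ", 1)
        tags[name] = tag_hash
    return heads, tags


def cache_is_valid(cached_heads, current_heads):
    """Compare cached heads against the repository's current heads.

    current_heads is an iterable of (head_hash, head_rev) pairs taken
    from the changelog; no manifest access is needed for this check.
    """
    cached = {(h, r) for h, r, _ in cached_heads}
    return cached == set(current_heads)


sample = "aaa 5 h1\nbbb 7 h2\n\nccc v1.0\nddd release candidate\n"
heads, tags = parse_tag_cache(sample)
```

If the check passes, the tag section can be used as-is.  If it fails, the
<.hgtags hash> column of any head that is still present could presumably be
reused, so only the new heads would need a manifest lookup.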

One question: when you say <.hgtags hash>, does that mean "node ID of
the latest revision of the filelog .hg/store/data/.hgtags"?  (Gee, I
hope I've got my terminology right and did not just make myself look
like a fool...)

Thanks for the feedback!

Greg