Design of persistent tag cache

Matt Mackall mpm at selenic.com
Mon Jul 6 14:32:49 CDT 2009


On Sat, 2009-07-04 at 19:26 -0400, Greg Ward wrote:
> Having implemented persistent tag caching according to the design that
> Matt suggested back in May, I think I've hit a problem: stripping.
> Here's Matt's original idea for the tag cache: a text file like
> 
>   <headnode> <headrev> <tagnode>
>   [...repeat: 1 line per head...]
> 
>   <node> <tag>
>   [...repeat: 1 line per tag...]
> 
> Good things about this design:
>   * it's easy to detect if the entire cache is still valid: if the set
> of heads in the cache == the current repo heads, the cache is all good
> and we don't have to read any .hgtags files

We can do better. If the tip is stored correctly in the cache, we can
assume all the other heads are correct. This ignores mq/strip, which
will have to invalidate/update the cache for all the rules it breaks.

>   * when heads have been added, it minimizes the amount of work to
> refresh the cache: IOW, you only need to visit the heads with a new
> version of .hgtags

No, when the cache is invalid, you must visit all relevant versions
of .hgtags. This is *not expensive*. Again, what makes looking up tags
slow is looking up .hgtags entries in the associated manifests. Let me
make this concrete:

Assume a repo with 50k files, 100k revisions and 500 heads:

1) finding heads in the changelog (~1s)
2) finding corresponding .hgtags revisions from manifest (~1s x 500)
3) parsing those .hgtags revisions (~.01s x (unique from 500))

This makes the fairly realistic assumption that you have 100x fewer
entries in your average .hgtags file than in your manifest. Fully
cached, we can skip 1-3. But if we can only skip 2, we're still doing
much better. So you end up taking 506 seconds completely uncached, 6
seconds with cached .hgtags pointers, and 0 seconds fully cached. For
more reasonable numbers of heads, the number will be more like 10, 1,
and 0.

-- 
http://selenic.com : development and support for Mercurial and Linux




More information about the Mercurial-devel mailing list