[PATCH 5 of 5 RFC] obsolete: store obsolete revisions to cache file

Yuya Nishihara yuya at tcha.org
Thu Sep 24 10:10:22 CDT 2015


On Wed, 23 Sep 2015 01:40:12 -0700, Pierre-Yves David wrote:
> On 09/22/2015 10:48 PM, Yuya Nishihara wrote:
> > # HG changeset patch
> > # User Yuya Nishihara <yuya at tcha.org>
> > # Date 1441508842 -32400
> > #      Sun Sep 06 12:07:22 2015 +0900
> > # Node ID 1e78698ebd6c791fbca9da6bf56b65ef75f21d43
> > # Parent  35de00be99c2536643df4e100c7784d39829614b
> > obsolete: store obsolete revisions to cache file
> >
> > This saves ~300msec for the repository that has 83731 markers (8.4MB) in total,
> > but has 5404 revisions marked as obsolete (22kB).
> >
> >    $ time hg log -G -l10 > /dev/null
> >    before: 0.474sec
> >    after:  0.121sec
> >
> > We could cache (rev, marker-offset) pair so that uninterested markers can
> > be skipped at the parsing state, but it would require further design change.
> 
> There is some point I would like to see addressed in the descrition:
> 
> - What's the filename (or naming scheme)

"cache/obsrevs-{'b'ig or 'l'ittle}{version}"

It has endian flag in case a repository is shared across Power and x86
machines. version isn't important.

> - What are actually storing rev? or hash?

It stores revision numbers that are marked as obsolete.
It seems getting revisions from thousands of hashes is slow.

> - What is the validation strategy and why is it correct.

SHA1 hash of {all repository heads + the last obsolete marker}, and offset
to the last obsolete marker.

Because revision numbers can change on strip, hashes of all heads are
included. This is the same strategy as cache/hidden, I think.

The difference is: in order to detect change of obsstore, it uses the
hash of the last marker and the offset to that marker. cache/hidden uses
the hash of obsolete revisions, which is what this patch tries to avoid
loading from obsstore.

It can detect added or truncated markers as follows:

 a) if marker added, hash(obsstore[lastmarkerofs:]) != cachedhash
 b) if marker truncated, obsstore[lastmarkerofs:] == '',
    so hash(obsstore[lastmarkerofs:]) != cachedhash
 c) edge case, if marker added/truncated by another process in the middle
    of reading/writing cache, lastmarkerofs won't be calculated correctly,
    so hash(obsstore[lastmarkerofs:]) != cachedhash

But it can't detect changes if different marker is inserted after truncating,
and the identical marker is appended at the exactly same offset.

                  lastmarkerofs
                    v
 original:   |......|last|   (cached)
 truncated:  |..|            (by old hg, cache not updated)
 insert new: |..|new|        (by old hg, cache not updated)
 append:     |..|new|last|
                 ^^^
              can't detect this change because it isn't included in the hash

We could calculate the hash of the whole obsstore, but it would be slow.

> On this "cache" topic. I was hoping to get a caching structure that 
> allow a key to have a growing number of value. It would have multiple 
> applications including here. Caching the precursors-rev -> markers 
> mapping. Since the keys are dynamic, we can probably use a fixed size 
> index + extra data somewhere. With such setup, reading the index would 
> give the same information you are trying to cache here. If we cache on 
> 'rev' it would even give it in O(1) fashion (similar to the current 
> branch cache)

Good point. I was thinking about that sort of caching mechanism. But before
starting, I wanted to see how fast it would be, and the result is this patch.

I'll try something like your suggestion.

Regards,


More information about the Mercurial-devel mailing list