[PATCH 5 of 5 RFC] obsolete: store obsolete revisions to cache file
Yuya Nishihara
yuya at tcha.org
Thu Sep 24 10:10:22 CDT 2015
On Wed, 23 Sep 2015 01:40:12 -0700, Pierre-Yves David wrote:
> On 09/22/2015 10:48 PM, Yuya Nishihara wrote:
> > # HG changeset patch
> > # User Yuya Nishihara <yuya at tcha.org>
> > # Date 1441508842 -32400
> > # Sun Sep 06 12:07:22 2015 +0900
> > # Node ID 1e78698ebd6c791fbca9da6bf56b65ef75f21d43
> > # Parent 35de00be99c2536643df4e100c7784d39829614b
> > obsolete: store obsolete revisions to cache file
> >
> > This saves ~300msec for the repository that has 83731 markers (8.4MB) in total,
> > but has 5404 revisions marked as obsolete (22kB).
> >
> > $ time hg log -G -l10 > /dev/null
> > before: 0.474sec
> > after: 0.121sec
> >
> > We could cache (rev, marker-offset) pair so that uninterested markers can
> > be skipped at the parsing state, but it would require further design change.
>
> There is some point I would like to see addressed in the descrition:
>
> - What's the filename (or naming scheme)
"cache/obsrevs-{'b'ig or 'l'ittle}{version}"
It has endian flag in case a repository is shared across Power and x86
machines. version isn't important.
> - What are actually storing rev? or hash?
It stores revision numbers that are marked as obsolete.
It seems getting revisions from thousands of hashes is slow.
> - What is the validation strategy and why is it correct.
SHA1 hash of {all repository heads + the last obsolete marker}, and offset
to the last obsolete marker.
Because revision numbers can change on strip, hashes of all heads are
included. This is the same strategy as cache/hidden, I think.
The difference is: in order to detect change of obsstore, it uses the
hash of the last marker and the offset to that marker. cache/hidden uses
the hash of obsolete revisions, which is what this patch tries to avoid
loading from obsstore.
It can detect added or truncated markers as follows:
a) if marker added, hash(obsstore[lastmarkerofs:]) != cachedhash
b) if marker truncated, obsstore[lastmarkerofs:] == '',
so hash(obsstore[lastmarkerofs:]) != cachedhash
c) edge case, if marker added/truncated by another process in the middle
of reading/writing cache, lastmarkerofs won't be calculated correctly,
so hash(obsstore[lastmarkerofs:]) != cachedhash
But it can't detect changes if different marker is inserted after truncating,
and the identical marker is appended at the exactly same offset.
lastmarkerofs
v
original: |......|last| (cached)
truncated: |..| (by old hg, cache not updated)
insert new: |..|new| (by old hg, cache not updated)
append: |..|new|last|
^^^
can't detect this change because it isn't included in the hash
We could calculate the hash of the whole obsstore, but it would be slow.
> On this "cache" topic. I was hoping to get a caching structure that
> allow a key to have a growing number of value. It would have multiple
> applications including here. Caching the precursors-rev -> markers
> mapping. Since the keys are dynamic, we can probably use a fixed size
> index + extra data somewhere. With such setup, reading the index would
> give the same information you are trying to cache here. If we cache on
> 'rev' it would even give it in O(1) fashion (similar to the current
> branch cache)
Good point. I was thinking about that sort of caching mechanism. But before
starting, I wanted to see how fast it would be, and the result is this patch.
I'll try something like your suggestion.
Regards,
More information about the Mercurial-devel
mailing list