[PATCH 1 of 6 V2] obsstore: add a 'cachekey' method
Jun Wu
quark at fb.com
Mon May 22 22:22:58 EDT 2017
I got a chance to test this in production with one of our biggest repos
and an 18MB obsstore. I ran "hg id" with a minimal hgrc (with just
lz4revlog and remotefilelog enabled) and without an existing obscache.
dualsourcecache.update takes 9 seconds: 7 seconds are spent scanning the
changelog and 0.36 seconds scanning the obsstore.
The related profiling data is below:
9735 | _computeobsoleteset obsolete.py:1554
9421 \ update obsolete.py:1227
1872 \ _upgradeneeded obsolete.py:1311
434 \ revs (many times) changelog.py:319
1179 \ __iter__ obsolete.py:564
1179 | __get__ util.py:798
1177 | _all obsolete.py:707
1138 | _readmarkers obsolete.py:444
1138 | _fm1readmarkers obsolete.py:432
7549 \ _updatefrom obsolete.py:1433
* 7185 \ _updaterevs obsolete.py:1439
766 \ __get__ util.py:798
766 | successors obsolete.py:717
766 | _addsuccessors obsolete.py:506
* 3964 \ node (many times) changelog.py:359
364 \ _updatemarkers obsolete.py:1471
So it is expensive for us to rebuild the cache. It seems the cache may be
dropped in some cases (e.g. histedit --abort). I wonder if we can make the
cache more persistent so that it is never nuked entirely. For example, a
strip could just truncate the cache file to the strip point instead of
nuking it.
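
To make the truncation idea concrete, here is a rough sketch. It is not
based on the actual obscache file format: it assumes a hypothetical layout
of a fixed-size header followed by one byte per revision, and the names
HEADER_SIZE and truncatecache are made up for illustration.

```python
import os

HEADER_SIZE = 32  # hypothetical: bytes reserved for the cache key header


def truncatecache(path, striprev, headersize=HEADER_SIZE):
    """Keep cache entries for revisions < striprev and drop the rest.

    Assumes (hypothetically) one byte per revision after a fixed-size
    header. Returns the number of revisions the cache still covers.
    """
    try:
        size = os.path.getsize(path)
    except OSError:
        return 0  # no cache file, nothing to keep
    if size < headersize:
        return 0  # file too short to hold a header, leave it alone
    # Never keep more entries than the file has, nor past the strip point.
    keep = max(min(size - headersize, striprev), 0)
    with open(path, 'r+b') as fp:
        fp.truncate(headersize + keep)
    # A real implementation would also have to rewrite the key stored in
    # the header so that it matches the new tip; that part is omitted.
    return keep
```

With something like this, a strip would cost one truncate call, and the
next update would only have to process the revisions added afterwards.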
Excerpts from Pierre-Yves David's message of 2017-05-20 17:30:15 +0200:
> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david at octobus.net>
> # Date 1495191830 -7200
> # Fri May 19 13:03:50 2017 +0200
> # Node ID 221be1ef98902fa695a709371f75e63f9b3e950a
> # Parent 566cfe9cbbb9b163bb58c8666759a634badacdd7
> # EXP-Topic obscache
> # Available At https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
> # hg pull https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r 221be1ef9890
> obsstore: add a 'cachekey' method
>
> Parsing the full obsstore is slow, so caches that depend on obsstore content
> want a way to know if the obsstore changed, and if this change was append only.
>
> For this purpose we introduce an official cachekey for the obsstore. This cache
> key works in a way similar to the '(tiprev, tipnode)' pair used for the
> changelog. We use the size of the obsstore file and the hash of its tail. That
> way, we can check if the obsstore grew and if the content we knew is still
> present in the obsstore.
>
> This will be used in later changesets to cache data related to the obsolete
> property.
>
> diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py
> --- a/mercurial/obsolete.py
> +++ b/mercurial/obsolete.py
> @@ -70,6 +70,7 @@ comment associated with each format for
> from __future__ import absolute_import
>
> import errno
> +import hashlib
> import struct
>
> from .i18n import _
> @@ -547,6 +548,8 @@ class obsstore(object):
>      # parents: (tuple of nodeid) or None, parents of precursors
>      # None is used when no data has been recorded
>
> +    _obskeysize = 200
> +
>      def __init__(self, svfs, defaultformat=_fm1version, readonly=False):
>          # caches for various obsolescence related cache
>          self.caches = {}
> @@ -574,6 +577,46 @@ class obsstore(object):
>
>      __bool__ = __nonzero__
>
> +    def cachekey(self, index=None):
> +        """return (current-length, cachekey)
> +
> +        'current-length': is the current length of the obsstore storage file,
> +        'cachekey' is the hash of the last 200 bytes ending at 'index'.
> +
> +        If 'index' is unspecified, current obsstore length is used.
> +        Cachekey will be set to nullid if the obsstore is empty.
> +        'current-length' is -always- the current obsstore length, regardless of
> +        the 'index' value.
> +
> +        If the index specified is higher than the current obsstore file
> +        length, cachekey will be set to None."""
> +        # default value
> +        obsstoresize = 0
> +        keydata = ''
> +        # try to get actual data from the obsstore
> +        try:
> +            with self.svfs('obsstore') as obsfile:
> +                obsfile.seek(0, 2)
> +                obsstoresize = obsfile.tell()
> +                if index is None:
> +                    index = obsstoresize
> +                elif obsstoresize < index:
> +                    return obsstoresize, None
> +                actualsize = min(index, self._obskeysize)
> +                if actualsize:
> +                    obsfile.seek(index - actualsize, 0)
> +                    keydata = obsfile.read(actualsize)
> +        except (OSError, IOError) as e:
> +            if e.errno != errno.ENOENT:
> +                raise
> +        if keydata:
> +            key = hashlib.sha1(keydata).digest()
> +        else:
> +            # reusing an existing "empty" value makes it easier to define a
> +            # default cachekey for 'no data'.
> +            key = node.nullid
> +        return obsstoresize, key
> +
>      @property
>      def readonly(self):
>          """True if marker creation is disabled
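
For reference, the append-only check that a (size, tail hash) key makes
possible could look roughly like the sketch below. It works on in-memory
bytes rather than through svfs, and tailkey, classify, KEYSIZE and NULLID
are illustrative names that are not part of the patch. KEYSIZE mirrors
_obskeysize and NULLID stands in for node.nullid.

```python
import hashlib

KEYSIZE = 200        # mirrors _obskeysize in the patch
NULLID = b'\0' * 20  # stand-in for node.nullid


def tailkey(data, index=None):
    """Return (len(data), key), following the cachekey() contract above."""
    size = len(data)
    if index is None:
        index = size
    elif size < index:
        # the store is shorter than what the cache knew about
        return size, None
    actual = min(index, KEYSIZE)
    if not actual:
        return size, NULLID
    # hash the (at most KEYSIZE) bytes that end at 'index'
    return size, hashlib.sha1(data[index - actual:index]).digest()


def classify(data, cachedsize, cachedkey):
    """Compare a store against the key a cache recorded earlier.

    Returns 'same', 'appended' or 'rewritten'.
    """
    size, key = tailkey(data, cachedsize)
    if key is None or key != cachedkey:
        return 'rewritten'  # truncated, or the known tail changed
    return 'same' if size == cachedsize else 'appended'
```

A cache would do a full rebuild only on 'rewritten'. On 'appended' it
would just need to read the markers between the cached size and the
current size.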