[PATCH 1 of 6 V2] obsstore: add a 'cachekey' method

Jun Wu quark at fb.com
Mon May 22 22:22:58 EDT 2017


I got a chance to test this in production with one of our biggest repos,
which has an 18MB obsstore. I ran "hg id" with a minimal hgrc (with
just lz4revlog and remotefilelog enabled) and without an existing obscache.

dualsourcecache.update takes about 9 seconds: about 7 seconds are spent
scanning the changelog and 0.36 seconds scanning the obsstore.

The related profiling data is as below:

  9735       | _computeobsoleteset              obsolete.py:1554
  9421        \ update                          obsolete.py:1227
  1872          \ _upgradeneeded                obsolete.py:1311
   434            \ revs (many times)           changelog.py:319
  1179            \ __iter__                    obsolete.py:564
  1179             | __get__                    util.py:798
  1177             | _all                       obsolete.py:707
  1138             | _readmarkers               obsolete.py:444
  1138             | _fm1readmarkers            obsolete.py:432
  7549          \ _updatefrom                   obsolete.py:1433
* 7185            \ _updaterevs                 obsolete.py:1439
   766              \ __get__                   util.py:798
   766               | successors               obsolete.py:717
   766               | _addsuccessors           obsolete.py:506
* 3964              \ node (many times)         changelog.py:359
   364            \ _updatemarkers              obsolete.py:1471

So rebuilding the cache is expensive for us. It seems the cache may be
dropped in some cases (e.g. histedit --abort). I wonder if we can make the
cache more persistent so that it is never nuked entirely. For example, a
strip could just truncate the cache file to the strip point instead of
nuking it.
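
A rough sketch of the truncation idea, not taken from the patch series. It
assumes a hypothetical cache layout of a fixed-size header followed by one
fixed-size record per revision; the name truncatecache and its parameters
are made up for illustration. A real cache would also have to rewrite
whatever key it stores in its header, which this does not handle.

```python
import os


def truncatecache(path, striprev, headersize=0, recordsize=1):
    """Drop cached records for revisions >= striprev and keep the rest.

    Returns the resulting file size, or None if there is no cache file.
    The layout (header + fixed-size per-revision records) is an assumption.
    """
    # number of bytes that describe revisions below the strip point
    keep = headersize + striprev * recordsize
    try:
        size = os.path.getsize(path)
    except OSError:
        # no cache on disk, nothing to preserve
        return None
    if size <= keep:
        # the cache never reached the strip point, leave it alone
        return size
    with open(path, 'r+b') as fp:
        fp.truncate(keep)
    return keep
```

With something like this, a strip only costs re-warming the revisions above
the strip point instead of rescanning the whole changelog.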

Excerpts from Pierre-Yves David's message of 2017-05-20 17:30:15 +0200:
> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david at octobus.net>
> # Date 1495191830 -7200
> #      Fri May 19 13:03:50 2017 +0200
> # Node ID 221be1ef98902fa695a709371f75e63f9b3e950a
> # Parent  566cfe9cbbb9b163bb58c8666759a634badacdd7
> # EXP-Topic obscache
> # Available At https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ 
> #              hg pull https://www.mercurial-scm.org/repo/users/marmoute/mercurial/  -r 221be1ef9890
> obsstore: add a 'cachekey' method
> 
> Parsing the full obsstore is slow, so caches that depend on obsstore content
> need a way to know if the obsstore changed, and if this change was append only.
> 
> For this purpose we introduce an official cachekey for the obsstore. This cache
> key works in a way similar to the '(tiprev, tipnode)' pair used for the
> changelog. We use the size of the obsstore file and the hash of its tail. That
> way, we can check if the obsstore grew and if the content we knew is still
> present in the obsstore.
> 
> This will be used in later changesets by caches related to the obsolete property.
> 
> diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py
> --- a/mercurial/obsolete.py
> +++ b/mercurial/obsolete.py
> @@ -70,6 +70,7 @@ comment associated with each format for 
>  from __future__ import absolute_import
>  
>  import errno
> +import hashlib
>  import struct
>  
>  from .i18n import _
> @@ -547,6 +548,8 @@ class obsstore(object):
>      # parents: (tuple of nodeid) or None, parents of precursors
>      #          None is used when no data has been recorded
>  
> +    _obskeysize = 200
> +
>      def __init__(self, svfs, defaultformat=_fm1version, readonly=False):
>          # caches for various obsolescence related cache
>          self.caches = {}
> @@ -574,6 +577,46 @@ class obsstore(object):
>  
>      __bool__ = __nonzero__
>  
> +    def cachekey(self, index=None):
> +        """return (current-length, cachekey)
> +
> +        'current-length': is the current length of the obsstore storage file,
> +        'cachekey' is the hash of the last 200 bytes ending at 'index'.
> +
> +        If 'index' is unspecified, current obsstore length is used.
> +        cachekey will be set to nullid if the obsstore is empty.
> +        'current-length' is -always- the current obsstore length, regardless of
> +        the 'index' value.
> +
> +        If the index specified is higher than the current obsstore file
> +        length, cachekey will be set to None."""
> +        # default value
> +        obsstoresize = 0
> +        keydata = ''
> +        # try to get actual data from the obsstore
> +        try:
> +            with self.svfs('obsstore') as obsfile:
> +                obsfile.seek(0, 2)
> +                obsstoresize = obsfile.tell()
> +                if index is None:
> +                    index = obsstoresize
> +                elif obsstoresize < index:
> +                    return obsstoresize, None
> +                actualsize = min(index, self._obskeysize)
> +                if actualsize:
> +                    obsfile.seek(index - actualsize, 0)
> +                    keydata = obsfile.read(actualsize)
> +        except (OSError, IOError) as e:
> +            if e.errno != errno.ENOENT:
> +                raise
> +        if keydata:
> +            key = hashlib.sha1(keydata).digest()
> +        else:
> +            # reusing an existing "empty" value makes it easier to define a
> +            # default cachekey for 'no data'.
> +            key = node.nullid
> +        return obsstoresize, key
> +
>      @property
>      def readonly(self):
>          """True if marker creation is disabled
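
For reference, here is how I understand a cache would use this key to tell
"unchanged", "append only" and "rewritten" apart. This is a standalone
sketch, not the code from the series: it works on a plain file path instead
of svfs, and filekey, cachestatus, NULLID and KEYSIZE are names I made up
(NULLID has the same value as node.nullid, KEYSIZE mirrors _obskeysize).

```python
import errno
import hashlib

NULLID = b'\0' * 20  # same value as Mercurial's node.nullid
KEYSIZE = 200        # mirrors _obskeysize in the patch


def filekey(path, index=None):
    """Return (current-size, sha1 of the up-to-KEYSIZE bytes ending at index)."""
    size = 0
    keydata = b''
    try:
        with open(path, 'rb') as fp:
            fp.seek(0, 2)
            size = fp.tell()
            if index is None:
                index = size
            elif size < index:
                # the file is shorter than what the cache knew about
                return size, None
            actual = min(index, KEYSIZE)
            if actual:
                fp.seek(index - actual, 0)
                keydata = fp.read(actual)
    except (OSError, IOError) as e:
        if e.errno != errno.ENOENT:
            raise
    key = hashlib.sha1(keydata).digest() if keydata else NULLID
    return size, key


def cachestatus(path, cachedsize, cachedkey):
    """Classify the file relative to a previously stored (size, key) pair."""
    cursize, oldkey = filekey(path, cachedsize)
    if oldkey is None or oldkey != cachedkey:
        return 'invalid'   # truncated or rewritten: rebuild from scratch
    if cursize == cachedsize:
        return 'valid'     # nothing changed
    return 'appended'      # only new data past cachedsize needs processing
```

Since only the last 200 bytes before the recorded size are hashed, a rewrite
that leaves those bytes untouched would go unnoticed by this check.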

