[PATCH 2 of 3] repoview: cache hidden changesets

Pierre-Yves David pierre-yves.david at ens-lyon.org
Thu Aug 7 22:45:05 CDT 2014



On 08/07/2014 11:08 AM, David Soria Parra wrote:
> # HG changeset patch
> # User David Soria Parra <davidsp at fb.com>
> # Date 1407356774 25200
> #      Wed Aug 06 13:26:14 2014 -0700
> # Node ID 3da6820d38112cddfc21e2b99ab02957f93d80cf
> # Parent  4ac33b2438c3d50d5f1f21423f0a51ed32d12994
> repoview: cache hidden changesets

We cannot turn that on by default yet (so patches 3 needs to be somehow 
before patches 2 or merged with it). See comment on patches 3.

> Add a cache for hidden changesets. We cache all hidden changesets computed by
> _gethiddenblockers and check if a bookmark, tag or wd parent moved to a hidden
> changesets in which case we recompute parts. This significantly speeds up
> status, log, etc:
>
> without caching:
> $ time hg status
> hg status  0.72s user 0.20s system 100% cpu 0.917 total
>
> with caching
> $ time hg status
> hg status  0.49s user 0.15s system 100% cpu 0.645 total
> We invalidate the cache if the append only obsstore size changed from what we
> have observed before. In that case we recompute everything again.
>
> diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
> --- a/mercurial/localrepo.py
> +++ b/mercurial/localrepo.py
> @@ -1013,6 +1013,7 @@
>       def invalidatevolatilesets(self):
>           self.filteredrevcache.clear()
>           obsolete.clearobscaches(self)
> +        repoview.invalidatecache(self)

You cannot rely on cache invalidation to produce reliable results. So 
this line is either:

1. an early optimization that should come in another patch

2. a sign you rely on active invalidation from the client to detect 
outdated cache. You cannot trust the client to always update the cache 
properly (version too new or too old). And you cannot trust the 
environment either (eg: permission issue preventing to touch cache 
should not be fatal)

>
>       def invalidatedirstate(self):
>           '''Invalidates the dirstate, causing the next call to dirstate
> diff --git a/mercurial/repoview.py b/mercurial/repoview.py
> --- a/mercurial/repoview.py
> +++ b/mercurial/repoview.py
> @@ -10,6 +10,7 @@
>   import phases
>   import util
>   import obsolete
> +import struct
>   import tags as tagsmod
>
>
> @@ -56,17 +57,61 @@
>           blockers.extend([cl.rev(t[0]) for t in tags.values()])
>       return frozenset(blockers)
>
> +cachefile = 'cache/hidden'
> +cachemeta = 'cache/hidden.len'
> +def invalidatecache(repo):
> +    for filename in [cachefile, cachemeta]:
> +        if repo.vfs.exists(filename):
> +            repo.vfs.unlink(filename)

cf above

> +
>   def computehidden(repo):
>       """compute the set of hidden revision to filter
>
>       During most operation hidden should be filtered."""
>       assert not repo.changelog.filteredrevs
> +
> +    def iscachevalid(repo, hideable):
> +        """The cache is invalid if the append-only obsstore has more markers
> +        then we observed before."""
> +        if repo.vfs.exists(cachefile):
> +            oldlen = repo.vfs.tryread(cachemeta)
> +            if oldlen and int(oldlen) == len(hideable):
> +                return True
> +        return False

1. I do not see good reasons for this to be a closure, extract it from 
the function.

2. You cannot have separated file for the meta and the file. Otherwise 
you may hit a race condition where the content of the cache file is 
changed between the meta validation and its reading. Your cache validity 
keys, needs to be at the head of you file. (You can look at the branch 
cache for an example)

2.5 (I recommend having a version number in the cache header to easily 
iterate format if needed)

3. You cannot just rely on the obsstore length. the movement of the 
public phases may inhibit some of the content of the obsstore. Also, as 
pointed in the previous changeset, child changeset may reveal some 
hideable changesets so changeset movement should be accounted too. 
(Those two things may have been caught by the cache invalidation call, 
but as explained above we cannot rely on it.

> +
> +    hidden = frozenset()
>       hideable = hideablerevs(repo)
>       if hideable:
>           cl = repo.changelog
> -        blocked = cl.ancestors(_gethiddenblockers(repo), inclusive=True)
> -        return frozenset(r for r in hideable if r not in blocked)
> -    return frozenset()
> +        if iscachevalid(repo, hideable):
> +            data = repo.vfs.read(cachefile)
> +            hidden = frozenset(struct.unpack('%sI' % (len(data) / 4), data))
> +        else:
> +            # recompute cache
> +            blocked = cl.ancestors(_gethiddenblockers(repo), inclusive=True)
> +            hidden = frozenset(r for r in hideable if r not in blocked)
> +
> +            # write cache to file
> +            try:
> +                data = struct.pack('%sI' % len(hidden), *hidden)
> +                fh = repo.vfs(cachefile, 'w+b', atomictemp=True)
> +                fh.write(data)
> +                fh.close()
> +
> +                # our observed size of the obsstore. used for cache invalidation.
> +                fh = repo.vfs(cachemeta, 'w+b', atomictemp=True)
> +                fh.write('%d' % len(hideable))
> +                fh.close()
> +            except IOError:
> +                ui.debug(_('error writing hidden changesets cache'))

1. You should grab the wlock before writing anything.

2. the `close()` should be in a finally: clause to make sure it is closed

3. cf above for the two file approach that as vulnerable to race condition

4. I would personally extract the cache writing code in a function to 
not pollute this main function so much.

> +
> +        # check if we have wd parents, bookmarks or tags pointing to hidden
> +        # changesets and remove those.
> +        dynamic = hidden & _getdynamicblockers(repo)
> +        if dynamic:
> +            blocked = cl.ancestors(dynamic, inclusive=True)
> +            hidden &= frozenset(r for r in hideable if r not in blocked)
> +    return hidden
>
>   def computeunserved(repo):
>       """compute the set of revision that should be filtered when used a server
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
>

-- 
Pierre-Yves David


More information about the Mercurial-devel mailing list