[PATCH 2 of 3] hgtagsfnodescache: inherit fnode from parent when possible

Pierre-Yves David pierre-yves.david at ens-lyon.org
Thu May 2 18:05:43 EDT 2019



On 5/2/19 11:55 PM, Martin von Zweigbergk wrote:
> 
> 
> On Thu, May 2, 2019 at 2:48 PM Pierre-Yves David
> <pierre-yves.david at ens-lyon.org> wrote:
> 
> 
> 
>     On 5/2/19 8:24 PM, Martin von Zweigbergk wrote:
>      >
>      >
>      > On Thu, May 2, 2019 at 9:37 AM Pierre-Yves David
>      > <pierre-yves.david at ens-lyon.org> wrote:
>      >
>      >     # HG changeset patch
>      >     # User Pierre-Yves David <pierre-yves.david at octobus.net>
>      >     # Date 1552263020 -3600
>      >     #      Mon Mar 11 01:10:20 2019 +0100
>      >     # Node ID eac353183daaef0a503da8cd72b8df43f54d7fb8
>      >     # Parent  a753bc019c1ad7c5661a050adce49e4c3cd5a786
>      >     # EXP-Topic fnodecache
>      >     # Available At https://bitbucket.org/octobus/mercurial-devel/
>      >     #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r eac353183daa
>      >     hgtagsfnodescache: inherit fnode from parent when possible
>      >
>      >     If a changeset does not update the content of `.hgtags`, it will use
>      >     the same file-node (for `.hgtags`) as its parent. In such a case we
>      >     can directly reuse the parent's file-node.
>      >
>      >     We use this property when updating the `hgtagsfnodescache`, taking a
>      >     faster path if we already have a cached value for the parents of the
>      >     node we are looking at.
>      >
>      >     Doing so provides a large performance boost when looking at a lot of
>      >     fnodes, especially on repositories with very large manifests:
>      >
>      >     timing for `tagsmod.fnoderevs(ui, repo, repo.changelog.revs())`
>      >
>      >
>      > What end-user command does this correspond to? `hg tags` with no
>      > .hg/cache/tags?
> 
>     hg debugupdatecache
> 
>      >
>      >
>      >     mercurial: (41907 revisions, 1923 files)
>      >
>      >          before: 6.9 seconds
>      >          after:  2.7 seconds (-54%)
>      >
>      >     pypy: (96266 revisions, 5198 files)
>      >
>      >          before: 80 seconds
>      >          after:  20 seconds (-75%)
>      >
>      >     mozilla-central: (463411 revisions, 272080 files)
>      >
>      >          before: 7166.4 seconds
>      >          after:    47.8 seconds (-99%, x150 speedup)
>      >
>      >
>      > Nice improvements :) How did people work with these repos before?
> 
>     This is the timing for computing the information for all nodes. To
>     retrieve the current tag names we only need this data for all heads.
> 
>     Getting it for all heads is still very slow to compute initially (that
>     is why we exchange them during clone now).
> 
>     To illustrate the slowness, I started a tags computation from a cold
>     cache… This was 3 hours ago…
> 
> 
>     So currently we only use (and exchange) entries for the repository heads.
>     However, the speedup relies on reusing data from the parent. So warming
>     all entries during a `hg debugupdatecache` turns out to be more
>     efficient (with the new code).
> 
>     I guess the next step from here is to warm all entries in all cases (not
>     just `hg debugupdatecache`) and efficiently exchange them over the wire.
> 
> 
> 
>      >
>      >
>      >     On a copy of mozilla-try with about 35K heads and 1.7M changesets,
>      >     this moves the computation from many hours to a couple of minutes,
>      >     making it more interesting to do a full warm-up of this cache before
>      >     computing tags (from a cold cache).
>      >
>      >     There seems to be other low-hanging performance fruit, like avoiding
>      >     the use of changectx or a more revision-centric logic. However, the
>      >     new code is fast enough for my needs right now.
>      >
>      >     diff --git a/mercurial/tags.py b/mercurial/tags.py
>      >     --- a/mercurial/tags.py
>      >     +++ b/mercurial/tags.py
>      >     @@ -18,6 +18,7 @@ from .node import (
>      >           bin,
>      >           hex,
>      >           nullid,
>      >     +    nullrev,
>      >           short,
>      >       )
>      >       from .i18n import _
>      >     @@ -718,12 +719,33 @@ class hgtagsfnodescache(object):
>      >               if not computemissing:
>      >                   return None
>      >
>      >     -        # Populate missing entry.
>      >     -        try:
>      >     -            fnode = ctx.filenode('.hgtags')
>      >     -        except error.LookupError:
>      >     -            # No .hgtags file on this revision.
>      >     -            fnode = nullid
>      >     +        fnode = None
>      >     +        cl = self._repo.changelog
>      >     +        p1rev, p2rev = cl._uncheckedparentrevs(rev)
>      >     +        p1node = cl.node(p1rev)
>      >     +        p1fnode = self.getfnode(p1node, computemissing=False)
>      >     +        if p2rev != nullrev:
>      >     +            # There are some non-merge changesets where p1 is null and
>      >     +            # p2 is set. Processing them as merges is just slower, but
>      >     +            # still gives a good result.
>      >
>      >
>      > I think you're thinking of file copies, see
>      > https://www.mercurial-scm.org/repo/hg/file/fdbeacb9d456/mercurial/localrepo.py#l2348
> 
>     I am lost here. We are iterating over the changelog and the manifest
>     here. This code deals with "malformed" changelog entries. Why are file
>     copies relevant here?
> 
> 
> I don't think you're lost. I think I just misunderstood what this was 
> about. I was not aware that some repos have commits broken in that way. 
> Any idea how that happened?

In most cases, people using debugsetparents (it could be faulty extension
code too). The data structure can represent it, so it happens.

You can also have changesets with p1 == p2 ≠ nullrev.
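
To make these parent shapes concrete, here is a minimal Python sketch. It
is a toy model and not Mercurial code: `parent_shape` and
`treated_as_merge` are hypothetical helpers written for illustration. The
only things taken from the patch are that a missing parent is `nullrev`
(-1 in Mercurial) and that the new code tests `p2rev != nullrev`.

```python
# Toy model only: these helpers are hypothetical, not part of Mercurial.
NULLREV = -1  # same value as mercurial.node.nullrev


def parent_shape(p1, p2):
    """Classify a changelog entry by its (p1, p2) parent revisions."""
    if p2 == NULLREV:
        # Usual case: zero or one parent, stored in p1.
        return "root" if p1 == NULLREV else "linear"
    if p1 == NULLREV:
        # "Malformed" entry: the single parent is stored in p2.
        return "p1-null-p2-set"
    if p1 == p2:
        # The same parent is recorded twice.
        return "duplicate-parents"
    return "merge"


def treated_as_merge(p1, p2):
    """Mirror the `p2rev != nullrev` test from the patch: any entry with
    p2 set goes through the (slower) merge handling."""
    return p2 != NULLREV


for parents in [(-1, -1), (3, -1), (-1, 5), (4, 4), (3, 4)]:
    print(parents, parent_shape(*parents), treated_as_merge(*parents))
```

Both unusual shapes end up on the merge path in this model, which matches
the comment in the patch: handling them as merges is slower, but the
result is still correct.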

-- 
Pierre-Yves David

