[PATCH 2 of 3] hgtagsfnodescache: inherit fnode from parent when possible

Martin von Zweigbergk martinvonz at google.com
Thu May 2 18:14:36 EDT 2019


On Thu, May 2, 2019 at 3:05 PM Pierre-Yves David <
pierre-yves.david at ens-lyon.org> wrote:

>
>
> On 5/2/19 11:55 PM, Martin von Zweigbergk wrote:
> >
> >
> > On Thu, May 2, 2019 at 2:48 PM Pierre-Yves David
> > <pierre-yves.david at ens-lyon.org> wrote:
> >
> >
> >
> >     On 5/2/19 8:24 PM, Martin von Zweigbergk wrote:
> >      >
> >      >
> >      > On Thu, May 2, 2019 at 9:37 AM Pierre-Yves David
> >      > <pierre-yves.david at ens-lyon.org> wrote:
> >      >
> >      >     # HG changeset patch
> >      >     # User Pierre-Yves David <pierre-yves.david at octobus.net>
> >      >     # Date 1552263020 -3600
> >      >     #      Mon Mar 11 01:10:20 2019 +0100
> >      >     # Node ID eac353183daaef0a503da8cd72b8df43f54d7fb8
> >      >     # Parent  a753bc019c1ad7c5661a050adce49e4c3cd5a786
> >      >     # EXP-Topic fnodecache
> >      >     # Available At https://bitbucket.org/octobus/mercurial-devel/
> >      >     #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r eac353183daa
> >      >     hgtagsfnodescache: inherit fnode from parent when possible
> >      >
> >      >     If a changeset does not update the content of `.hgtags`, it
> >      >     will use the same file-node (for `.hgtags`) as its parents. In
> >      >     such a case we can directly reuse the parent's file-node.
> >      >
> >      >     We use this property when updating the `hgtagsfnodescache`,
> >      >     taking a faster path if we already have a cached value for the
> >      >     parents of the node we are looking at.
> >      >
> >      >     Doing so provides a large performance boost when looking at a
> >      >     lot of fnodes, especially on repositories with very large
> >      >     manifests:
> >      >
> >      >     timing for `tagsmod.fnoderevs(ui, repo, repo.changelog.revs())`
> >      >
> >      >
> >      > What end-user command does this correspond to? `hg tags` with no
> >      > .hg/cache/tags?
> >
> >     hg debugupdatecache
> >
> >      >
> >      >
> >      >     mercurial: (41907 revisions, 1923 files)
> >      >
> >      >          before: 6.9 seconds
> >      >          after:  2.7 seconds (-54%)
> >      >
> >      >     pypy: (96266 revisions, 5198 files)
> >      >
> >      >          before: 80 seconds
> >      >          after:  20 seconds (-75%)
> >      >
> >      >     mozilla-central: (463411 revisions, 272080 files)
> >      >
> >      >          before: 7166.4 seconds
> >      >          after:    47.8 seconds (-99%, x150 speedup)
> >      >
> >      >
> >      > Nice improvements :) How did people work with these repos before?
> >
> >     This is the timing for computing the information for all nodes. To
> >     retrieve the current tag names we only need this data for all heads.
> >
> >     Getting it for all heads is still very slow to compute initially
> >     (that is why we exchange them during clone now).
> >
> >     To illustrate the slowness, I started a tags computation from cold
> >     cache… This was 3 hours ago…
> >
> >
> >     So currently we only use (and exchange) entries for the repository
> >     heads. However, the speedup relies on reusing data from the parent.
> >     So warming all entries during a `hg debugupdatecache` turns out to be
> >     more efficient (with the new code).
> >
> >     I guess the next step from here is to warm all entries in all cases
> >     (not just `hg debugupdatecache`) and efficiently exchange them over
> >     the wire.
> >
> >
> >
> >      >
> >      >
> >      >     On a copy of mozilla-try with about 35K heads and 1.7M
> >      >     changesets, this moves the computation from many hours to a
> >      >     couple of minutes, making it more interesting to do a full
> >      >     warm up of this cache before computing tags (from a cold
> >      >     cache).
> >      >
> >      >     There seem to be other low-hanging performance fruits, like
> >      >     avoiding the use of changectx or a more revision-centric
> >      >     logic. However, the new code is fast enough for my needs
> >      >     right now.
> >      >
> >      >     diff --git a/mercurial/tags.py b/mercurial/tags.py
> >      >     --- a/mercurial/tags.py
> >      >     +++ b/mercurial/tags.py
> >      >     @@ -18,6 +18,7 @@ from .node import (
> >      >           bin,
> >      >           hex,
> >      >           nullid,
> >      >     +    nullrev,
> >      >           short,
> >      >       )
> >      >       from .i18n import _
> >      >     @@ -718,12 +719,33 @@ class hgtagsfnodescache(object):
> >      >               if not computemissing:
> >      >                   return None
> >      >
> >      >     -        # Populate missing entry.
> >      >     -        try:
> >      >     -            fnode = ctx.filenode('.hgtags')
> >      >     -        except error.LookupError:
> >      >     -            # No .hgtags file on this revision.
> >      >     -            fnode = nullid
> >      >     +        fnode = None
> >      >     +        cl = self._repo.changelog
> >      >     +        p1rev, p2rev = cl._uncheckedparentrevs(rev)
> >      >     +        p1node = cl.node(p1rev)
> >      >     +        p1fnode = self.getfnode(p1node, computemissing=False)
> >      >     +        if p2rev != nullrev:
> >      >     +            # There are some non-merge changesets where p1 is
> >      >     +            # null and p2 is set. Processing them as merges is
> >      >     +            # just slower, but still gives a good result.
> >      >
> >      >
> >      > I think you're thinking of file copies, see
> >      > https://www.mercurial-scm.org/repo/hg/file/fdbeacb9d456/mercurial/localrepo.py#l2348
> >
> >     I am lost here. We are iterating over the changelog and the manifest
> >     here. This code deals with "malformed" changelog entries. Why are
> >     file copies relevant here?
> >
> >
> > I don't think you're lost. I think I just misunderstood what this was
> > about. I was not aware that some repos have commits broken in that way.
> > Any idea how that happened?
>
> In most cases, people using debugsetparent (it could be faulty extension
> code too). The data structure allows representing it, so it happens.
>
> You can also have changeset with p1 == p2 ≠ nullrev.
>

Yes, I think I've seen evolve do that in some cases.
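
The parent shapes mentioned in this exchange can be told apart with a few
comparisons. The following is only an illustrative sketch: `classify_parents`
and its labels are hypothetical names made up here, and `NULLREV = -1` is
assumed to play the role of `nullrev`.

```python
NULLREV = -1  # assumed stand-in for Mercurial's nullrev


def classify_parents(p1rev, p2rev):
    """Label a (p1, p2) pair; the labels are this sketch's own vocabulary."""
    if p1rev == NULLREV and p2rev == NULLREV:
        return "root"
    if p2rev == NULLREV:
        return "linear"
    if p1rev == NULLREV:
        # The "malformed" non-merge case: p1 is null but p2 is set.
        return "p1-null-p2-set"
    if p1rev == p2rev:
        # Both parents point at the same revision.
        return "duplicate-parent"
    return "merge"
```

A check written as `p2rev != nullrev` treats the last three shapes alike,
which matches the quoted comment: the odd cases are handled like merges,
slower but still correct.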


>
> --
> Pierre-Yves David
>
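
For readers following the thread, the inheritance idea from the commit
message can be sketched on a toy model. Everything below is an assumption
made for illustration: `ToyRepo`, `warm_fnodes` and the `changed` mapping are
hypothetical and are not Mercurial's API, fnodes are arbitrary byte strings,
and file removal is not modelled. The merge rule (inherit only when both
parents have the same cached value) is likewise a choice of this sketch.

```python
NULLREV = -1
NULLID = b"\0" * 20
TAGS = ".hgtags"


class ToyRepo:
    """Hypothetical stand-in for a repository; not Mercurial's API."""

    def __init__(self):
        self.parents = []    # rev -> (p1rev, p2rev)
        self.changed = []    # rev -> {filename: fnode} changed relative to p1
        self.manifests = []  # rev -> full {filename: fnode} mapping
        self.full_reads = 0  # how often the expensive path was taken

    def commit(self, p1=NULLREV, p2=NULLREV, changed=None):
        changed = dict(changed or {})
        base = dict(self.manifests[p1]) if p1 != NULLREV else {}
        base.update(changed)
        self.parents.append((p1, p2))
        self.changed.append(changed)
        self.manifests.append(base)
        return len(self.parents) - 1

    def full_lookup(self, rev):
        """Expensive path: read the whole manifest of `rev`."""
        self.full_reads += 1
        return self.manifests[rev].get(TAGS, NULLID)


def warm_fnodes(repo):
    """Compute the .hgtags fnode of every revision, parents first."""
    cache = {}
    for rev, (p1, p2) in enumerate(repo.parents):
        inherited = cache.get(p1) if p1 != NULLREV else None
        if p2 != NULLREV and cache.get(p2) != inherited:
            inherited = None  # parents disagree: do not guess
        if inherited is None:
            cache[rev] = repo.full_lookup(rev)
        else:
            # Cheap path: only the small per-revision delta is inspected.
            cache[rev] = repo.changed[rev].get(TAGS, inherited)
    return cache


repo = ToyRepo()
r0 = repo.commit(changed={"a": b"a1"})                 # no .hgtags yet
r1 = repo.commit(p1=r0, changed={TAGS: b"t1"})         # adds .hgtags
r2 = repo.commit(p1=r1, changed={"a": b"a2"})          # leaves .hgtags alone
r3 = repo.commit(p1=r1, changed={TAGS: b"t2"})         # edits .hgtags
r4 = repo.commit(p1=r2, p2=r3, changed={TAGS: b"t2"})  # merge
cache = warm_fnodes(repo)
```

Because revisions are visited parents first, most entries are filled from
the parent's cached value, which is why warming every entry can end up
cheaper than computing only the heads from a cold cache.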