<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, May 2, 2019 at 3:05 PM Pierre-Yves David <<a href="mailto:pierre-yves.david@ens-lyon.org">pierre-yves.david@ens-lyon.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
<br>
On 5/2/19 11:55 PM, Martin von Zweigbergk wrote:<br>
> <br>
> <br>
> On Thu, May 2, 2019 at 2:48 PM Pierre-Yves David <br>
> <<a href="mailto:pierre-yves.david@ens-lyon.org" target="_blank">pierre-yves.david@ens-lyon.org</a>> <br>
> wrote:<br>
> <br>
> <br>
> <br>
> On 5/2/19 8:24 PM, Martin von Zweigbergk wrote:<br>
> ><br>
> ><br>
> > On Thu, May 2, 2019 at 9:37 AM Pierre-Yves David<br>
> > <<a href="mailto:pierre-yves.david@ens-lyon.org" target="_blank">pierre-yves.david@ens-lyon.org</a>><br>
> > wrote:<br>
> ><br>
> > # HG changeset patch<br>
> > # User Pierre-Yves David <<a href="mailto:pierre-yves.david@octobus.net" target="_blank">pierre-yves.david@octobus.net</a>><br>
> > # Date 1552263020 -3600<br>
> > # Mon Mar 11 01:10:20 2019 +0100<br>
> > # Node ID eac353183daaef0a503da8cd72b8df43f54d7fb8<br>
> > # Parent a753bc019c1ad7c5661a050adce49e4c3cd5a786<br>
> > # EXP-Topic fnodecache<br>
> > # Available At <a href="https://bitbucket.org/octobus/mercurial-devel/" rel="noreferrer" target="_blank">https://bitbucket.org/octobus/mercurial-devel/</a><br>
> > # hg pull<br>
> > <a href="https://bitbucket.org/octobus/mercurial-devel/" rel="noreferrer" target="_blank">https://bitbucket.org/octobus/mercurial-devel/</a> -r eac353183daa<br>
> > hgtagsfnodescache: inherit fnode from parent when possible<br>
> ><br>
> > If a changeset does not update the content of `.hgtags`, it<br>
> > uses the same file-node (for `.hgtags`) as its parent. In such a<br>
> > case we can directly reuse the parent's file-node.<br>
> ><br>
> > We use this property when updating the `hgtagsfnodescache`, taking a<br>
> > faster path if we already have a cached value for the parent of the<br>
> > node we are looking at.<br>
> ><br>
> > Doing so provides a large performance boost when looking up a lot of<br>
> > fnodes, especially on repositories with very large manifests:<br>
> ><br>
> > timing for `tagsmod.fnoderevs(ui, repo, repo.changelog.revs())`<br>
> ><br>
> ><br>
> > What end-user command does this correspond to? `hg tags` with no<br>
> > .hg/cache/tags?<br>
> <br>
> hg debugupdatecache<br>
> <br>
> ><br>
> ><br>
> > mercurial: (41907 revisions, 1923 files)<br>
> ><br>
> > before: 6.9 seconds<br>
> > after: 2.7 seconds (-54%)<br>
> ><br>
> > pypy: (96266 revisions, 5198 files)<br>
> ><br>
> > before: 80 seconds<br>
> > after: 20 seconds (-75%)<br>
> ><br>
> > mozilla-central: (463411 revisions, 272080 files)<br>
> ><br>
> > before: 7166.4 seconds<br>
> > after: 47.8 seconds (-99%, x150 speedup)<br>
> ><br>
> ><br>
> > Nice improvements :) How did people work with these repos before?<br>
> <br>
> This is the timing for computing the information for all nodes. To<br>
> retrieve the current tag names we only need this data for all heads.<br>
> <br>
> Getting it for all heads is still very slow to compute initially (that<br>
> is why we now exchange them during clone).<br>
> <br>
> To illustrate the slowness, I started a tags computation from a cold<br>
> cache… That was 3 hours ago…<br>
> <br>
> <br>
> So currently we only use (and exchange) entries for the repository heads.<br>
> However, the speedup relies on reusing data from the parent. So warming<br>
> all entries during a `hg debugupdatecache` turns out to be more<br>
> efficient (with the new code).<br>
> <br>
> I guess the next step from here is to warm all entries in all cases (not<br>
> just `hg debugupdatecache`) and efficiently exchange them over the wire.<br>
> <br>
> <br>
> <br>
> ><br>
> ><br>
> > On a copy of mozilla-try with about 35K heads and 1.7M changesets,<br>
> > this moves the computation from many hours to a couple of minutes,<br>
> > making it more interesting to do a full warm-up of this cache before<br>
> > computing tags (from a cold cache).<br>
> ><br>
> > There seem to be other low-hanging performance fruits, like avoiding<br>
> > the use of changectx or using a more revision-centric logic. However,<br>
> > the new code is fast enough for my needs right now.<br>
> ><br>
> > diff --git a/mercurial/tags.py b/mercurial/tags.py<br>
> > --- a/mercurial/tags.py<br>
> > +++ b/mercurial/tags.py<br>
> > @@ -18,6 +18,7 @@ from .node import (<br>
> > bin,<br>
> > hex,<br>
> > nullid,<br>
> > + nullrev,<br>
> > short,<br>
> > )<br>
> > from .i18n import _<br>
> > @@ -718,12 +719,33 @@ class hgtagsfnodescache(object):<br>
> > if not computemissing:<br>
> > return None<br>
> ><br>
> > - # Populate missing entry.<br>
> > - try:<br>
> > - fnode = ctx.filenode('.hgtags')<br>
> > - except error.LookupError:<br>
> > - # No .hgtags file on this revision.<br>
> > - fnode = nullid<br>
> > + fnode = None<br>
> > + cl = self._repo.changelog<br>
> > + p1rev, p2rev = cl._uncheckedparentrevs(rev)<br>
> > + p1node = cl.node(p1rev)<br>
> > + p1fnode = self.getfnode(p1node, computemissing=False)<br>
> > + if p2rev != nullrev:<br>
> > + # There are some non-merge changesets where p1 is null and p2 is set.<br>
> > + # Processing them as merges is just slower, but still gives a good<br>
> > + # result.<br>
> ><br>
> ><br>
> > I think you're thinking of file copies, see<br>
> ><br>
> <a href="https://www.mercurial-scm.org/repo/hg/file/fdbeacb9d456/mercurial/localrepo.py#l2348" rel="noreferrer" target="_blank">https://www.mercurial-scm.org/repo/hg/file/fdbeacb9d456/mercurial/localrepo.py#l2348</a><br>
> <br>
> I am lost here. We are iterating over the changelog and the manifest<br>
> here. This code deals with "malformed" changelog entries. Why are file<br>
> copies relevant here?<br>
> <br>
> <br>
> I don't think you're lost. I think I just misunderstood what this was <br>
> about. I was not aware that some repos have commits broken in that way. <br>
> Any idea how that happened?<br>
<br>
Mostly people using debugsetparents (it could be faulty extension code <br>
too). The data structure allows representing it, so it happens.<br>
<br>
You can also have changeset with p1 == p2 ≠ nullrev.<br></blockquote><div><br></div><div>Yes, I think I've seen evolve do that in some cases.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
-- <br>
Pierre-Yves David<br>
</blockquote></div></div>
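<div dir="ltr"><br>For readers following the thread, the fast path discussed in the patch can be sketched roughly as below. This is a standalone toy model and not Mercurial's code: <code>ToyRepo</code>, <code>FnodeCache</code>, <code>manifest_fnode</code> and the <code>slow_lookups</code> counter are hypothetical names made up for illustration. The only ideas taken from the patch are that a changeset which does not touch <code>.hgtags</code> can reuse the cached file-node of its first parent, and that any changeset with a non-null second parent falls back to the slow manifest lookup.<br></div>
<pre>
```python
NULLREV = -1
NULLID = b"\0" * 20
HGTAGS = ".hgtags"


class ToyRepo:
    """Hypothetical stand-in for a repository (not Mercurial's API)."""

    def __init__(self):
        self.parents = []    # rev: (p1rev, p2rev)
        self.files = []      # rev: set of files touched by the changeset
        self.manifests = []  # rev: {filename: filenode}
        self.slow_lookups = 0

    def commit(self, p1, p2, files, manifest):
        self.parents.append((p1, p2))
        self.files.append(set(files))
        self.manifests.append(dict(manifest))
        return len(self.parents) - 1

    def manifest_fnode(self, rev):
        # Slow path: consult the full manifest of the revision.
        self.slow_lookups += 1
        return self.manifests[rev].get(HGTAGS, NULLID)


class FnodeCache:
    """Toy cache mapping a revision to its .hgtags file-node."""

    def __init__(self, repo):
        self.repo = repo
        self.cache = {}

    def getfnode(self, rev, computemissing=True):
        if rev == NULLREV:
            return NULLID
        if rev in self.cache:
            return self.cache[rev]
        if not computemissing:
            return None
        fnode = None
        p1rev, p2rev = self.repo.parents[rev]
        p1fnode = self.getfnode(p1rev, computemissing=False)
        if p2rev != NULLREV:
            # Merge (or malformed entry with p2 set): use the slow path.
            pass
        elif p1fnode is not None and HGTAGS not in self.repo.files[rev]:
            # .hgtags untouched: inherit the parent's cached file-node.
            fnode = p1fnode
        if fnode is None:
            fnode = self.repo.manifest_fnode(rev)
        self.cache[rev] = fnode
        return fnode


# Linear history where only revisions 1 and 4 touch .hgtags.
f1, f2 = b"\x01" * 20, b"\x02" * 20
repo = ToyRepo()
repo.commit(NULLREV, NULLREV, ["a"], {"a": b"a" * 20})
repo.commit(0, NULLREV, [HGTAGS], {"a": b"a" * 20, HGTAGS: f1})
repo.commit(1, NULLREV, ["a"], {"a": b"b" * 20, HGTAGS: f1})
repo.commit(2, NULLREV, ["a"], {"a": b"c" * 20, HGTAGS: f1})
repo.commit(3, NULLREV, [HGTAGS], {"a": b"c" * 20, HGTAGS: f2})

cache = FnodeCache(repo)
# Warming in revision order lets each entry reuse its parent's entry.
fnodes = [cache.getfnode(rev) for rev in range(5)]
```
</pre>
<div dir="ltr">In this toy history only the two revisions that touch <code>.hgtags</code> hit the slow path, which is why warming every entry in revision order helps: each revision finds its parent already cached.<br></div>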