Bug 5976 - Multiple performance regression on perfchangegroupchangelog in 4.7
Summary: Multiple performance regression on perfchangegroupchangelog in 4.7
Status: RESOLVED FIXED
Alias: None
Product: Mercurial
Classification: Unclassified
Component: Mercurial (show other bugs)
Version: default branch
Hardware: All All
: normal bug
Assignee: Bugzilla
URL:
Keywords: perfregression
Depends on:
Blocks:
 
Reported: 2018-08-31 10:15 UTC by Boris Feld
Modified: 2018-10-07 00:00 UTC (History)
2 users (show)

See Also:
Python Version: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Boris Feld 2018-08-31 10:15 UTC
Our performance test suite worker detected a regression in perfchangegroupchangelog (http://perf.octobus.net/#regressions?sort=3&dir=desc):

others.PerfTestSuite.track_changegroupchangelog('netbeans-2018-08-01')  2018-08-09 18:28    913ca175..b9f94d67, d99468d2..e793e11e  1.75x   8.75    15.3    
others.PerfTestSuite.track_changegroupchangelog('pypy-2018-08-01')  2018-08-09 18:28    913ca175..b9f94d67, d99468d2..e793e11e  1.66x   2.75    4.58    
others.PerfTestSuite.track_changegroupchangelog('mercurial-2018-08-01') 2018-08-09 18:28    913ca175..b9f94d67, d99468d2..e793e11e  1.66x   1.08    1.79    
others.PerfTestSuite.track_changegroupchangelog('mozilla-central-2018-08-01')   2018-08-09 02:03    913ca175..b9f94d67  1.15x   12.5    14.4    

The bisect tool identified two responsible commits:
 - https://www.mercurial-scm.org/repo/hg/rev/23d582ca
 - https://www.mercurial-scm.org/repo/hg/rev/e793e11e
Comment 1 Gregory Szorc 2018-09-05 13:03 UTC
This regression is somewhat expected (the commit message noted the overhead).

I will try to claw this back before 4.8 is released.

FWIW the regression is proportional to the number of revisions being added to a changegroup. This disproportionately affects changelog revisions because changelog revisions are small and any overhead in changegroup generation code is felt more significantly. The overhead is still there for manifest and file revisions. But it's percentage of execution time is lower because generation of this data is often dwarfed by data retrieval and diffing.
Comment 2 Boris Feld 2018-09-07 10:03 UTC
Here are the numbers we could gather on a private repo with ~1M changesets:

# Perfchangegroupchangelog with 4.6.2

HGRCPATH= hg --config profiling.time-track=real --config extensions.perf=.../contrib/perf.py perfchangegroupchangelog --profile
! wall 110.213339 comb 114.640000 user 108.800000 sys 5.840000 (best of 3)
| 99.9%  dispatch.py:    _callcatch                line 41:  dispatch.run()
| 99.9%  scmutil.py:     callcatch                 line 359:  return scmutil.callcatch(ui...
| 99.9%  dispatch.py:    _runcatchfunc             line 160:  return func()
| 99.9%  hg.py:          repository                line 341:  return _dispatch(req)
| 99.3%  perf.py:        _timer                    line 642:  timer(d)
| 99.3%  perf.py:        d                         line 244:  r = func()
| 99.2%  changegroup.py: group                     line 638:  for chunk in bundler.group(...
 \ 57.1%  perf.py:        lookup                   line 581:  linknode = lookup(revlog.no...
   | 56.8%  changelog.py:   read                   line 634:  cl.read(node)
   | 55.9%  changelog.py:   __new__                line 486:  c = changelogrevision(self....
     \ 47.8%  revlog.py:      _chunks              line 1787:  bins = self._chunks(chain, ...
       | 44.6%  revlog.py:      decompress         line 1706:  ladd(decomp(buffer(data, ch...
     \  5.7%  revlog.py:      checkhash            line 1802:  self.checkhash(text, node, ...
       |  5.3%  revlog.py:      hash               line 1885:  if node != self.hash(text, ...
       |  5.3%  revlog.py:      hash               line 1812:  return hash(text, p1, p2)
 \ 39.7%  changegroup.py: revchunk                 line 582:  for c in self.revchunk(revl...
   | 35.1%  revlog.py:      revdiff                line 808:  delta = revlog.revdiff(base...
   | 34.3%  revlog.py:      _chunk                 line 1731:  return bytes(self._chunk(re...
   | 34.0%  revlog.py:      decompress             line 1655:  return self.decompress(self...
---
Sample count: 264912
Total time: 346.570000 seconds

# Perfchangegroupchangelog with 6268fed317d0

HGRCPATH= hg --config profiling.time-track=real --config extensions.perf=.../contrib/perf.py perfchangegroupchangelog --profile
! wall 150.886031 comb 156.070000 user 148.570000 sys 7.500000 (best of 3)
| 100.0%  dispatch.py:    _callcatch               line 41:  dispatch.run()
| 100.0%  scmutil.py:     callcatch                line 367:  return scmutil.callcatch(ui...
| 100.0%  dispatch.py:    _runcatchfunc            line 164:  return func()
| 100.0%  extensions.py:  loadall                  line 350:  return _dispatch(req)
| 99.3%  perf.py:        _timer                    line 678:  timer(d)
| 99.2%  perf.py:        d                         line 269:  r = func()
| 99.2%  changegroup.py: deltagroup                line 671:  for chunk in chunks:
 \ 65.5%  changegroup.py: lookupcl                 line 708:  linknode = lookup(node)
   | 65.2%  changelog.py:   changelogrevision      line 955:  c = cl.changelogrevision(x)
   | 65.2%  changelog.py:   __new__                line 491:  return changelogrevision(se...
   | 59.6%  revlog.py:      _chunks                line 1623:  bins = self._chunks(chain, ...
   | 56.4%  revlog.py:      _getsegmentforrevs     line 1518:  ladd(decomp(buffer(data, ch...
 \ 31.2%  revlog.py:      emitrevisiondeltas       line 749:  for i, delta in enumerate(s...
   \ 16.3%  revlog.py:      revision               line 2335:  delta = self.revdiff(basere...
     | 16.1%  revlog.py:      _chunk               line 1562:  return bytes(self._chunk(re...
     | 16.1%  revlog.py:      _getsegmentforrevs   line 1466:  return self.decompress(self...
   \ 12.5%  revlog.py:      revision               line 2333:  revision = self.revision(no...
     | 10.9%  revlog.py:      _chunks              line 1623:  bins = self._chunks(chain, ...
     | 10.0%  revlog.py:      _getsegmentforrevs   line 1518:  ladd(decomp(buffer(data, ch...
---
Sample count: 323902
Total time: 474.410000 seconds (459.620000 wall)

# Bundling 1000 changesets with 4.6.2

HGRCPATH= hg --config profiling.time-track=real --config progress.disable=True --profile bundle -t none-v2 --base "all() - last(all(), 1000)" test.bundle > /dev/null
| 100.0%  dispatch.py:    _callcatch               line 41:  dispatch.run()
| 100.0%  scmutil.py:     callcatch                line 359:  return scmutil.callcatch(ui...
| 100.0%  dispatch.py:    _runcatchfunc            line 160:  return func()
| 100.0%  hg.py:          repository               line 341:  return _dispatch(req)
 \ 49.8%  bundle2.py:     writenewbundle           line 1278:  compopts=compopts)
   | 49.4%  changegroup.py: writechunks            line 1574:  return changegroup.writechu...
   | 48.2%  bundle2.py:     getchunks              line 86:  for c in chunks:
   | 48.2%  bundle2.py:     _getcorechunk          line 673:  self._compopts):
   | 48.2%  bundle2.py:     getchunks              line 694:  for chunk in part.getchunks...
   | 48.2%  bundle2.py:     _payloadchunks         line 1087:  for chunk in self._payloadc...
   | 48.1%  util.py:        read                   line 1131:  chunk = buff.read(preferedc...
   | 48.1%  util.py:        splitbig               line 2235:  for chunk in self.iter:
   | 48.0%  changegroup.py: getchunks              line 2207:  for chunk in chunks:
   | 48.0%  changegroup.py: getchunk               line 219:  chunk = getchunk(self)
     \ 41.4%  util.py:        readexactly          line 49:  d = readexactly(stream, 4)
       | 41.4%  changegroup.py: read               line 3711:  s = stream.read(n)
       | 41.4%  util.py:        read               line 142:  return self._stream.read(l)
       | 41.3%  util.py:        splitbig           line 2235:  for chunk in self.iter:
       | 41.3%  changegroup.py: generate           line 2207:  for chunk in chunks:
         \ 25.4%  changegroup.py: generatefiles    line 675:  source):
           | 21.3%  localrepo.py:   file           line 759:  filerevlog = repo.file(fname)
           | 21.3%  repoview.py:    __getattr__    line 1115:  return filelog.filelog(self...
           | 21.3%  revlog.py:      __init__       line 24:  censorable=True)
           | 14.9%  revlog.py:      _indexfp       line 663:  with self._indexfp() as f:
           | 14.9%  store.py:       __call__       line 730:  return self.opener(self.ind...
           | 14.9%  vfs.py:         __call__       line 495:  return self.vfs(self.encode...
         \ 15.6%  changegroup.py: generatemanifestsline 658:  fastpathlinkrev, mfs, fnode...
           | 15.2%  changegroup.py: _packmanifests line 743:  makelookupmflinknode(dir, n...
           | 15.2%  changegroup.py: group          line 598:  lookuplinknode, units=_('ma...
           | 14.8%  changegroup.py: revchunk       line 582:  for c in self.revchunk(revl...
           | 14.8%  revlog.py:      revdiff        line 808:  delta = revlog.revdiff(base...
           | 11.7%  revlog.py:      revision       line 1733:  return mdiff.textdiff(self....
           | 11.1%  revlog.py:      _chunks        line 1787:  bins = self._chunks(chain, ...
           |  6.2%  revlog.py:      _getsegmentforrevsline 1694:  offset, data = self._getseg...
           |  6.2%  revlog.py:      _getsegment    line 1644:  return start, self._getsegm...
           |  6.2%  revlog.py:      _readsegment   line 1610:  return self._readsegment(of...
     \  6.6%  util.py:        readexactly          line 55:  return readexactly(stream, ...
       |  6.6%  changegroup.py: read               line 3711:  s = stream.read(n)
       |  6.5%  util.py:        read               line 142:  return self._stream.read(l)
       |  6.5%  util.py:        splitbig           line 2235:  for chunk in self.iter:
       |  6.5%  changegroup.py: generate           line 2207:  for chunk in chunks:
 \ 39.6%  context.py:     node                     line 1228:  common = [repo[rev].node() ...
   | 27.9%  context.py:     __init__               line 839:  return context.changectx(se...
   | 20.3%  repoview.py:    changelog              line 415:  self._node = repo.changelog...
   |  5.8%  localrepo.py:   __get__                line 216:  unfichangelog = unfi.changelog
 \  7.6%  util.py:        __get__                  line 1243:  if not outgoing.missing:
   |  7.6%  discovery.py:   missing                line 1437:  result = self.func(obj)
   |  7.6%  discovery.py:   _computecommonmissing  line 122:  self._computecommonmissing()
   |  7.6%  revlog.py:      findcommonmissing      line 110:  self.missingheads)
   |  5.4%  changelog.py:   rev                    line 1033:  common = [self.rev(n) for n...
---
Sample count: 16277
Total time: 39.820000 seconds

# Bundling 1000 changesets with 6268fed317d0

| 100.0%  dispatch.py:    _callcatch               line 41:  dispatch.run()
| 100.0%  scmutil.py:     callcatch                line 367:  return scmutil.callcatch(ui...
| 100.0%  dispatch.py:    _runcatchfunc            line 164:  return func()
| 100.0%  commands.py:    bundle                   line 350:  return _dispatch(req)
 \ 62.8%  bundle2.py:     writenewbundle           line 1286:  compopts=compopts)
   | 62.5%  changegroup.py: writechunks            line 1575:  return changegroup.writechu...
   | 61.6%  bundle2.py:     getchunks              line 93:  for c in chunks:
   | 61.6%  bundle2.py:     _getcorechunk          line 674:  self._compopts):
   | 61.6%  bundle2.py:     getchunks              line 695:  for chunk in part.getchunks...
   | 61.6%  bundle2.py:     _payloadchunks         line 1088:  for chunk in self._payloadc...
   | 61.0%  util.py:        read                   line 1132:  chunk = buff.read(preferedc...
   | 61.0%  util.py:        splitbig               line 2214:  for chunk in self.iter:
   | 61.0%  changegroup.py: getchunks              line 2186:  for chunk in chunks:
   | 61.0%  changegroup.py: getchunk               line 226:  chunk = getchunk(self)
     \ 52.4%  util.py:        readexactly          line 52:  d = readexactly(stream, 4)
       | 52.4%  changegroup.py: read               line 3802:  s = stream.read(n)
       | 52.4%  util.py:        read               line 149:  return self._stream.read(l)
       | 52.3%  util.py:        splitbig           line 2214:  for chunk in self.iter:
       | 52.3%  changegroup.py: generate           line 2186:  for chunk in chunks:
         \ 37.4%  changegroup.py: generatefiles    line 916:  for path, deltas in it:
           | 35.1%  localrepo.py:   file           line 1144:  filerevlog = repo.file(fname)
           | 35.0%  repoview.py:    __getattr__    line 1143:  return filelog.filelog(self...
           | 35.0%  revlog.py:      __init__       line 24:  censorable=True)
           | 17.7%  revlog.py:      _indexfp       line 446:  with self._indexfp() as f:
           | 17.7%  store.py:       __call__       line 513:  return self.opener(self.ind...
           | 17.7%  vfs.py:         __call__       line 507:  return self.vfs(encoded, mo...
         \ 14.1%  changegroup.py: deltagroup       line 891:  for delta in deltas:
           | 13.9%  revlog.py:      emitrevisiondeltasline 749:  for i, delta in enumerate(s...
           | 13.9%  revlog.py:      revdiff        line 2335:  delta = self.revdiff(basere...
           | 10.9%  revlog.py:      revision       line 1564:  return mdiff.textdiff(self....
           | 10.4%  revlog.py:      _chunks        line 1623:  bins = self._chunks(chain, ...
             \  5.3%  revlog.py:      decompress   line 1518:  ladd(decomp(buffer(data, ch...
             \  5.1%  revlog.py:      _getsegmentforrevsline 1506:  offset, data = self._getseg...
               |  5.1%  revlog.py:      _getsegmentline 1455:  return start, self._getsegm...
               |  5.1%  revlog.py:      _readsegmentline 1421:  return self._readsegment(of...
     \  8.6%  util.py:        readexactly          line 58:  return readexactly(stream, ...
       |  8.6%  changegroup.py: read               line 3802:  s = stream.read(n)
       |  8.6%  util.py:        read               line 149:  return self._stream.read(l)
       |  8.5%  util.py:        splitbig           line 2214:  for chunk in self.iter:
       |  8.5%  changegroup.py: generate           line 2186:  for chunk in chunks:
 \ 28.5%  localrepo.py:   __getitem__              line 1236:  common = [repo[rev].node() ...
   | 22.5%  context.py:     __init__               line 870:  return context.changectx(se...
   | 18.8%  repoview.py:    changelog              line 395:  self._node = repo.changelog...
   |  7.1%  localrepo.py:   __get__                line 213:  unfichangelog = unfi.changelog
 \  5.9%  util.py:        __get__                  line 1251:  if not outgoing.missing:
   |  5.9%  discovery.py:   missing                line 1416:  result = self.func(obj)
   |  5.9%  discovery.py:   _computecommonmissing  line 122:  self._computecommonmissing()
   |  5.9%  revlog.py:      findcommonmissing      line 110:  self.missingheads)
---
Sample count: 21350
Total time: 40.520000 seconds (39.640000 wall)

Unfortunately I can't paste the raw output of `hg debugrevlog -c`. Here is a redacted version of the revision size information:

uncompressed data size (min/max/avg) : ~100 / ~5M / ~5K

If you need more information from debugrevlog, please tell me. I will see if I can paste a redacted version of it
Comment 3 Gregory Szorc 2018-09-21 21:46 UTC
I have a patch series deep in my unpublished queue that will claw back the perf. It does so by replacing emitrevisiondeltas() with a replacement storage layer API that doesn't require the (as-it-turns-out moderately expensive) process of building a list of request objects: you just feed it nodes and parameters to influence delta generation and it does its thing.

On my mozilla-unified clone:

$ hg perfchangegroupchangelog
4.7:   ! wall 15.824165 comb 15.830000 user 15.760000 sys 0.070000 (best of 3)
@:     ! wall 24.348077 comb 24.330000 user 24.140000 sys 0.190000 (best of 3)
patch: ! wall 18.245911 comb 18.240000 user 18.100000 sys 0.140000 (best of 3)

The remaining overhead versus 4.7 is probably due to 23d582ca. I'm not sure if we can recapture that perf while supporting generic storage backends.
Comment 4 HG Bot 2018-09-29 00:05 UTC
Fixed by https://mercurial-scm.org/repo/hg/rev/31b7e8e7132e
Gregory Szorc <gregory.szorc@gmail.com>
changegroup: port to emitrevisions() (issue5976)

We now have a unified API for emitting revision data from a storage
backend. It handles sorting nodes and the complicated delta versus
revision decisions for us.

This commit ports changegroup to that API.

There should be no behavior changes for changegroups not using
ellipsis. And lack of test changes seems to confirm that.

There are some changes for ellipsis mode, however.

Before, when sending an ellipsis revision, we would always send a
fulltext revision (as opposed to a delta). There was a TODO tracking
this open item.

One of the things the emitrevisions() API does for us is figure out
whether we can safely emit a delta. So, it is now possible for
ellipsis revisions to be sent as deltas! (It does this by not
assuming parent/ancestor revisions are available and tracking which
revisions have been sent out.)

Because we eliminated the list of revision delta request objects,
performance has improved substantially:

$ hg perfchangegroupchangelog
before: ! wall 24.348077 comb 24.330000 user 24.140000 sys 0.190000 (best of 3)
after:  ! wall 18.245911 comb 18.240000 user 18.100000 sys 0.140000 (best of 3)

That's a lot of overhead for creating a few hundred thousand Python
objects!

This is still a little slower than 4.7. Probably due to 23d582ca
introducing a type for the revision/delta results. There is
potentially room to optimize. But at some point we need to abstract
storage in order to support alternate storage backends. Unfortunately
that means using a Python data structure to represent results. And
unfortunately there is overhead with every new Python object created.

Differential Revision: https://phab.mercurial-scm.org/D4725

(please test the fix)
Comment 5 Bugzilla 2018-10-07 00:00 UTC
Bug was set to TESTING for 7 days, resolving