Bug 6445 - hg log command taking 115 seconds with large added css file
Summary: hg log command taking 115 seconds with large added css file
Status: RESOLVED FIXED
Alias: None
Product: Mercurial
Classification: Unclassified
Component: Mercurial (show other bugs)
Version: stable branch
Hardware: PC Linux
: normal bug
Assignee: Yuya Nishihara
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-11-27 16:40 UTC by Peter Sanchez
Modified: 2020-12-11 00:00 UTC (History)
2 users (show)

See Also:
Python Version: 3.8


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Peter Sanchez 2020-11-27 16:40 UTC
hg version 5.6+8-81da6feb5000

We're adding a large css file (281k lines) to an existing project. It was added in a feature branch. Trying to run a log command, with preview, etc. on the specific revision where the file was added to the repo takes over 100 seconds.

Other team members with hg versions 5.6 (installed via pip) and 5.2.1 do not have this issue.

The revision included small changes to about 5-6 other text files (.py, .js, etc.)

I stepped through each file one by one and only the large css file had this issue.

Below is the profile data and debug install:


hg log -r 269 hyfm/static/css/tailwind.css -p --profile
...

| 100.0% 163.48s  dispatch.py:    run              line 43:  dispatch.run()
| 100.0% 163.48s  dispatch.py:    dispatch         line 113:  status = dispatch(req)
| 100.0% 163.48s  dispatch.py:    _runcatch        line 303:  ret = _runcatch(req) or 0
| 100.0% 163.48s  dispatch.py:    _callcatch       line 479:  return _callcatch(ui, _runc...
| 100.0% 163.48s  scmutil.py:     callcatch        line 488:  return scmutil.callcatch(ui...
| 100.0% 163.48s  dispatch.py:    _runcatchfunc    line 153:  return func()
| 100.0% 163.48s  dispatch.py:    _dispatch        line 469:  return _dispatch(req)
| 100.0% 163.47s  dispatch.py:    runcommand       line 1232:  return runcommand(
| 100.0% 163.47s  pager.py:       pagecmd          line 917:  ret = _runcommand(ui, optio...
| 100.0% 163.46s  dispatch.py:    _runcommand      line 76:  return orig(ui, options, cm...
| 100.0% 163.46s  dispatch.py:    <lambda>         line 1244:  return cmdfunc()
| 100.0% 163.46s  util.py:        check            line 1230:  d = lambda: util.checksigna...
| 100.0% 163.46s  commands.py:    log              line 1867:  return func(*args, **kwargs)
| 100.0% 163.45s  logcmdutil.py:  displayrevs      line 4609:  displayfn(ui, repo, revs, d...
 \ 64.9% 106.07s  logcmdutil.py:  show             line 1191:  displayer.show(ctx, copies=...
   | 64.9% 106.07s  logcmdutil.py:  _show          line 276:  self._show(ctx, copies, props)
   | 64.9% 106.07s  logcmdutil.py:  _showpatch     line 399:  self._showpatch(ctx, graphw...
   | 64.9% 106.07s  logcmdutil.py:  showdiff       line 431:  self._differ.showdiff(
   | 64.9% 106.07s  logcmdutil.py:  diffordiffstat line 215:  diffordiffstat(
   | 64.9% 106.07s  util.py:        filechunkiter  line 173:  for chunk in util.filechunk...
   | 64.9% 106.07s  util.py:        read           line 2695:  s = nbytes and f.read(nbytes)
   | 64.8% 105.92s  util.py:        splitbig       line 2635:  for chunk in self.iter:
   | 64.8% 105.87s  logcmdutil.py:  gen            line 2606:  for chunk in chunks:
   | 63.8% 104.30s  patch.py:       difflabel      line 170:  for chunk, label in chunks:
   | 63.6% 104.03s  patch.py:       consumehunkbufferline 2863:  for token in consumehunkbuf...
   | 63.6% 104.01s  patch.py:       diffsinglehunkinlineline 2832:  for token in dodiffhunk(hun...
 \ 35.1% 57.38s  logcmdutil.py:  flush             line 1192:  displayer.flush(ctx)
   | 35.1% 57.38s  ui.py:          write           line 265:  self.ui.write(self.hunk[rev])
   | 35.1% 57.38s  procutil.py:    write           line 1158:  dest.write(msg)
---
Sample count: 59297
Total time: 115.960000 seconds (163.480000 wall)

(e250a06) pjs:thinkpad ~/development/hyfm $ hg debuginstall
checking encoding (UTF-8)...
checking Python executable (/usr/bin/python)
checking Python implementation (CPython)
checking Python version (3.8.5)
checking Python lib (/usr/lib/python3.8)...
checking Python security support (sni,tls1.0,tls1.1,tls1.2)
checking Rust extensions (missing)
checking Mercurial version (5.6)
checking Mercurial custom build (8-81da6feb5000)
checking module policy (c)
checking installed modules (/usr/local/lib/python3.8/dist-packages/mercurial)...
checking registered compression engines (bz2, bz2truncated, none, zlib, zstd)
checking available compression engines (bz2, bz2truncated, none, zlib, zstd)
checking available compression engines for wire protocol (zstd, zlib, bz2, none)
checking "re2" regexp engine (missing)
checking templates (/usr/local/lib/python3.8/dist-packages/mercurial/templates)...
checking default template (/usr/local/lib/python3.8/dist-packages/mercurial/templates/map-cmdline.default)
checking commit editor... (vim)
checking username (Peter Sanchez <peter@netlandish.com>)
no problems detected
Comment 1 Peter Sanchez 2020-11-27 17:07 UTC
You can get a copy of tailwind.css file in question here:

https://s3.amazonaws.com/all-media/files/tailwind.css
Comment 2 Yuya Nishihara 2020-11-27 21:08 UTC
It's Python2 vs Python3 thing. Apparently, bytes concatenation is super slow on Python3, and word-diff is stuck with that.

% python3.9 -m timeit -s "s = b''" "for i in range(10000): s += b'line'"
5 loops, best of 5: 39.2 msec per loop

% python2.7 -m timeit -s "s = b''" "for i in range(10000): s += b'line'"
1000 loops, best of 3: 321 usec per loop
Comment 3 HG Bot 2020-12-03 19:10 UTC
Fixed by https://mercurial-scm.org/repo/hg/rev/210f9b8d7bbd
Yuya Nishihara <yuya@tcha.org>
diff: do not concatenate immutable bytes while building a/b bodies (issue6445)

Use bytearray instead. I don't know what's changed since Python 2, but bytes
concatenation is 100x slow on Python 3.

  % python2.7 -m timeit -s "s = b''" "for i in range(10000): s += b'line'"
  1000 loops, best of 3: 321 usec per loop

  % python3.9 -m timeit -s "s = b''" "for i in range(10000): s += b'line'"
  5 loops, best of 5: 39.2 msec per loop

Benchmark using tailwind.css (measuring the fast path, a is empty):

  % HGRCPATH=/dev/null python2.7 ./hg log -R /tmp/issue6445 -p --time \
  --color=always --config diff.word-diff=true >/dev/null
  (prev) time: real 1.580 secs (user 1.560+0.000 sys 0.020+0.000)
  (this) time: real 1.610 secs (user 1.570+0.000 sys 0.030+0.000)

  % HGRCPATH=/dev/null python3.9 ./hg log -R /tmp/issue6445 -p --time \
  --color=always --config diff.word-diff=true >/dev/null
  (prev) time: real 114.500 secs (user 114.460+0.000 sys 0.030+0.000)
  (this) time: real 2.180 secs (user 2.140+0.000 sys 0.040+0.000)

Benchmark using random tabular text data (not the fast path):

  % dd if=/dev/urandom bs=1k count=1000 | hexdump -v -e '16/1 "%3u," "\n"' > ttf
  % hg ci -ma
  % dd if=/dev/urandom bs=1k count=1000 | hexdump -v -e '16/1 "%3u," "\n"' > ttf
  % hg ci -mb

  % HGRCPATH=/dev/null python2.7 ./hg log -R /tmp/issue6445 -p --time \
  --color=always --config diff.word-diff=true >/dev/null
  (prev) time: real 3.240 secs (user 3.040+0.000 sys 0.200+0.000
  (this) time: real 3.230 secs (user 3.070+0.000 sys 0.160+0.000)

  % HGRCPATH=/dev/null python3.9 ./hg log -R /tmp/issue6445 -p --time \
  --color=always --config diff.word-diff=true >/dev/null
  (prev) time: real 44.130 secs (user 43.850+0.000 sys 0.270+0.000)
  (this) time: real 4.170 secs (user 3.850+0.000 sys 0.310+0.000)

(please test the fix)
Comment 4 Peter Sanchez 2020-12-03 20:40 UTC
I rebuilt from repo/hg stable branch which includes your fix. Here are the results.


(efd42eb|card-41) pjs:thinkpad ~/development/hyfm $ hg tip
changeset:   285:efd42eb008e1
branch:      card-41
tag:         tip
user:        Oscar Cortez <oscar@netlandish.com>
date:        Wed Dec 02 14:30:28 2020 -0600
summary:     Add support for delete objects via alpine.js - Refs #41


(efd42eb|card-41) pjs:thinkpad ~/development/hyfm $ hg log -r 269 --stat
changeset:   269:1ae659ae1dfd
branch:      card-41
parent:      263:df906978c521
user:        Oscar Cortez <oscar@netlandish.com>
date:        Thu Nov 26 14:04:22 2020 -0600
summary:     Add tailwind config - Refs #40

 .hgignore                      |       1 +
 hyfm/settings/base_settings.py |       8 +-
 hyfm/static/css/_tailwind.css  |       3 +
 hyfm/static/css/tailwind.css   |  281327 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 package-lock.json              |    1309 +
 package.json                   |      24 +
 tailwind.config.js             |      15 +
 7 files changed, 282685 insertions(+), 2 deletions(-)


hg log -r 269 hyfm/static/css/tailwind.css -p --profile
...
| 100.0%  4.58s  dispatch.py:    run               line 43:  dispatch.run()
| 100.0%  4.58s  dispatch.py:    dispatch          line 113:  status = dispatch(req)
| 100.0%  4.58s  dispatch.py:    _runcatch         line 303:  ret = _runcatch(req) or 0
| 100.0%  4.58s  dispatch.py:    _callcatch        line 479:  return _callcatch(ui, _runc...
| 100.0%  4.58s  scmutil.py:     callcatch         line 488:  return scmutil.callcatch(ui...
| 100.0%  4.58s  dispatch.py:    _runcatchfunc     line 153:  return func()
| 100.0%  4.58s  dispatch.py:    _dispatch         line 469:  return _dispatch(req)
| 99.8%  4.57s  dispatch.py:    runcommand         line 1232:  return runcommand(
| 99.8%  4.57s  pager.py:       pagecmd            line 917:  ret = _runcommand(ui, optio...
| 99.6%  4.56s  dispatch.py:    _runcommand        line 76:  return orig(ui, options, cm...
| 99.6%  4.56s  dispatch.py:    <lambda>           line 1244:  return cmdfunc()
| 99.6%  4.56s  util.py:        check              line 1230:  d = lambda: util.checksigna...
| 99.6%  4.56s  commands.py:    log                line 1867:  return func(*args, **kwargs)
| 99.1%  4.54s  logcmdutil.py:  displayrevs        line 4609:  displayfn(ui, repo, revs, d...
 \ 53.5%  2.45s  logcmdutil.py:  show              line 1191:  displayer.show(ctx, copies=...
   | 53.5%  2.45s  logcmdutil.py:  _show           line 276:  self._show(ctx, copies, props)
   | 53.5%  2.45s  logcmdutil.py:  _showpatch      line 399:  self._showpatch(ctx, graphw...
   | 53.5%  2.45s  logcmdutil.py:  showdiff        line 431:  self._differ.showdiff(
   | 53.5%  2.45s  logcmdutil.py:  diffordiffstat  line 215:  diffordiffstat(
   | 53.5%  2.45s  util.py:        filechunkiter   line 173:  for chunk in util.filechunk...
   | 53.5%  2.45s  util.py:        read            line 2695:  s = nbytes and f.read(nbytes)
   | 49.1%  2.25s  util.py:        splitbig        line 2635:  for chunk in self.iter:
   | 46.7%  2.14s  logcmdutil.py:  gen             line 2606:  for chunk in chunks:
     \ 29.9%  1.37s  ui.py:          label         line 171:  yield ui.label(chunk, label...
       | 26.6%  1.22s  color.py:       colorlabel  line 2047:  return color.colorlabel(sel...
       | 22.3%  1.02s  color.py:       <listcomp>  line 429:  [
       | 21.2%  0.97s  color.py:       _render_effectsline 430:  _render_effects(ui, line, e...
       | 12.0%  0.55s  color.py:       <listcomp>  line 393:  start = [
       | 10.0%  0.46s  pycompat.py:    __new__     line 394:  pycompat.bytestr(activeeffe...
     \ 15.1%  0.69s  patch.py:       difflabel     line 170:  for chunk, label in chunks:
       |  8.3%  0.38s  patch.py:       consumehunkbufferline 2863:  for token in consumehunkbuf...
       |  8.1%  0.37s  patch.py:       diffsinglehunkinlineline 2832:  for token in dodiffhunk(hun...
       |  7.0%  0.32s  patch.py:       diffsinglehunkline 2745:  for t in diffsinglehunk(hun...
 \ 45.6%  2.09s  logcmdutil.py:  flush             line 1192:  displayer.flush(ctx)
   | 45.6%  2.09s  ui.py:          write           line 265:  self.ui.write(self.hunk[rev])
   | 45.6%  2.09s  procutil.py:    write           line 1158:  dest.write(msg)
---
Sample count: 2063
Total time: 2.820000 seconds (4.580000 wall)

(efd42eb|card-41) pjs:thinkpad ~/development/hyfm $ hg debuginstall
checking encoding (UTF-8)...
checking Python executable (/usr/bin/python)
checking Python implementation (CPython)
checking Python version (3.8.5)
checking Python lib (/usr/lib/python3.8)...
checking Python security support (sni,tls1.0,tls1.1,tls1.2)
checking Rust extensions (missing)
checking Mercurial version (5.6.1)
checking Mercurial custom build (2-dadbd01f9393)
checking module policy (c)
checking installed modules (/usr/local/lib/python3.8/dist-packages/mercurial)...
checking registered compression engines (bz2, bz2truncated, none, zlib, zstd)
checking available compression engines (bz2, bz2truncated, none, zlib, zstd)
checking available compression engines for wire protocol (zstd, zlib, bz2, none)
checking "re2" regexp engine (missing)
checking templates (/usr/local/lib/python3.8/dist-packages/mercurial/templates)...
checking default template (/usr/local/lib/python3.8/dist-packages/mercurial/templates/map-cmdline.default)
checking commit editor... (vim)
checking username (Peter Sanchez <peter@netlandish.com>)
no problems detected

So clearly much faster. I had a team mate try though and his time was 2 seconds faster than mine. Here's his run:


λ hg tip
changeset:   285:4ce87ba26ae8
branch:      card-41
tag:         tip
user:        Oscar Cortez <oscar@netlandish.com>
date:        Wed Dec 02 15:43:12 2020 -0600
summary:     Update data request template to use tailwind - Refs #41


λ hg log -r 1ae659ae1dfd --stat
changeset:   264:1ae659ae1dfd
branch:      card-41
user:        Oscar Cortez <oscar@netlandish.com>
date:        Thu Nov 26 14:04:22 2020 -0600
summary:     Add tailwind config - Refs #40
 .hgignore                      |       1 +
 hyfm/settings/base_settings.py |       8 +-
 hyfm/static/css/_tailwind.css  |       3 +
 hyfm/static/css/tailwind.css   |  281327 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 package-lock.json              |    1309 +
 package.json                   |      24 +
 tailwind.config.js             |      15 +
 7 files changed, 282685 insertions(+), 2 deletions(-)


λ hg log -r 1ae659ae1dfd hyfm/static/css/tailwind.css -p --profile
...
| 100.0%  0.21s  dispatch.py:    run               line 43:  dispatch.run()
| 100.0%  0.21s  dispatch.py:    dispatch          line 113:  status = dispatch(req)
| 100.0%  0.21s  dispatch.py:    _runcatch         line 303:  ret = _runcatch(req) or 0
| 100.0%  0.21s  dispatch.py:    _callcatch        line 479:  return _callcatch(ui, _runc...
| 100.0%  0.21s  scmutil.py:     callcatch         line 488:  return scmutil.callcatch(ui...
| 100.0%  0.21s  dispatch.py:    _runcatchfunc     line 153:  return func()
| 100.0%  0.21s  dispatch.py:    _dispatch         line 469:  return _dispatch(req)
| 85.7%  0.18s  dispatch.py:    runcommand         line 1232:  return runcommand(
| 85.7%  0.18s  dispatch.py:    _runcommand        line 917:  ret = _runcommand(ui, optio...
| 85.7%  0.18s  dispatch.py:    <lambda>           line 1244:  return cmdfunc()
| 85.7%  0.18s  util.py:        check              line 1230:  d = lambda: util.checksigna...
| 85.7%  0.18s  commands.py:    log                line 1867:  return func(*args, **kwargs)
 \ 71.4%  0.15s  logcmdutil.py:  displayrevs       line 4609:  displayfn(ui, repo, revs, d...
   | 71.4%  0.15s  logcmdutil.py:  show            line 1191:  displayer.show(ctx, copies=...
   | 71.4%  0.15s  logcmdutil.py:  _show           line 276:  self._show(ctx, copies, props)
   | 66.7%  0.14s  logcmdutil.py:  _showpatch      line 399:  self._showpatch(ctx, graphw...
   | 66.7%  0.14s  logcmdutil.py:  showdiff        line 431:  self._differ.showdiff(
   | 66.7%  0.14s  logcmdutil.py:  diffordiffstat  line 215:  diffordiffstat(
   | 66.7%  0.14s  util.py:        filechunkiter   line 158:  for chunk in util.filechunk...
   | 66.7%  0.14s  util.py:        read            line 2695:  s = nbytes and f.read(nbytes)
   | 57.1%  0.12s  util.py:        splitbig        line 2635:  for chunk in self.iter:
   | 57.1%  0.12s  patch.py:       diff            line 2606:  for chunk in chunks:
     \ 47.6%  0.10s  patch.py:       diffhunks     line 2541:  for fctx1, fctx2, hdr, hunk...
       \ 28.6%  0.06s  patch.py:       diffcontent line 3078:  yield diffcontent(data1, da...
         | 28.6%  0.06s  mdiff.py:       unidiff   line 3124:  uheaders, hunks = mdiff.uni...
         | 14.3%  0.03s  mdiff.py:       <listcomp>line 308:  hunklines = [b"@@ -0,0 +1,%...
       \ 14.3%  0.03s  patch.py:       <genexpr>   line 2997:  binary = any(f.isbinary() f...
         | 14.3%  0.03s  context.py:     isbinary  line 2997:  binary = any(f.isbinary() f...
         | 14.3%  0.03s  context.py:     data      line 945:  return stringutil.binary(se...
         | 14.3%  0.03s  filelog.py:     read      line 1301:  return self._filelog.read(s...
         | 14.3%  0.03s  filelog.py:     revision  line 169:  return storageutil.filterme...
         | 14.3%  0.03s  revlog.py:      revision  line 93:  return self._revlog.revisio...
         | 14.3%  0.03s  revlog.py:      _revisiondataline 1839:  return self._revisiondata(n...
         |  9.5%  0.02s  revlog.py:      checkhash line 1894:  self.checkhash(text, node, ...
         |  9.5%  0.02s  revlog.py:      hash      line 1967:  if node != self.hash(text, ...
         |  9.5%  0.02s  storageutil.py: hashrevisionsha1line 1956:  return storageutil.hashrevi...
     \  9.5%  0.02s  patch.py:       <genexpr>     line 2561:  text = b''.join(b''.join(hl...
 \  9.5%  0.02s  util.py:        __getattribute__  line 4586:  revs, differ = logcmdutil.g...
---
Sample count: 65
Total time: 0.210000 seconds (0.210000 wall)


λ hg debuginstall
checking encoding (UTF-8)...
checking Python executable (/usr/local/opt/python@3.9/bin/python3.9)
checking Python implementation (CPython)
checking Python version (3.9.0)
checking Python lib (/usr/local/Cellar/python@3.9/3.9.0_2/Frameworks/Python.framework/Versions/3.9/lib/python3.9)...
checking Python security support (sni,tls1.0,tls1.1,tls1.2)
checking Rust extensions (missing)
checking Mercurial version (5.6)
checking Mercurial custom build ()
checking module policy (c)
checking installed modules (/usr/local/Cellar/mercurial/5.6/lib/python3.9/site-packages/mercurial)...
checking registered compression engines (bz2, bz2truncated, none, zlib, zstd)
checking available compression engines (bz2, bz2truncated, none, zlib, zstd)
checking available compression engines for wire protocol (zstd, zlib, bz2, none)
checking "re2" regexp engine (missing)
checking templates (/usr/local/Cellar/mercurial/5.6/lib/python3.9/site-packages/mercurial/templates)...
checking default template (/usr/local/Cellar/mercurial/5.6/lib/python3.9/site-packages/mercurial/templates/map-cmdline.default)
checking commit editor... (vim)
checking username (Oscar Cortez <oscar@netlandish.com>)
no problems detected

Notice also his Sample Count is much smaller. I don't know the internals of hg very well so not sure why that would be different here.

In any case, it's much faster. Not sure if there still isn't something odd happening but hey, progress :)
Comment 5 Bugzilla 2020-12-11 00:00 UTC
Bug was set to TESTING for 7 days, resolving