Bug 5933 - Errors updating to branch when cloning with keywords extension on Windows
Summary: Errors updating to branch when cloning with keywords extension on Windows
Status: RESOLVED FIXED
Alias: None
Product: Mercurial
Classification: Unclassified
Component: Mercurial (show other bugs)
Version: 4.5.3
Hardware: PC Windows
: normal bug
Assignee: Bugzilla
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-07-03 13:14 UTC by David Mastronarde
Modified: 2018-07-27 00:00 UTC (History)
2 users (show)

See Also:
Python Version: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description David Mastronarde 2018-07-03 13:14 UTC
Cloning my repository with the -r option and a branch tag fails with a TypeError:
 hg clone -r IMOD_4-9 http://bio3d.colorado.edu/imod/nightlyBuilds/IMOD IMOD-hgtest
adding changesets
adding manifests
adding file changes
added 31946 changesets with 48532 changes to 5434 files
new changesets 14fb4137f74b:8e6f39ae0115
updating to branch IMOD_4-9
Exception in thread Thread-8:
Traceback (most recent call last):
  File "threading.pyc", line 801, in __bootstrap_inner
  File "mercurial\worker.pyc", line 231, in run
  File "mercurial\merge.pyc", line 1380, in batchget
  File "mercurial\context.pyc", line 1282, in data
  File "hgext\keyword.pyc", line 368, in read
  File "hgext\keyword.pyc", line 273, in expand
  File "hgext\keyword.pyc", line 263, in substitute
  File "hgext\keyword.pyc", line 260, in kwsub
  File "mercurial\cmdutil.pyc", line 1695, in show
  File "mercurial\cmdutil.pyc", line 2007, in _show
  File "mercurial\templatefilters.pyc", line 372, in stringify
  File "mercurial\util.pyc", line 1017, in increasingchunks
  File "mercurial\templater.pyc", line 1272, in _flatten
  File "mercurial\templater.pyc", line 413, in runtemplate
  File "mercurial\templater.pyc", line 311, in evalrawexp
  File "mercurial\templater.pyc", line 429, in runfilter
  File "mercurial\templater.pyc", line 315, in evalfuncarg
  File "mercurial\templater.pyc", line 311, in evalrawexp
  File "mercurial\templater.pyc", line 404, in runsymbol
  File "mercurial\templatekw.pyc", line 355, in showauthor
  File "mercurial\context.pyc", line 619, in user
  File "mercurial\util.pyc", line 933, in __get__
  File "mercurial\context.pyc", line 583, in _changeset
  File "mercurial\changelog.pyc", line 481, in changelogrevision
  File "mercurial\revlog.pyc", line 1723, in revision
TypeError: 'NoneType' object has no attribute '__getitem__'

** unknown exception encountered, please report by visiting
** https://mercurial-scm.org/wiki/BugTracker
** Python 2.7.13 (v2.7.13:a06454b1afa1, Dec 17 2016, 20:53:40) [MSC v.1500 64 bit (AMD64)]
** Mercurial Distributed SCM (version 4.5.3)
** Extensions loaded: fetch, extdiff, hgk, keyword, transplant
Traceback (most recent call last):
  File "hg", line 41, in <module>
  File "mercurial\dispatch.pyc", line 88, in run
  File "mercurial\dispatch.pyc", line 183, in dispatch
  File "mercurial\dispatch.pyc", line 324, in _runcatch
  File "mercurial\dispatch.pyc", line 332, in _callcatch
  File "mercurial\scmutil.pyc", line 154, in callcatch
  File "mercurial\dispatch.pyc", line 314, in _runcatchfunc
  File "mercurial\dispatch.pyc", line 918, in _dispatch
  File "mercurial\dispatch.pyc", line 673, in runcommand
  File "mercurial\dispatch.pyc", line 926, in _runcommand
  File "mercurial\dispatch.pyc", line 915, in <lambda>
  File "mercurial\util.pyc", line 1195, in check
  File "mercurial\commands.pyc", line 1449, in clone
  File "mercurial\hg.pyc", line 716, in clone
  File "mercurial\hg.pyc", line 745, in update
  File "mercurial\hg.pyc", line 741, in updaterepo
  File "mercurial\merge.pyc", line 2013, in update
  File "mercurial\merge.pyc", line 1492, in applyupdates
  File "mercurial\worker.pyc", line 283, in _windowsworker
TypeError: 'NoneType' object has no attribute '__getitem__'

while cloning without -r gives an IndexError:

requesting all changes
adding changesets
adding manifests
adding file changes
added 34792 changesets with 52210 changes to 5614 files
new changesets 14fb4137f74b:bdefdee51641
updating to branch default
Exception in thread Thread-6:
Traceback (most recent call last):
  File "threading.pyc", line 801, in __bootstrap_inner
  File "mercurial\worker.pyc", line 231, in run
  File "mercurial\merge.pyc", line 1380, in batchget
  File "mercurial\context.pyc", line 1282, in data
  File "hgext\keyword.pyc", line 368, in read
  File "hgext\keyword.pyc", line 273, in expand
  File "hgext\keyword.pyc", line 263, in substitute
  File "hgext\keyword.pyc", line 261, in kwsub
  File "mercurial\ui.pyc", line 875, in popbuffer
IndexError: list index out of range

** unknown exception encountered, please report by visiting
** https://mercurial-scm.org/wiki/BugTracker
** Python 2.7.13 (v2.7.13:a06454b1afa1, Dec 17 2016, 20:53:40) [MSC v.1500 64 bit (AMD64)]
** Mercurial Distributed SCM (version 4.5.3)
** Extensions loaded: fetch, extdiff, hgk, keyword, transplant
Traceback (most recent call last):
  File "hg", line 41, in <module>
  File "mercurial\dispatch.pyc", line 88, in run
  File "mercurial\dispatch.pyc", line 183, in dispatch
  File "mercurial\dispatch.pyc", line 324, in _runcatch
  File "mercurial\dispatch.pyc", line 332, in _callcatch
  File "mercurial\scmutil.pyc", line 154, in callcatch
  File "mercurial\dispatch.pyc", line 314, in _runcatchfunc
  File "mercurial\dispatch.pyc", line 918, in _dispatch
  File "mercurial\dispatch.pyc", line 673, in runcommand
  File "mercurial\dispatch.pyc", line 926, in _runcommand
  File "mercurial\dispatch.pyc", line 915, in <lambda>
  File "mercurial\util.pyc", line 1195, in check
  File "mercurial\commands.pyc", line 1449, in clone
  File "mercurial\hg.pyc", line 716, in clone
  File "mercurial\hg.pyc", line 745, in update
  File "mercurial\hg.pyc", line 741, in updaterepo
  File "mercurial\merge.pyc", line 2013, in update
  File "mercurial\merge.pyc", line 1492, in applyupdates
  File "mercurial\worker.pyc", line 283, in _windowsworker
IndexError: list index out of range

This works with various lower versions.  The highest mrecurial I am running on other machines is 4.3.3 on Windows 10.
Comment 1 Yuya Nishihara 2018-07-05 10:01 UTC
Can you test without the keyword extension?

I suspect there would be a data race since the windows worker is threaded,
which is inherently unsafe for monkey patching.

I think the windows worker should be enabled only when we're sure there's
no racy update on repo, ui, etc.
Comment 2 David Mastronarde 2018-07-05 10:16 UTC
Yes, it works with the keyword extension commented out in both cases.
Comment 3 Yuya Nishihara 2018-07-05 10:23 UTC
Try disabling the worker as a workaround:

[worker]
enabled = False
Comment 4 David Mastronarde 2018-07-05 10:39 UTC
Yes, this works too.

In terms of performance, am I better off leaving it like this or dropping back to 4.3.3, or does it matter?
Comment 5 Yuya Nishihara 2018-07-05 10:52 UTC
> In terms of performance, am I better off leaving it like this or dropping
> back to 4.3.3, or does it matter?

Windows worker was introduced in 4.5, so disabling it effectively means
backing to 4.3.3 regarding the performance of "hg update". If you don't
have any other problems, I suggest sticking to the latest version.
Comment 6 David Mastronarde 2018-07-05 11:26 UTC
Thanks for your  quick response and clear explanation of the situation
Comment 7 HG Bot 2018-07-19 13:55 UTC
Fixed by https://mercurial-scm.org/repo/hg/rev/be4984261611
Gregory Szorc <gregory.szorc@gmail.com>
merge: mark file gets as not thread safe (issue5933)

In default installs, this has the effect of disabling the thread-based
worker on Windows when manifesting files in the working directory. My
measurements have shown that with revlog-based repositories, Mercurial
spends a lot of CPU time in revlog code resolving file data. This ends
up incurring a lot of context switching across threads and slows down
`hg update` operations when going from an empty working directory to
the tip of the repo.

On mozilla-unified (246,351 files) on an i7-6700K (4+4 CPUs):

before: 487s wall
after:  360s wall (equivalent to worker.enabled=false)
cpus=2: 379s wall

Even with only 2 threads, the thread pool is still slower.

The introduction of the thread-based worker (02b36e860e0b) states that
it resulted in a "~50%" speedup for `hg sparse --enable-profile` and
`hg sparse --disable-profile`. This disagrees with my measurement
above. I theorize a few reasons for this:

1) Removal of files from the working directory is I/O - not CPU - bound
   and should benefit from a thread pool (unless I/O is insanely fast
   and the GIL release is near instantaneous). So tests like `hg sparse
   --enable-profile` may exercise deletion throughput and aren't good
   benchmarks for worker tasks that are CPU heavy.
2) The patch was authored by someone at Facebook. The results were
   likely measured against a repository using remotefilelog. And I
   believe that revision retrieval during working directory updates with
   remotefilelog will often use a remote store, thus being I/O and not
   CPU bound. This probably resulted in an overstated performance gain.

Since there appears to be a need to enable the thread-based worker with
some stores, I've made the flagging of file gets as thread safe
configurable. I've made it experimental because I don't want to formalize
a boolean flag for this option and because this attribute is best
captured against the store implementation. But we don't have a proper
store API for this yet. I'd rather cross this bridge later.

It is possible there are revlog-based repositories that do benefit from
a thread-based worker. I didn't do very comprehensive testing. If there
are, we may want to devise a more proper algorithm for whether to use
the thread-based worker, including possibly config options to limit the
number of threads to use. But until I see evidence that justifies
complexity, simplicity wins.

Differential Revision: https://phab.mercurial-scm.org/D3963

(please test the fix)
Comment 8 Bugzilla 2018-07-27 00:00 UTC
Bug was set to TESTING for 7 days, resolving