Transient Windows test failures
Matt Harbison
mharbison72 at gmail.com
Mon Jul 10 03:15:01 UTC 2017
On Mon, 19 Jun 2017 11:30:28 -0400, Yuya Nishihara <yuya at tcha.org> wrote:
> On Sun, 18 Jun 2017 22:19:29 -0400, Augie Fackler wrote:
>>
>> > On Jun 16, 2017, at 22:02, Matt Harbison <mharbison72 at gmail.com> wrote:
>> >
>> > On Fri, 16 Jun 2017 09:59:30 -0400, Augie Fackler <raf at durin42.com> wrote:
>> >
>> >> On Fri, Jun 16, 2017 at 12:18:18AM -0400, Matt Harbison wrote:
>> >>> So apparently, this is a symptom of not having %SystemRoot% in the
>> >>> environment when calling CreateProcess().
>> >>>
>> >>> https://bugs.python.org/issue13524
>> >>>
>> >>> https://jpassing.com/2009/12/28/the-hidden-danger-of-forgetting-to-specify-systemroot-in-a-custom-environment-block/
>> >>>
>> >>> I see that setup.py special cases this variable. I did a search
>> >>> for 'env =', and it looks like hooks and pager start with empty
>> >>> environments, so they must not inherit this. IDR if any recent
>> >>> changes were made that start with an empty environment.
>> >>>
>> >>> The thing I can't get my mind around is the hit and miss nature of
>> >>> the error, if this is really the problem.
>> >>
>> >> It sounds like it should be harmless to just always forward
>> >> %SystemRoot% - should we just do that?
>> >
>> > Seems reasonable, but run-tests._getenv() already does an
>> > os.environ.copy(), so it should be there?
>> >
>> > It does seem like a good idea to do it for hooks and other things
>> > executed, where the environment is built from scratch. The question is
>> > where? There's util.popen[2-4](), plus some direct calls to
>> > subprocess.Popen(), and an os.system(). I considered
>> > util.shellenviron(), but there are far fewer calls to this than places
>> > where processes are spawned.
>
> (+CC foozy since he has Windows)
>
> Is the problem only seen in tests? I don't think environment variables
> are cleared in hg side.
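
For reference, the "always forward %SystemRoot%" idea quoted above could be
sketched roughly like this. It only matters if some spawn site really does
build its environment from scratch, which is still an open question in this
thread. The helper name scratch_env and its parameters are made up for
illustration; nothing like it exists in hg today.

```python
import os


def scratch_env(extra, base=None):
    """Build a minimal environment that still forwards SystemRoot.

    Hypothetical helper, not part of Mercurial. 'base' defaults to the
    current process environment and exists mainly to ease testing.
    """
    base = os.environ if base is None else base
    env = {}
    # Windows environment names are case-insensitive, so match loosely.
    for name, value in base.items():
        if name.upper() == "SYSTEMROOT":
            env[name] = value
    # Caller-supplied variables are layered on top.
    env.update(extra)
    return env
```

A caller would then pass the result as the env argument, e.g.
subprocess.Popen(cmd, env=scratch_env({"EXAMPLE_VAR": "1"})), where
EXAMPLE_VAR is just a placeholder.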

I hit this problem again this weekend, after it was quiet for the past
couple of weeks. It looks like it might be caused by the system coming
close to running out of memory.

When it happened this time, Windows popped up a dialog box saying memory
was running low, offering to kill some programs. I had Task Manager open,
and saw the Performance > System > Commit (MB) line was running around
5400/6076. I closed thg, which exited thg.exe and hg.exe (listed at ~300
MB each in the process list), and the issue stopped. I was able to
recreate it again after a day of quiet by opening up a bunch of tabs in
Firefox, and pushing the memory usage around that threshold. I tend to
run the tests with -j9, and I've seen the first number bounce around
between 4900 and 5700+ during these failures. So I'm not sure what the
exact problem threshold is, as tests start and exit. Interestingly, the
free memory number in Resource Monitor at the same time indicates only
20MB-150MB free.
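
In case anyone wants to watch the same numbers from a script instead of
Task Manager, here is a rough sketch. It assumes the Win32
GlobalMemoryStatusEx() call is a close enough stand-in for the Commit (MB)
line; the "available" figure it reports is documented as what the current
process can commit, so treat the result as an approximation. The names
MemoryStatusEx, read_commit_mb and commit_headroom_mb are my own.

```python
import ctypes
import sys


class MemoryStatusEx(ctypes.Structure):
    # Field layout of the Win32 MEMORYSTATUSEX structure.
    _fields_ = [
        ("dwLength", ctypes.c_uint32),
        ("dwMemoryLoad", ctypes.c_uint32),
        ("ullTotalPhys", ctypes.c_uint64),
        ("ullAvailPhys", ctypes.c_uint64),
        ("ullTotalPageFile", ctypes.c_uint64),
        ("ullAvailPageFile", ctypes.c_uint64),
        ("ullTotalVirtual", ctypes.c_uint64),
        ("ullAvailVirtual", ctypes.c_uint64),
        ("ullAvailExtendedVirtual", ctypes.c_uint64),
    ]


def read_commit_mb():
    """Return (used_mb, limit_mb) on Windows, or None elsewhere/on failure."""
    if sys.platform != "win32":
        return None
    status = MemoryStatusEx()
    status.dwLength = ctypes.sizeof(MemoryStatusEx)
    if not ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status)):
        return None
    mb = 1024 * 1024
    limit = status.ullTotalPageFile // mb
    used = (status.ullTotalPageFile - status.ullAvailPageFile) // mb
    return used, limit


def commit_headroom_mb(used_mb, limit_mb):
    """Commit space left before the limit is reached, in MB."""
    return limit_mb - used_mb


if __name__ == "__main__":
    reading = read_commit_mb()
    if reading is not None:
        print("commit headroom (MB):", commit_headroom_mb(*reading))
```

With the 5400/6076 reading above, commit_headroom_mb(5400, 6076) gives 676
MB to share between nine test jobs plus everything else running.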

One of the "optimizations" of the SSD install software was to cap the page
file, which is probably why I hadn't seen this until recently. Kostia had
mentioned to me that he was seeing errors saying "application failed to
start 0xc0000142", which I also saw (along with dialog box failures of
various msys executables, like env.exe and grep.exe). So maybe this is
useful to others wanting to run Windows tests. It seems unlikely that it
would be seen in the wild (the page file usually isn't capped), and I
doubt there's anything we can do about it anyway.