trying to make tracebacks reproducible

Matt Mackall mpm at selenic.com
Tue Jul 6 10:49:39 CDT 2010


On Tue, 2010-07-06 at 10:16 +0200, Christian Ebert wrote:
> * Matt Mackall on Monday, July 05, 2010 at 14:05:49 -0500
> > On Sun, 2010-07-04 at 01:42 +0200, Christian Ebert wrote:
> >> * Christian Ebert on Sunday, July 04, 2010 at 01:19:31 +0200
> >>> * Christian Ebert on Sunday, July 04, 2010 at 01:03:50 +0200
> >>>> * Christian Ebert on Friday, July 02, 2010 at 02:12:44 +0200
> >>>>> * Martin Geisler on Thursday, July 01, 2010 at 23:33:24 +0200
> >>>>>> Christian Ebert <blacktrash at gmx.net> writes:
> >>>>>>> For a few days now -- sorry for being vague, but this is actually
> >>>>>>> part of the problem -- I _sometimes_ get tracebacks with
> >>>>>>> crew-stable (basically 1.6 I'd say). I cannot reproduce them
> >>>>>>> "reliably", i.e. in the following example, I reissued the command
> >>>>>>> and got the diff as expected. I can reduce the loaded extensions
> >>>>>>> etc. but I'd like to reproduce this reliably first. It seems to
> >>>>>>> happen at random - well, in the true sense of the word, if you
> >>>>>>> look at the final lines of the traceback.
> >>>>>> 
> >>>>>> Heh, nice one :-)
> >>>>>> 
> >>>>>>> I'd be grateful for any ideas.
> >>>>>> 
> >>>>>>>     mod = _origimport(name, globals, locals)
> >>>>>>>   File "/sw/lib/python2.6/random.py", line 59, in <module>
> >>>>>>>     LOG4 = _log(4.0)
> >>>>>>> ValueError: math domain error
> >>>>>> 
> >>>>>> Can you make it fail if you do something like
> >>>>>> 
> >>>>>> while python -c 'import random; print random.LOG4'; do :; done
> >>>>> 
> >>>>> That's evil!
> >>>>> 
> >>>>> Yes, I can/could. After rebooting the problem has gone away.
> >>>>> Probably I overtortured my machine with multithreaded video
> >>>>> conversion. Well, let's hope this is not a sign of senility - of
> >>>>> the machine I mean.
> >>>> 
> >>>> And now for the strangest thing: it does NOT happen with
> >>>> Mercurial 1.5! But reliably with 1.6. Will bisect.
> >>> 
> >>> And the winner is:
> >>> 
> >>> changeset:   11182:3c368a1c962d
> >>> branch:      stable
> >>> parent:      11171:3b3261f6d9ba
> >>> user:        Brodie Rao <brodie at bitheap.org>
> >>> date:        Mon May 03 14:00:34 2010 -0500
> >>> summary:     pager: fork and exec pager as parent process
> >>> 
> >>> Conditions:
> >>> 
> >>> [extensions]
> >>> pager=
> >>> [pager]
> >>> pager = less -FX
> >> 
> >> and (!)
> >> 
> >> [diff]
> >> git = True
> >> 
> >>> and running a cpu intensive video conversion. Then, with every
> >>> second or third call -- obviously one that involves the pager --
> >>> I get the traceback ...
> >> 
> >> So: "hg diff" under the above conditions breaks.
> >> 
> >> Very weird.
> > 
> > You've probably found a kernel bug: specifically, some failure to
> > save/restore floating-point state correctly during a task switch. Since
> > this is expensive, most operating systems try to avoid doing it if
> > floating point is not in use by a given task.
> 
> I see (sort of). Nothing one can do about it then?

Depends on what kernel you're running. If it's Linux, we can fix it
(though I'm pretty sure this entire class of bugs was fixed most of a
decade ago, which is why I already have a good idea what the problem
is).
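A bounded variant of the shell loop quoted above may help separate the kernel question from Mercurial. The sketch below is Python 3, and the helper name `count_failures` is hypothetical, not anything in Mercurial or the standard library. It starts fresh interpreters that each compute `math.log(4.0)`, the same value random.py computes at import time, and counts how many exit with an error or a wrong result. It assumes you run it while a CPU-intensive floating-point load (such as the video conversion) is active. On a kernel that saves and restores FPU state correctly, it should report zero failures.

```python
import subprocess
import sys

# Child program: recompute what random.py does at import (LOG4 = _log(4.0))
# and exit nonzero on a wrong result. A ValueError ("math domain error")
# also makes the interpreter exit with status 1.
CHILD = (
    "import math, sys; "
    "sys.exit(0 if abs(math.log(4.0) - 1.3862943611198906) < 1e-12 else 1)"
)


def count_failures(runs):
    """Hypothetical helper: launch `runs` fresh interpreters, count failures.

    A fresh process per run mirrors the original symptom, which appeared
    right after fork/exec (pager) rather than inside a long-lived process.
    """
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            [sys.executable, "-c", CHILD],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures += 1
    return failures
```

For example, run `print(count_failures(1000))` in one terminal while the conversion runs in another. Unlike the `while` loop, this does not stop at the first failure, so it also gives a rough failure rate.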

-- 
Mathematics is the supreme nostalgia of our time.




More information about the Mercurial-devel mailing list