Help! Squeak crashes during updates

Jan Bottorff janb at pmatrix.com
Mon Dec 20 19:24:29 UTC 1999


At 09:06 AM 12/20/99 -0500, Danny Sharpe wrote:
>Okay, I ran it again under MS Visual Studio debugger, tried to enter Play
>With Me 7, and as usual it died on the instruction indicated below:
>
>    01508980   push        ebp
>    01508981   mov         ebp,esp
>    01508983   and         esp,0F8h
>    01508986   sub         esp,20h
>    01508989   mov         ecx,dword ptr [ebp+8]
>    0150898C   mov         eax,dword ptr [ebp+0Ch]
>    0150898F   fld         dword ptr [ecx+0Ch]
>    01508992   fstp        qword ptr [esp+10h]
>    01508996   fld         dword ptr [ecx+10h]
>    01508999   fstp        qword ptr [esp+8]
>    0150899D   fld         dword ptr [ecx+14h]
>--> 015089A0   fstp        qword ptr [esp+18h]
>    015089A4   fld         dword ptr [eax+4]
>    015089A7   fmul        qword ptr [esp+8]
>    015089AB   fld         dword ptr [eax]
>    015089AD   fmul        qword ptr [esp+10h]
>    015089B1   faddp       st(1),st
>
>Here are the register contents:
>
>    EAX = 02144794 EBX = 0000002A ECX = 021901C4 EDX = 00000010
>    ESI = 021901C4 EDI = 02144794 EIP = 015089A0 ESP = 0068FB00
>    EBP = 0068FB20 EFL = 00010206 CS = 0137 DS = 013F ES = 013F SS = 013F
>    FS = 47AF GS = 0000 OV=0 UP=0 EI=1 PL=0 ZR=0 AC=0 PE=1 CY=0
>
>    0068FB18 = BFD6E05BE0000000
>
>    ST0 = +0.00000000000000000e+0000 ST1 = +0.00000000000000000e+0000
>    ST2 = +0.00000000000000000e+0000 ST3 = +0.00000000000000000e+0000
>    ST4 = +0.00000000000000000e+0000 ST5 = +6.30592279601072110e-0002
>    ST6 = -3.57443779706954956e-0001 ST7 = +3.23359996080398560e-0001
>    CTRL = 1270 STAT = B9A2 TAGS = 3FFF EIP = 0150899D CS = 0137 DS = 013F
>    EDO = 021901D8

This tells us a lot. First, it's NOT a missing EMMS instruction problem, as
the TAGS register is not 0000. If you look at the CTRL register, you find
lots of the FP exceptions are enabled. Bringing up a very simple program
under MSVC 6, the debugger says the default value for the FP control
register is 0x027F, which means mask all exceptions and use 53-bit
precision. The 1270 value in the dump means exceptions are enabled for
invalid operation, denormal operand, zero divide, and overflow, precision
is 53-bit, and the infinity control flag is set (which my books say is
provided for compatibility with the 287 math coprocessor, and is not
meaningful for a Pentium). The status register says it's not a stack fault,
and the FPU has seen a denormal and loss of precision since the status was
last cleared.
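
For anyone who wants to check the decoding, here is a minimal sketch in C.
The enum and function names are made up for illustration; only the bit
layout (mask bits 0-5, precision control in bits 8-9, infinity control in
bit 12, stack fault in status bit 6) comes from the x87 architecture.

```c
#include <assert.h>
#include <stdio.h>

/* x87 control word: bits 0-5 are exception MASK bits (1 = masked). */
enum {
    FP_INVALID   = 0x0001,
    FP_DENORMAL  = 0x0002,
    FP_ZERODIV   = 0x0004,
    FP_OVERFLOW  = 0x0008,
    FP_UNDERFLOW = 0x0010,
    FP_PRECISION = 0x0020,
    FP_ALL_EXC   = 0x003F,
    FPCW_INFINITY = 0x1000   /* infinity control, a 287 leftover */
};

/* Exceptions that are enabled, i.e. whose mask bit is clear. */
unsigned fpcw_enabled_exceptions(unsigned cw)
{
    assert(cw <= 0xFFFFu);   /* the control word is 16 bits wide */
    return ~cw & FP_ALL_EXC;
}

/* Precision control field (bits 8-9): 0 = 24-bit, 2 = 53-bit,
   3 = 64-bit.  The encoding 1 is reserved and reported here as 0. */
int fpcw_precision_bits(unsigned cw)
{
    static const int bits[4] = { 24, 0, 53, 64 };
    return bits[(cw >> 8) & 3];
}

/* Status word: bits 0-5 are sticky exception flags in the same order
   as the control word mask bits; bit 6 is the stack fault flag. */
unsigned fpsw_exception_flags(unsigned sw) { return sw & FP_ALL_EXC; }
int fpsw_stack_fault(unsigned sw) { return (int)((sw >> 6) & 1); }

/* Print a one-line summary of a control word. */
void fpcw_print(unsigned cw)
{
    printf("CTRL=%04X enabled=%02X precision=%d infinity=%d\n",
           cw, fpcw_enabled_exceptions(cw), fpcw_precision_bits(cw),
           (cw & FPCW_INFINITY) != 0);
}
```

Feeding it 0x1270 reports the four enabled exceptions listed above, while
0x027F reports none enabled; both show 53-bit precision.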

So it looks like the problem is SOMEBODY is enabling FP exceptions and not
turning them back off. My first suggestion would be to put a little chunk
of code in the return from a primitive handler that breakpoints when the
FP control register is no longer 0x027F. It might also be possible to put a
breakpoint on the C run-time function that sets the FP control register,
although there is no guarantee it gets set by calling this function. If it
is, lots of useful debugging info would be collected (like a stack trace of
the offending code).
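
A rough sketch of that "little chunk of code" follows. Everything here is
an assumption for illustration: the function names, the primitiveIndex
argument, and where the VM would call it are hypothetical, not taken from
the Squeak sources. It reads the raw control word with fnstcw (MSVC or
GCC inline assembly on x86) and compares it against the 0x027F default
seen under MSVC 6; other compilers and operating systems may start with a
different default.

```c
#include <assert.h>
#include <stdio.h>

#define FPCW_EXPECTED 0x027Fu  /* default observed under MSVC 6 */

/* Read the raw x87 control word.  On a non-x86 target there is no
   such register, so this stub just returns the expected value. */
unsigned read_fpcw(void)
{
#if defined(_MSC_VER) && defined(_M_IX86)
    unsigned short cw;
    __asm { fnstcw cw }
    return cw;
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned short cw;
    __asm__ volatile ("fnstcw %0" : "=m" (cw));
    return cw;
#else
    return FPCW_EXPECTED;
#endif
}

/* Returns 1 if cw is the expected control word, otherwise logs the
   primitive that just returned and returns 0. */
int check_fpcw_value(int primitiveIndex, unsigned cw)
{
    assert(cw <= 0xFFFFu);     /* the control word is 16 bits wide */
    if (cw == FPCW_EXPECTED)
        return 1;
    /* Put the debugger breakpoint on the next line. */
    fprintf(stderr, "FP control word is %04X after primitive %d\n",
            cw, primitiveIndex);
    return 0;
}

/* Hypothetical hook: call on return from every primitive. */
int check_fpcw_after_primitive(int primitiveIndex)
{
    return check_fpcw_value(primitiveIndex, read_fpcw());
}
```

The first primitive for which the hook fires is either the culprit or
called something that is.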

I'd also suggest taking a stripped OS and seeing if the problem still
exists. Could be some user mode driver or other system add-on is the guilty
party. Installing a clean OS just for testing is probably not most people's
idea of a fun day though. I can't offhand think of a fast way to breakpoint
on a specific instruction. The real question is what piece of code loads
0x1270 into the FPU control register. Too bad we don't have OS source, it
would be really easy to turn off the FPU and modify the FPU emulator to
trap when 0x1270 is written to the emulated FP control register. You could
also take some debugger that can step and output the instruction execution
trace to a log file, and let it run a long time (days). A text search of
the log may then find the offending code. A binary search for the offending
code by a programmer might also do the trick, and be a LOT faster than
emulating a billion instructions, although it's a lot of work. We know that
at program startup the FP control register is set correctly, and we know at
this failure point it's wrong, so it's a simple matter of moving the two
boundaries until they meet.
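
The binary search can be sketched like this, assuming the run can be cut
into numbered checkpoints (breakpoint locations, say) in execution order.
The names are hypothetical, and in practice the "still good" test is a
person rerunning the program and inspecting the control word at that
checkpoint; demo_still_good is only a stand-in so the sketch compiles.

```c
#include <assert.h>
#include <stddef.h>

/* Nonzero if the FP control word is still correct when execution
   reaches the given checkpoint. */
typedef int (*still_good_fn)(size_t checkpoint, void *ctx);

/* `good` is a checkpoint known to be fine (program startup), `bad`
   one known to be wrong (the crash), with good < bad.  Returns the
   first checkpoint at which the control word is wrong. */
size_t find_first_bad(size_t good, size_t bad,
                      still_good_fn still_good, void *ctx)
{
    assert(good < bad);
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (still_good(mid, ctx))
            good = mid;     /* damage happens after mid */
        else
            bad = mid;      /* damage happens at or before mid */
    }
    return bad;
}

/* Stand-in predicate: ctx points at the checkpoint where the control
   word first goes wrong. */
int demo_still_good(size_t checkpoint, void *ctx)
{
    return checkpoint < *(const size_t *)ctx;
}
```

Each round halves the distance between the two boundaries, so even a
thousand checkpoints take only about ten reruns.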

- Jan




