[BUG] [CRASH] More vm crash bugs

Hans-Martin Mosner hm.mosner at cityweb.de
Tue Aug 14 19:01:29 UTC 2001


Hi Scott,
thanks for providing actual recipes for reproducing bugs!

Scott A Crosby wrote:
> 
> On Tue, 14 Aug 2001, robin wrote:
> 
> >
> > I'm also aware that without enough information to reproduce a bug, in 99% of
> > cases there's little that can be done - from a scientific perspective it
> > might even be said not to exist.  In future, I'll make sure to include it.
> >
> 
> Select from the browser, PseudoContext, then go through the menu for
> 'sample instance' CRASH.
> 
> Recursive not understood error encountered
> 
> 1090900340 CodeHolder>makeSampleInstance
> 1090900248 [] in MenuItemMorph>invokeWithEvent:
> 1090899972 BlockContext>ensure:
> 1090899880 Cursor>showWhile:
> 1090877840 MenuItemMorph>invokeWithEvent:
> 1090877732 MenuItemMorph>mouseUp:
> 1090877640 MenuItemMorph>handleMouseUp:
> 1090877456 MouseButtonEvent>sentTo:
> 1090877364 Morph>handleEvent:
> ...
> 
> This bug also applies to attempting to make sample instances with other
> classes. You can run an automated test for this and see where else it
> breaks.

The bug here is caused by PseudoContext not implementing the
doesNotUnderstand: message. The interpreter tries to send this to an
object when a message is not implemented in that object's class or
superclasses. PseudoContext implements very little behavior, so that send
fails as well. In practice, it doesn't hurt much since PseudoContexts are
supposed to be created only by the Jitter, and to hide under the nearest
rock when you look at them. But you're right, they should implement
doesNotUnderstand: to catch this specific problem.
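
As a rough illustration of the lookup described above, here is a toy model in C. It is not the Squeak interpreter's code: ToyClass, toySend, and the two sample classes are hypothetical stand-ins, and the selector lists are invented. It only shows why a class with a nil superclass and no doesNotUnderstand: leaves the fallback send with nowhere to go.

```c
#include <stddef.h>
#include <string.h>

/* Toy model only (assumption): a class is a superclass link plus a
   NULL-terminated list of the selectors it implements. */
typedef struct ToyClass {
    const struct ToyClass *superclass;  /* NULL plays the role of a nil superclass */
    const char *const *selectors;
} ToyClass;

typedef enum { SEND_OK, SEND_DNU_HANDLED, SEND_RECURSIVE_DNU } SendResult;

/* Walk the superclass chain looking for an implementation of selector. */
static int toyUnderstands(const ToyClass *cls, const char *selector) {
    for (; cls != NULL; cls = cls->superclass) {
        for (const char *const *s = cls->selectors; s != NULL && *s != NULL; s++) {
            if (strcmp(*s, selector) == 0) return 1;
        }
    }
    return 0;
}

/* A send first looks up the selector; on failure it falls back to
   doesNotUnderstand:. If that is missing too, nothing is left to try. */
static SendResult toySend(const ToyClass *cls, const char *selector) {
    if (toyUnderstands(cls, selector)) return SEND_OK;
    if (toyUnderstands(cls, "doesNotUnderstand:")) return SEND_DNU_HANDLED;
    return SEND_RECURSIVE_DNU;
}

/* Hypothetical classes: both have a nil superclass, only one has the fallback. */
static const char *const toyProtoSelectors[]  = { "doesNotUnderstand:", "==", NULL };
static const char *const toyPseudoSelectors[] = { "someSelector", NULL };
static const ToyClass toyProto  = { NULL, toyProtoSelectors };
static const ToyClass toyPseudo = { NULL, toyPseudoSelectors };
```

In this model, sending an unknown selector to toyProto yields SEND_DNU_HANDLED (the soft failure), while the same send to toyPseudo yields SEND_RECURSIVE_DNU.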

But I'm at a loss regarding the other classes for which this fails. The
only other ones having a nil superclass are ProtoObject and
ObjectTracer, and both implement doesNotUnderstand: and therefore fail softly.
Others will also fail softly or do "interesting" things because not
every class in the system is set up to produce instances correctly
right on the spot... But none should get into this particular recursive
doesNotUnderstand: error.

> 
> --
> 
> Open up a bunch of sockets, stay connected to them, then save your image,
> then try to restore from the saved image. You'll almost always get *this*
> traceback; this is the second time out of about 4 attempts that I've hit
> this, and it makes my image unusable. My implementation is to construct a
> separate process to read&write from the socket. The image crashes after a
> second when this process runs and causes the primitive to segfault:
> 
> crosby@dragonlight:~/squeak/squeak.mud-3.1$ /usr/local/packages/squeak-3.1/bin/squeak
> 
> Segmentation fault
> 
> 1090757868 Socket>destroy
> 1088785580 MuckSocket>waitForData
> 1088785460 MuckSocket>networkGetSingleLine
> 1088674572 [] in MuckPlayerConnection>networkReaderLoop
> 1088674848 BlockContext>on:do:
> 1088674480 BlockContext>ifError:
> 1088674388 MuckPlayerConnection>networkReaderLoop
> 1088673964 [] in MuckPlayerConnection>initialize:
> 1088674056 [] in BlockContext>newProcess
> 
> --
> 
> My workaround is to replace *THIS* function so that it always fails.
> 
> static int socketValid(SocketPtr s) {
>   interpreterProxy->success(false);
>   return false;
> 
>         /* ORIGINAL BODY */
>   if ((s != 0) && (s->privateSocketPtr != 0) && (s->sessionID ==  thisNetSession))
>     return true;
>   interpreterProxy->success(false);
>   return false;
> }
> 
> I then run under this version of squeak to recover my image and code; then
> back under my normal squeak binary to continue testing.
> 
> My suspicion was that 's' isn't being cleaned up upon image reload, and
> dereferencing it here was the cause of the segfault.
> 
Yes, there is a problem with stale pointers being held in the image for
external resources such as sockets and files. These require careful
cleanup upon image startup, and at least in the case of files the
VM-level code does a pretty good job of ensuring that you only work with
valid file ids. Why this breaks in the socket case I don't know, but it
might be a platform-dependent socket implementation that is less robust
than desirable.
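
To make the stale-pointer problem concrete, here is a minimal sketch in C of a session-checked handle, loosely modelled on the sessionID comparison in the quoted socketValid(). Everything in it is an assumption for illustration: ToyHandle, toyCurrentSession, and the helper functions are hypothetical, and a real VM may derive its session ID differently (this sketch just uses a counter).

```c
#include <stddef.h>

/* Hypothetical handle layout (assumption), stored in the saved image. */
typedef struct {
    int   sessionID;   /* session in which the handle was created */
    void *privatePtr;  /* only meaningful within that session */
} ToyHandle;

/* Stand-in for the VM's notion of "this run of the image". */
static int toyCurrentSession = 1;

/* In this sketch, image startup begins a new session, which makes
   every handle created earlier stale. */
static void toyStartNewSession(void) {
    toyCurrentSession++;
}

/* Create a handle stamped with the current session. */
static ToyHandle toyOpen(void *resource) {
    ToyHandle h = { toyCurrentSession, resource };
    return h;
}

/* Hand out the private pointer only if the handle belongs to the
   current session; a stale pointer is never returned or dereferenced. */
static void *toyResolve(const ToyHandle *h) {
    if (h == NULL || h->sessionID != toyCurrentSession) return NULL;
    return h->privatePtr;
}
```

A handle returned by toyOpen() resolves to its resource until toyStartNewSession() runs; afterwards toyResolve() returns NULL, so a caller can fail the primitive instead of touching memory from a previous session.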
> --
> 
> Then, there are cases where you overload it with packets, and process
> switches; I've crashed it 3/4 times, and each in a different way.

Probably you should blame this either on the platform or on
platform-specific VM code.
But in any case, the whole networking issue needs to be sorted out some
day. Craig Latta has done some pretty nice things with his "flow"
framework, but AFAIK this needs different implementations of the socket
primitives, which prevented me from trying out his stuff...

Cheers,
Hans-Martin






More information about the Squeak-dev mailing list