[Vm-dev] Intermittent VM hang [was: Which plugins for OSProcess?]

John M McIntosh johnmci at smalltalkconsulting.com
Sun Feb 11 06:21:42 UTC 2007


Mmm
startIndex=4294966784
arrayIndex = 0x1?

Would those be valid?

> #8  0x080a2499 in sqFileReadIntoAt (f=0x4019dfd8, count=1,  
> byteArrayIndex=0x1 <Address 0x1 out of bounds>,  
> startIndex=4294966784) at /data3/lewis/squeak/sq/Squeak64BitPort/ 
> images/platforms/Cross/plugins/FilePlugin/sqFilePluginBasicPrims.c:247

	arrayIndex = interpreterProxy->firstIndexableField(array);
	bytesRead = sqFileReadIntoAt(file, count * byteSize, arrayIndex,  
(startIndex - 1) * byteSize);


size_t sqFileReadIntoAt(SQFile *f, size_t count, char*  
byteArrayIndex, size_t startIndex) {
	/* Read count bytes from the given file into byteArray starting at
	   startIndex. byteArray is the address of the first byte of a
	   Squeak bytes object (e.g. String or ByteArray). startIndex
	   is a zero-based index; that is a startIndex of 0 starts writing
	   at the first byte of byteArray.
	*/

	char *dst;
	size_t bytesRead;
	FILE *file;

	if (!sqFileValid(f)) return interpreterProxy->success(false);
	file= getFile(f);
	if (f->writable && (f->lastOp == WRITE_OP)) fseek(file, 0,  
SEEK_CUR);  /* seek between writing and reading */
	dst = byteArrayIndex + startIndex;
	bytesRead = fread(dst, 1, count, file);


On Feb 10, 2007, at 4:23 PM, David T. Lewis wrote:

>
> I am moving this discussion to vm-dev, because it is unlikely to be  
> of interest
> on the general list.
>
> To add to my last message, I have now build a newer VM with an old  
> OSProcessPlugin:
>         Squeak3.8 of ''5 May 2005'' [latest update: #6665]
>         installed plugin UnixOSProcessPlugin 6 February 2007 (e)  
> version 3.3
>
> The problem still exists when running 300 iterations of:
>   OSProcess debugMessage: 'start 300 iterations'.
>   (1 to: 300) do: [:i |
>     OSProcess debugMessage: i asString.
>     CommandShell new pipeline: 'ls /no/such/file | stdout  
> nextPutAll: stdin upToEnd !']
>
> This give us:
>
> Older VM plus older OSPP ==> no failures
> Recent VM plus recent OSPP ==> intermittent failure after about 100  
> iterations
> Recent VM plus older OSPP ==> intermittent failure after about 100  
> iterations
>
> The problem is also unrelated to image version and unrelated to  
> OSProcess
> version.  That seems to narrow it down to something changing in the  
> VM (support
> code and/or VMMaker). That's purely correlational; it does not mean  
> there is
> something wrong in the VM, rather that something changed that  
> either provokes
> the bug in OSProcess or that OSProcess puts enough load on the VM  
> to provoke
> the bug. At this point, I have no clue as to which one it is.
>
> Bryce's original diagnosis was right on target:
>>> Here it looks like the primitive is blocking waiting for input that
>>> doesn't come even though it looks to me like it's blocking on a
>>> non-blocking file.
>>>
>>> Bryce
>
> After getting the VM to hang up like this, I can interrupt from gdb  
> and
> get a stack trace that makes absolutely no sense to me  
> (particularly since
> the test case does not do any file opens whatsoever). I'm no expert on
> gdb, so if anyone give me a pointer as to what this means, I'd be  
> grateful.
> Line 600 of FilePlugin.c is "interpreterProxy->popthenPush(3,  
> filePointer);"
> at the end of primitiveFileOpen(). It acts as if it's being called  
> recursively,
> which is impossible, and by the time we get to sqFileReadIntoAt(),  
> we are
> dealing with what looks like a bad case of scrambled brains. My  
> test is running
> on Linux, so there should be no pthreads involved. The stack trace  
> is attached
> below.
>
> Dave
>
> stack dump of VM interrupted from gdb:
>
> (gdb) run TestImageHang.1
> [New Thread 16384 (LWP 5753)]
> ^C
> Program received signal SIGINT, Interrupt.
> [Switching to Thread 16384 (LWP 5753)]
> 0x4013d088 in read () from /lib/i686/libc.so.6
> (gdb) bt
> #0  0x4013d088 in read () from /lib/i686/libc.so.6
> #1  0x4019dfd8 in __DTOR_END__ () from /lib/i686/libc.so.6
> #2  0x400dc232 in _IO_file_read_internal () from /lib/i686/libc.so.6
> #3  0x400dafa1 in _IO_new_file_underflow () from /lib/i686/libc.so.6
> #4  0x400dcfc6 in __underflow () from /lib/i686/libc.so.6
> #5  0x400dc70a in _IO_file_xsgetn_internal () from /lib/i686/libc.so.6
> #6  0x400dd3e8 in _IO_sgetn_internal () from /lib/i686/libc.so.6
> #7  0x400d0942 in fread () from /lib/i686/libc.so.6
> #8  0x080a2499 in sqFileReadIntoAt (f=0x4019dfd8, count=1,  
> byteArrayIndex=0x1 <Address 0x1 out of bounds>,  
> startIndex=4294966784) at /data3/lewis/squeak/sq/Squeak64BitPort/ 
> images/platforms/Cross/plugins/FilePlugin/sqFilePluginBasicPrims.c:247
> #9  0x080a19ad in primitiveFileRead () at /data3/lewis/squeak/sq/ 
> Squeak64BitPort/images/src/vm/intplugins/FilePlugin/FilePlugin.c:641
> #10 0x00000010 in ?? ()
> #11 0x41a6d770 in ?? ()
> #12 0x00000001 in ?? ()
> #13 0x40b1cc5c in ?? ()
> #14 0x080a1880 in primitiveFileOpen () at /data3/lewis/squeak/sq/ 
> Squeak64BitPort/images/src/vm/intplugins/FilePlugin/FilePlugin.c:600
> #15 0x40b1cc5c in ?? ()
> #16 0x080a1880 in primitiveFileOpen () at /data3/lewis/squeak/sq/ 
> Squeak64BitPort/images/src/vm/intplugins/FilePlugin/FilePlugin.c:600
> #17 0x41317be0 in ?? ()
> #18 0x000003f2 in ?? ()
> #19 0x080a1880 in primitiveFileOpen () at /data3/lewis/squeak/sq/ 
> Squeak64BitPort/images/src/vm/intplugins/FilePlugin/FilePlugin.c:600
> #20 0x41a8c698 in ?? ()
> #21 0x40b1c61e in ?? ()
> #22 0x41a8c6c8 in ?? ()
> #23 0x41a8c698 in ?? ()
> #24 0x0805e4a7 in dispatchFunctionPointer  
> (aFunctionPointer=0x8067b50) at gnu-interp.c:4024
> #25 0x08062990 in interpret () at gnu-interp.c:7902
> #26 0x0805c04c in main (argc=-1073745031, argv=0x0,  
> envp=0xfffffe00) at /data3/lewis/squeak/sq/Squeak64BitPort/images/ 
> platforms/unix/vm/sqUnixMain.c:1388
> (gdb)
>
>
> On Tue, Feb 06, 2007 at 07:09:05AM -0500, David T. Lewis wrote:
>> Hi Bryce,
>>
>> I have been trying to isolate the problem (problems?) that you  
>> found here.
>> I have not identified it yet, but I do have some information to  
>> pass along.
>>
>> The problem is intermittent but reproducable. You reported that  
>> this fails:
>>
>> 	"(1 to: 300) do: [:i | Transcript show: i; cr.  
>> CommandShellTestCase run: #testPipeline73]"
>>
>> which I adapted as follows for testing:
>>
>> 	"OSProcess debugMessage: 'start 300 iterations'.
>> 	(1 to: 300) do: [:i |
>> 		OSProcess debugMessage: i asString.
>> 		CommandShell new pipeline: 'ls /no/such/file | stdout  
>> nextPutAll: stdin upToEnd !']"
>>
>> The failure mode is that the image hangs completely, and the VM  
>> appears
>> to be blocked on a low level file read. This is probably actually  
>> a file
>> descriptor for an OS pipe (pipe readers are set non-blocking by  
>> OSProcess,
>> but possibly there is a race condition that I have not identified).
>>
>> I tested image versions and OSProcess versions, and found:
>> - The problem is not related to image version 3.8 versus 3.9  
>> (fails on both)
>> - The problem is not related to OSProcess version 4.3 versus 4.0.1  
>> (fails on both)
>>
>> I tested VM and OSPP plugin versions and found:
>> - The combination of recent VM and OSPP fails:
>> 	Squeak3.8 of ''5 May 2005'' [latest update: #6665]
>> 	UnixOSProcessPlugin 4 July 2004 (e) version 3.3
>> - An older combination of VM and OSPP does not fail:
>> 	Squeak3.7beta of ''1 April 2004'' [latest update: #5923]
>> 	UnixOSProcessPlugin 4 July 2004 (e) version 3.3
>>
>> So far this suggests a problem associated with some change in the VM
>> and/or plugins. I have not yet successfully built a combination of  
>> newer
>> VM with the older OSPP (due to various annoyances in the build,  
>> and I'm
>> out of spare time for now). Hopefully if I can do this, the  
>> problem can
>> be narrowed down to either the VM or the plugins (but note that this
>> would not necessarily mean the that VM or plugin has a fault; it  
>> could
>> still be a race condition in the image that is just aggrivated by  
>> some
>> change in the VM).
>>
>> There was also some discussion in this thread of possible aio  
>> problems.
>> This particular issue does not seem to be related to aoi in the VM or
>> to the AIO plugin.
>>
>> Hopefully more to follow later,
>>
>> Dave
>>
>> On Mon, Jan 29, 2007 at 10:45:57PM +0000,  
>> bryce at kampjes.demon.co.uk wrote:
>>>
>>> I'm have some problems getting OSProcess 4.3 to run reliably, I'm
>>> using a VM with AioPlugin 2.0 and OSProcessPlugin 4.0.1. These  
>>> are the
>>> latest versions availible via SqueakMap.
>>>
>>> Most of my testing was done in my standard development image  
>>> built on
>>> the Squeak 3.9 developer image using an Exupery VM. Exupery was  
>>> loaded
>>> but not running. I'm trying to get Exupery's stress test to pass  
>>> which
>>> runs almost all the tests in the image 3 times. Up until now the
>>> OSProcess tests have been useful for flushing out context switching
>>> bugs.
>>>
>>> Any idea how to build a VM with OSProcess that will run 3.9 images
>>> reliably?
>>>
>>>
>>> With both AioPlugin and OSProcessPlugin installed running:
>>>
>>>      OSPipeTestCase buildSuite run
>>>
>>> May lock up the image consuming 100% CPU. This only happens when
>>> AioPlugin has been built and installed. It doesn't always happen.
>>>
>>> Cannot find new threads: generic error
>>> ioFindExternalFunctionIn(primitiveGetThreadID, 0x80a4e88):
>>>   /home/bryce/squeak/exuperyNew/build/UnixOSProcessPlugin/.libs/ 
>>> UnixOSProcessPlugin: undefined symbol: primitiveGetThreadID
>>> ioFindExternalFunctionIn(primitiveTestEndOfFileFlag, 0x80a4e88):
>>>   /home/bryce/squeak/exuperyNew/build/UnixOSProcessPlugin/.libs/ 
>>> UnixOSProcessPlugin: undefined symbol: primitiveTestEndOfFileFlag
>>> ioFindExternalFunctionIn(primitiveTestEndOfFileFlag, 0x80a4e88):
>>>   /home/bryce/squeak/exuperyNew/build/UnixOSProcessPlugin/.libs/ 
>>> UnixOSProcessPlugin: undefined symbol: primitiveTestEndOfFileFlag
>>>
>>> Hitting Alt-. sometimes locks up the image after bringing up a
>>> notifier.
>>>
>>> When the image has locked up interrupting with gdb shows:
>>> (gdb) p printCallStack ()
>>> 1715565776 >idleProcess
>>> 1715534844 [] in >startUp
>>> 1715534936 [] in BlockContext>newProcess
>>> $5 = 10
>>> (gdb) where
>>> #0  0xffffe405 in __kernel_vsyscall ()
>>> #1  0xf7f0d08d in select () from /lib/libc.so.6
>>> #2  0x0806ab6c in aioPoll (microSeconds=96000)
>>>     at /home/bryce/squeak/exuperyNew/platforms/unix/vm/aio.c:226
>>> #3  0xf7e51743 in display_ioRelinquishProcessorForMicroseconds (
>>>     microSeconds=96000)
>>>     at /home/bryce/squeak/exuperyNew/platforms/unix/vm-display- 
>>> X11/sqUnixX11.c:2304
>>> #4  0x080523fc in ioRelinquishProcessorForMicroseconds (us=1000)
>>>     at /home/bryce/squeak/exuperyNew/platforms/unix/vm/ 
>>> sqUnixMain.c:477
>>> #5  0x0805be14 in primitiveRelinquishProcessor () at gnu-interp.c: 
>>> 19142
>>> #6  0x080535f7 in dispatchFunctionPointer  
>>> (aFunctionPointer=0x805bde0)
>>>     at gnu-interp.c:4093
>>> #7  0x08066c57 in interpret () at gnu-interp.c:9080
>>> #8  0x08052019 in main (argc=256, argv=0x0, envp=0x0)
>>>     at
>>> /home/bryce/squeak/exuperyNew/platforms/unix/vm/sqUnixMain.c:1390
>>>
>>>
>>> Running:
>>>
>>> 	300 timesRepeat: [CommandShellTestCase run: #testPipeline73]
>>>
>>> Causes the image to lock up consuming 0% CPU. This happens even when
>>> AioPlugin isn't loaded and I've reproduced it in a clean 3.9 image
>>> with just OSProcess loaded running the latest Linux VM from
>>> squeak.org. My Squeak images normally idle consuming a few  
>>> percent CPU
>>> due to polling.
>>>
>>> Cannot find new threads: generic error
>>> ioFindExternalFunctionIn(primitiveGetThreadID, 0x80a4e88):
>>>   /home/bryce/squeak/exuperyNew/build/UnixOSProcessPlugin/.libs/ 
>>> UnixOSProcessPlugin: undefined symbol: primitiveGetThreadID
>>>
>>> 0xffffe405 in __kernel_vsyscall ()
>>> (gdb) where
>>> #0  0xffffe405 in __kernel_vsyscall ()
>>> #1  0xf7e39143 in read () from /lib/libc.so.6
>>> #2  0xf7deb658 in _IO_file_read () from /lib/libc.so.6
>>> #3  0xf7dec83a in _IO_file_underflow () from /lib/libc.so.6
>>> #4  0xf7dee3cd in __underflow () from /lib/libc.so.6
>>> #5  0xf7deb15b in _IO_file_seek () from /lib/libc.so.6
>>> #6  0xf7decfa8 in _IO_sgetn () from /lib/libc.so.6
>>> #7  0xf7de19f0 in fread () from /lib/libc.so.6
>>> #8  0xf7bb2787 in sqFileReadIntoAt (f=0x66412590, count=1,
>>>     byteArrayIndex=0x66412814 "", startIndex=0)
>>>     at /home/bryce/squeak/exuperyNew/platforms/Cross/plugins/ 
>>> FilePlugin/sqFilePluginBasicPrims.c:247
>>> #9  0xf7bb12a4 in primitiveFileRead ()
>>>     at /home/bryce/squeak/exuperyNew/src/plugins/FilePlugin/ 
>>> FilePlugin.c:641
>>> #10 0x080535f7 in dispatchFunctionPointer  
>>> (aFunctionPointer=0xf7bb1160)
>>>     at gnu-interp.c:4093
>>> #11 0x0805c315 in primitiveExternalCall () at gnu-interp.c:15540
>>> #12 0x080535f7 in dispatchFunctionPointer  
>>> (aFunctionPointer=0x805c240)
>>>     at gnu-interp.c:4093
>>> #13 0x08066c57 in interpret () at gnu-interp.c:9080
>>> #14 0x08052019 in main (argc=0, argv=0x0, envp=0x0)
>>>     at /home/bryce/squeak/exuperyNew/platforms/unix/vm/ 
>>> sqUnixMain.c:1390
>>> (gdb) p printCallStack ()
>>> 1715621928 StandardFileStream>basicNext
>>> 1715621836 StandardFileStream>next
>>> 1715622132 [] in OSPipe>next
>>> 1715621628 BlockContext>on:do:
>>> 1715621536 OSPipe>next
>>> 1715622040 [] in OSPipe>next:
>>> 1715621444 Interval>do:
>>> 1715621352 OSPipe>next:
>>> 1715621720 [] in OSPipe>upToEnd
>>> 1715621236 BlockContext>repeat
>>> 1715621144 OSPipe>upToEnd
>>> 1715620776 [] in PipeableEvaluator>?
>>> 1715620636 PipeableEvaluator>blockValue
>>> 1715620868 [] in PipeableEvaluator>value
>>> 1715620960 [] in PipeableEvaluator>handleRuntimeErrorFor:
>>> 1715620524 BlockContext>on:do:
>>> 1715620432 PipeableEvaluator>handleRuntimeErrorFor:
>>> 1715565264 PipeableEvaluator>value
>>> 1715550668 CommandShell>pipeProxy:toCommandList:
>>> 1715423904 CommandShell>pipeline:
>>> 1715423788 CommandShellTestCase>testPipeline73
>>>
>>> Here it looks like the primitive is blocking waiting for input that
>>> doesn't come even though it looks to me like it's blocking on a
>>> non-blocking file.
>>>
>>> Bryce

--
======================================================================== 
===
John M. McIntosh <johnmci at smalltalkconsulting.com>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
======================================================================== 
===




More information about the Vm-dev mailing list