[Vm-dev] Intermittent VM hang [was: Which plugins for OSProcess?]
John M McIntosh
johnmci at smalltalkconsulting.com
Sun Feb 11 06:21:42 UTC 2007
Mmm
startIndex=4294966784
arrayIndex = 0x1?
Would those be valid?
> #8 0x080a2499 in sqFileReadIntoAt (f=0x4019dfd8, count=1,
> byteArrayIndex=0x1 <Address 0x1 out of bounds>,
> startIndex=4294966784) at /data3/lewis/squeak/sq/Squeak64BitPort/
> images/platforms/Cross/plugins/FilePlugin/sqFilePluginBasicPrims.c:247
arrayIndex = interpreterProxy->firstIndexableField(array);
bytesRead = sqFileReadIntoAt(file, count * byteSize, arrayIndex,
(startIndex - 1) * byteSize);
size_t sqFileReadIntoAt(SQFile *f, size_t count, char*
byteArrayIndex, size_t startIndex) {
/* Read count bytes from the given file into byteArray starting at
startIndex. byteArray is the address of the first byte of a
Squeak bytes object (e.g. String or ByteArray). startIndex
is a zero-based index; that is a startIndex of 0 starts writing
at the first byte of byteArray.
*/
char *dst;
size_t bytesRead;
FILE *file;
if (!sqFileValid(f)) return interpreterProxy->success(false);
file= getFile(f);
if (f->writable && (f->lastOp == WRITE_OP)) fseek(file, 0,
SEEK_CUR); /* seek between writing and reading */
dst = byteArrayIndex + startIndex;
bytesRead = fread(dst, 1, count, file);
On Feb 10, 2007, at 4:23 PM, David T. Lewis wrote:
>
> I am moving this discussion to vm-dev, because it is unlikely to be
> of interest
> on the general list.
>
> To add to my last message, I have now build a newer VM with an old
> OSProcessPlugin:
> Squeak3.8 of ''5 May 2005'' [latest update: #6665]
> installed plugin UnixOSProcessPlugin 6 February 2007 (e)
> version 3.3
>
> The problem still exists when running 300 iterations of:
> OSProcess debugMessage: 'start 300 iterations'.
> (1 to: 300) do: [:i |
> OSProcess debugMessage: i asString.
> CommandShell new pipeline: 'ls /no/such/file | stdout
> nextPutAll: stdin upToEnd !']
>
> This give us:
>
> Older VM plus older OSPP ==> no failures
> Recent VM plus recent OSPP ==> intermittent failure after about 100
> iterations
> Recent VM plus older OSPP ==> intermittent failure after about 100
> iterations
>
> The problem is also unrelated to image version and unrelated to
> OSProcess
> version. That seems to narrow it down to something changing in the
> VM (support
> code and/or VMMaker). That's purely correlational; it does not mean
> there is
> something wrong in the VM, rather that something changed that
> either provokes
> the bug in OSProcess or that OSProcess puts enough load on the VM
> to provoke
> the bug. At this point, I have no clue as to which one it is.
>
> Bryce's original diagnosis was right on target:
>>> Here it looks like the primitive is blocking waiting for input that
>>> doesn't come even though it looks to me like it's blocking on a
>>> non-blocking file.
>>>
>>> Bryce
>
> After getting the VM to hang up like this, I can interrupt from gdb
> and
> get a stack trace that makes absolutely no sense to me
> (particularly since
> the test case does not do any file opens whatsoever). I'm no expert on
> gdb, so if anyone give me a pointer as to what this means, I'd be
> grateful.
> Line 600 of FilePlugin.c is "interpreterProxy->popthenPush(3,
> filePointer);"
> at the end of primitiveFileOpen(). It acts as if it's being called
> recursively,
> which is impossible, and by the time we get to sqFileReadIntoAt(),
> we are
> dealing with what looks like a bad case of scrambled brains. My
> test is running
> on Linux, so there should be no pthreads involved. The stack trace
> is attached
> below.
>
> Dave
>
> stack dump of VM interrupted from gdb:
>
> (gdb) run TestImageHang.1
> [New Thread 16384 (LWP 5753)]
> ^C
> Program received signal SIGINT, Interrupt.
> [Switching to Thread 16384 (LWP 5753)]
> 0x4013d088 in read () from /lib/i686/libc.so.6
> (gdb) bt
> #0 0x4013d088 in read () from /lib/i686/libc.so.6
> #1 0x4019dfd8 in __DTOR_END__ () from /lib/i686/libc.so.6
> #2 0x400dc232 in _IO_file_read_internal () from /lib/i686/libc.so.6
> #3 0x400dafa1 in _IO_new_file_underflow () from /lib/i686/libc.so.6
> #4 0x400dcfc6 in __underflow () from /lib/i686/libc.so.6
> #5 0x400dc70a in _IO_file_xsgetn_internal () from /lib/i686/libc.so.6
> #6 0x400dd3e8 in _IO_sgetn_internal () from /lib/i686/libc.so.6
> #7 0x400d0942 in fread () from /lib/i686/libc.so.6
> #8 0x080a2499 in sqFileReadIntoAt (f=0x4019dfd8, count=1,
> byteArrayIndex=0x1 <Address 0x1 out of bounds>,
> startIndex=4294966784) at /data3/lewis/squeak/sq/Squeak64BitPort/
> images/platforms/Cross/plugins/FilePlugin/sqFilePluginBasicPrims.c:247
> #9 0x080a19ad in primitiveFileRead () at /data3/lewis/squeak/sq/
> Squeak64BitPort/images/src/vm/intplugins/FilePlugin/FilePlugin.c:641
> #10 0x00000010 in ?? ()
> #11 0x41a6d770 in ?? ()
> #12 0x00000001 in ?? ()
> #13 0x40b1cc5c in ?? ()
> #14 0x080a1880 in primitiveFileOpen () at /data3/lewis/squeak/sq/
> Squeak64BitPort/images/src/vm/intplugins/FilePlugin/FilePlugin.c:600
> #15 0x40b1cc5c in ?? ()
> #16 0x080a1880 in primitiveFileOpen () at /data3/lewis/squeak/sq/
> Squeak64BitPort/images/src/vm/intplugins/FilePlugin/FilePlugin.c:600
> #17 0x41317be0 in ?? ()
> #18 0x000003f2 in ?? ()
> #19 0x080a1880 in primitiveFileOpen () at /data3/lewis/squeak/sq/
> Squeak64BitPort/images/src/vm/intplugins/FilePlugin/FilePlugin.c:600
> #20 0x41a8c698 in ?? ()
> #21 0x40b1c61e in ?? ()
> #22 0x41a8c6c8 in ?? ()
> #23 0x41a8c698 in ?? ()
> #24 0x0805e4a7 in dispatchFunctionPointer
> (aFunctionPointer=0x8067b50) at gnu-interp.c:4024
> #25 0x08062990 in interpret () at gnu-interp.c:7902
> #26 0x0805c04c in main (argc=-1073745031, argv=0x0,
> envp=0xfffffe00) at /data3/lewis/squeak/sq/Squeak64BitPort/images/
> platforms/unix/vm/sqUnixMain.c:1388
> (gdb)
>
>
> On Tue, Feb 06, 2007 at 07:09:05AM -0500, David T. Lewis wrote:
>> Hi Bryce,
>>
>> I have been trying to isolate the problem (problems?) that you
>> found here.
>> I have not identified it yet, but I do have some information to
>> pass along.
>>
>> The problem is intermittent but reproducable. You reported that
>> this fails:
>>
>> "(1 to: 300) do: [:i | Transcript show: i; cr.
>> CommandShellTestCase run: #testPipeline73]"
>>
>> which I adapted as follows for testing:
>>
>> "OSProcess debugMessage: 'start 300 iterations'.
>> (1 to: 300) do: [:i |
>> OSProcess debugMessage: i asString.
>> CommandShell new pipeline: 'ls /no/such/file | stdout
>> nextPutAll: stdin upToEnd !']"
>>
>> The failure mode is that the image hangs completely, and the VM
>> appears
>> to be blocked on a low level file read. This is probably actually
>> a file
>> descriptor for an OS pipe (pipe readers are set non-blocking by
>> OSProcess,
>> but possibly there is a race condition that I have not identified).
>>
>> I tested image versions and OSProcess versions, and found:
>> - The problem is not related to image version 3.8 versus 3.9
>> (fails on both)
>> - The problem is not related to OSProcess version 4.3 versus 4.0.1
>> (fails on both)
>>
>> I tested VM and OSPP plugin versions and found:
>> - The combination of recent VM and OSPP fails:
>> Squeak3.8 of ''5 May 2005'' [latest update: #6665]
>> UnixOSProcessPlugin 4 July 2004 (e) version 3.3
>> - An older combination of VM and OSPP does not fail:
>> Squeak3.7beta of ''1 April 2004'' [latest update: #5923]
>> UnixOSProcessPlugin 4 July 2004 (e) version 3.3
>>
>> So far this suggests a problem associated with some change in the VM
>> and/or plugins. I have not yet successfully built a combination of
>> newer
>> VM with the older OSPP (due to various annoyances in the build,
>> and I'm
>> out of spare time for now). Hopefully if I can do this, the
>> problem can
>> be narrowed down to either the VM or the plugins (but note that this
>> would not necessarily mean the that VM or plugin has a fault; it
>> could
>> still be a race condition in the image that is just aggrivated by
>> some
>> change in the VM).
>>
>> There was also some discussion in this thread of possible aio
>> problems.
>> This particular issue does not seem to be related to aoi in the VM or
>> to the AIO plugin.
>>
>> Hopefully more to follow later,
>>
>> Dave
>>
>> On Mon, Jan 29, 2007 at 10:45:57PM +0000,
>> bryce at kampjes.demon.co.uk wrote:
>>>
>>> I'm have some problems getting OSProcess 4.3 to run reliably, I'm
>>> using a VM with AioPlugin 2.0 and OSProcessPlugin 4.0.1. These
>>> are the
>>> latest versions availible via SqueakMap.
>>>
>>> Most of my testing was done in my standard development image
>>> built on
>>> the Squeak 3.9 developer image using an Exupery VM. Exupery was
>>> loaded
>>> but not running. I'm trying to get Exupery's stress test to pass
>>> which
>>> runs almost all the tests in the image 3 times. Up until now the
>>> OSProcess tests have been useful for flushing out context switching
>>> bugs.
>>>
>>> Any idea how to build a VM with OSProcess that will run 3.9 images
>>> reliably?
>>>
>>>
>>> With both AioPlugin and OSProcessPlugin installed running:
>>>
>>> OSPipeTestCase buildSuite run
>>>
>>> May lock up the image consuming 100% CPU. This only happens when
>>> AioPlugin has been built and installed. It doesn't always happen.
>>>
>>> Cannot find new threads: generic error
>>> ioFindExternalFunctionIn(primitiveGetThreadID, 0x80a4e88):
>>> /home/bryce/squeak/exuperyNew/build/UnixOSProcessPlugin/.libs/
>>> UnixOSProcessPlugin: undefined symbol: primitiveGetThreadID
>>> ioFindExternalFunctionIn(primitiveTestEndOfFileFlag, 0x80a4e88):
>>> /home/bryce/squeak/exuperyNew/build/UnixOSProcessPlugin/.libs/
>>> UnixOSProcessPlugin: undefined symbol: primitiveTestEndOfFileFlag
>>> ioFindExternalFunctionIn(primitiveTestEndOfFileFlag, 0x80a4e88):
>>> /home/bryce/squeak/exuperyNew/build/UnixOSProcessPlugin/.libs/
>>> UnixOSProcessPlugin: undefined symbol: primitiveTestEndOfFileFlag
>>>
>>> Hitting Alt-. sometimes locks up the image after bringing up a
>>> notifier.
>>>
>>> When the image has locked up interrupting with gdb shows:
>>> (gdb) p printCallStack ()
>>> 1715565776 >idleProcess
>>> 1715534844 [] in >startUp
>>> 1715534936 [] in BlockContext>newProcess
>>> $5 = 10
>>> (gdb) where
>>> #0 0xffffe405 in __kernel_vsyscall ()
>>> #1 0xf7f0d08d in select () from /lib/libc.so.6
>>> #2 0x0806ab6c in aioPoll (microSeconds=96000)
>>> at /home/bryce/squeak/exuperyNew/platforms/unix/vm/aio.c:226
>>> #3 0xf7e51743 in display_ioRelinquishProcessorForMicroseconds (
>>> microSeconds=96000)
>>> at /home/bryce/squeak/exuperyNew/platforms/unix/vm-display-
>>> X11/sqUnixX11.c:2304
>>> #4 0x080523fc in ioRelinquishProcessorForMicroseconds (us=1000)
>>> at /home/bryce/squeak/exuperyNew/platforms/unix/vm/
>>> sqUnixMain.c:477
>>> #5 0x0805be14 in primitiveRelinquishProcessor () at gnu-interp.c:
>>> 19142
>>> #6 0x080535f7 in dispatchFunctionPointer
>>> (aFunctionPointer=0x805bde0)
>>> at gnu-interp.c:4093
>>> #7 0x08066c57 in interpret () at gnu-interp.c:9080
>>> #8 0x08052019 in main (argc=256, argv=0x0, envp=0x0)
>>> at
>>> /home/bryce/squeak/exuperyNew/platforms/unix/vm/sqUnixMain.c:1390
>>>
>>>
>>> Running:
>>>
>>> 300 timesRepeat: [CommandShellTestCase run: #testPipeline73]
>>>
>>> Causes the image to lock up consuming 0% CPU. This happens even when
>>> AioPlugin isn't loaded and I've reproduced it in a clean 3.9 image
>>> with just OSProcess loaded running the latest Linux VM from
>>> squeak.org. My Squeak images normally idle consuming a few
>>> percent CPU
>>> due to polling.
>>>
>>> Cannot find new threads: generic error
>>> ioFindExternalFunctionIn(primitiveGetThreadID, 0x80a4e88):
>>> /home/bryce/squeak/exuperyNew/build/UnixOSProcessPlugin/.libs/
>>> UnixOSProcessPlugin: undefined symbol: primitiveGetThreadID
>>>
>>> 0xffffe405 in __kernel_vsyscall ()
>>> (gdb) where
>>> #0 0xffffe405 in __kernel_vsyscall ()
>>> #1 0xf7e39143 in read () from /lib/libc.so.6
>>> #2 0xf7deb658 in _IO_file_read () from /lib/libc.so.6
>>> #3 0xf7dec83a in _IO_file_underflow () from /lib/libc.so.6
>>> #4 0xf7dee3cd in __underflow () from /lib/libc.so.6
>>> #5 0xf7deb15b in _IO_file_seek () from /lib/libc.so.6
>>> #6 0xf7decfa8 in _IO_sgetn () from /lib/libc.so.6
>>> #7 0xf7de19f0 in fread () from /lib/libc.so.6
>>> #8 0xf7bb2787 in sqFileReadIntoAt (f=0x66412590, count=1,
>>> byteArrayIndex=0x66412814 "", startIndex=0)
>>> at /home/bryce/squeak/exuperyNew/platforms/Cross/plugins/
>>> FilePlugin/sqFilePluginBasicPrims.c:247
>>> #9 0xf7bb12a4 in primitiveFileRead ()
>>> at /home/bryce/squeak/exuperyNew/src/plugins/FilePlugin/
>>> FilePlugin.c:641
>>> #10 0x080535f7 in dispatchFunctionPointer
>>> (aFunctionPointer=0xf7bb1160)
>>> at gnu-interp.c:4093
>>> #11 0x0805c315 in primitiveExternalCall () at gnu-interp.c:15540
>>> #12 0x080535f7 in dispatchFunctionPointer
>>> (aFunctionPointer=0x805c240)
>>> at gnu-interp.c:4093
>>> #13 0x08066c57 in interpret () at gnu-interp.c:9080
>>> #14 0x08052019 in main (argc=0, argv=0x0, envp=0x0)
>>> at /home/bryce/squeak/exuperyNew/platforms/unix/vm/
>>> sqUnixMain.c:1390
>>> (gdb) p printCallStack ()
>>> 1715621928 StandardFileStream>basicNext
>>> 1715621836 StandardFileStream>next
>>> 1715622132 [] in OSPipe>next
>>> 1715621628 BlockContext>on:do:
>>> 1715621536 OSPipe>next
>>> 1715622040 [] in OSPipe>next:
>>> 1715621444 Interval>do:
>>> 1715621352 OSPipe>next:
>>> 1715621720 [] in OSPipe>upToEnd
>>> 1715621236 BlockContext>repeat
>>> 1715621144 OSPipe>upToEnd
>>> 1715620776 [] in PipeableEvaluator>?
>>> 1715620636 PipeableEvaluator>blockValue
>>> 1715620868 [] in PipeableEvaluator>value
>>> 1715620960 [] in PipeableEvaluator>handleRuntimeErrorFor:
>>> 1715620524 BlockContext>on:do:
>>> 1715620432 PipeableEvaluator>handleRuntimeErrorFor:
>>> 1715565264 PipeableEvaluator>value
>>> 1715550668 CommandShell>pipeProxy:toCommandList:
>>> 1715423904 CommandShell>pipeline:
>>> 1715423788 CommandShellTestCase>testPipeline73
>>>
>>> Here it looks like the primitive is blocking waiting for input that
>>> doesn't come even though it looks to me like it's blocking on a
>>> non-blocking file.
>>>
>>> Bryce
--
========================================================================
===
John M. McIntosh <johnmci at smalltalkconsulting.com>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
========================================================================
===
More information about the Vm-dev
mailing list