[Vm-dev] Re: Cog segmentation fault on Linux
Rob Withers
reefedjib at yahoo.com
Wed Jul 21 10:58:07 UTC 2010
Eliot,
I am trying to narrow down what may be causing this. I took my looping GC image and shut down more processes where I could, including the eventTickler. My script is:
Sensor shutDown.
VatTPManager stop.
[[Smalltalk garbageCollect] repeat] fork.
My Processes are:
"timerEventLoop Process - priority 80"
"lowSpaceWatcher Process -priority 60"
"finalization Process - priority 50"
"UIProcess - priority 40"
"user Process - priority 40 - [[Smalltalk garbageCollect] repeat] fork."
"idle Process - priority 10"
As always, the active stack at the seg fault is the user Process doing a Smalltalk garbageCollect.
At this point I decided to see what gdb would tell me. I ran the following command:
gdb --args lib/squeak/3.9-7/squeak -leakcheck 7 -vm-display-null -vm-sound-null echat-server-off.image garbageCollect.sq
It loaded symbols. I then issued the 'run' command. It runs squeak, but the resulting process doesn't accumulate CPU time, as if it isn't really running. I tried sending a USR1 to that pid, but it doesn't output anything. I issued the run command again, but gdb seems to think it is already running. ps aux agrees, as there is an entry for squeak. It just isn't doing anything. Am I using gdb wrong? Is the stack paused? Here are the results of 'bt':
(gdb) bt
#0 0xf7ffd430 in __kernel_vsyscall ()
#1 0x4b3c22f6 in nanosleep () from /lib/libpthread.so.0
#2 0x0805c958 in tickerSleepCycle (ignored=0x0)
at /home1/vawhigso/public_html/squeakelib/Cog/platforms/unix/vm/sqUnixHeartbeat.c:375
#3 0x4b3ba832 in start_thread () from /lib/libpthread.so.0
#4 0x4b315e0e in clone () from /lib/libc.so.6
Thanks for any help,
Rob
From: Rob Withers
Sent: Tuesday, July 20, 2010 6:14 AM
To: Eliot Miranda
Cc: Squeak VM Dev
Subject: Re: [Vm-dev] Re: Cog segmentation fault on Linux
--------------------------------------------------------------------------------
Hey Eliot,
Here is what I have found. I never saw any output from the leak checker. I was able to generate seg faults in the original echat-server.image, which is doing socket stuff, AND I was able to generate it in a looping GC image. In the original echat-server image, I have a listening socket and I have a "Vat" which has installed a subclass of Process and is looping and I am running the RFB server. In the looping GC image, I turned off my listening socket, the Vat is not running and I stopped the RFB server. I run headless and I supply a script to run. I supplied the following script:
[Smalltalk garbageCollect] repeat.
It took eight attempts, but I eventually got a seg fault.
I have attached the logfiles for both the echat-server scenario and the looping GC scenario. Search for #SIGUSR1 for each process dump section. Search for #SEGFAULT to find the section at the bottom that seg faulted. Search for #PREVSTACK to find Processes in the SEGFAULT sections that have garbage in them and what the corresponding stack in a previous healthy section was doing.
Note that of these corrupted Processes, #PREVSTACK (Delay class>handleTimerEvent) and #PREVSTACK (EventSensor>eventTickler) are bad in both scenarios.
HTH,
Rob
From: Rob Withers
Sent: Tuesday, July 20, 2010 4:51 AM
To: Eliot Miranda
Cc: Squeak VM Dev
Subject: Re: [Vm-dev] Re: Cog segmentation fault on Linux
--------------------------------------------------------------------------------
Hi Eliot,
(I forgot to CC the mailing list - added)
I made a few things happen.
First, I found that the argument to -leakcheck is an integer that gets masked to determine whether to leak check an incremental or full GC. I made the call with '-leakcheck 7'.
Second, I added a -leakcheck section to the COGVM section:
#if COGVM
  else if (!strcmp(argv[0], "-leakcheck")) {
    extern sqInt checkForLeaks;
    checkForLeaks = atoi(argv[1]);
    return 2; }
I compiled and ran it. I am unsure where any output from the leak checker goes. If it is to stdout or stderr, I forget the magic incantation to redirect these to files. I think it is '2> stderr.txt 1> stdout.txt' for /bin/sh. Is that right?
When I ran it, it ran (this is the new image with the stepping button bar, which takes 30% CPU). When I send 'kill -USR1 <pid>', it seg faults every time. This may or may not be the original seg fault; it may be caused by the leak checker.
The only stuff I am doing that calls out of the image is socket stuff. This may or may not be in the middle of a call when it seg faults. I will work to turn off all the socket activity and see if it still seg faults.
Am I activating the leakchecker ok?
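To illustrate the masking described above, here is a minimal C sketch of how an integer such as 7 could be decoded into per-GC-type checks. The bit assignments (bit 0 for full GC, bit 1 for incremental GC), the LeakCheckPlan type, and decodeLeakCheckArg are hypothetical names chosen for illustration and are not taken from the Cog sources; only the use of atoi on the argument mirrors the handler quoted above.

```c
#include <assert.h>
#include <stdlib.h>

/* ASSUMPTION: these bit assignments are illustrative only,
 * not copied from the Cog VM sources. */
enum {
    LEAK_CHECK_FULL_GC        = 1 << 0,
    LEAK_CHECK_INCREMENTAL_GC = 1 << 1
};

/* Hypothetical result type: which kinds of GC get a leak check. */
typedef struct {
    int checkFullGC;
    int checkIncrementalGC;
} LeakCheckPlan;

/* Convert the command-line text with atoi (as the quoted handler does),
 * then mask individual bits to decide which checks are enabled. */
LeakCheckPlan decodeLeakCheckArg(const char *arg)
{
    LeakCheckPlan plan;
    int flags;

    assert(arg != NULL);
    flags = atoi(arg);
    plan.checkFullGC        = (flags & LEAK_CHECK_FULL_GC) != 0;
    plan.checkIncrementalGC = (flags & LEAK_CHECK_INCREMENTAL_GC) != 0;
    return plan;
}
```

Under these assumed bit assignments, '-leakcheck 7' would enable both checks, and this sketch simply ignores any higher bits.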
Regards,
Rob
From: Rob Withers
Sent: Monday, July 19, 2010 9:22 PM
To: Eliot Miranda
Subject: Re: [Vm-dev] Re: Cog segmentation fault on Linux
hey Eliot,
It looks like this command line argument, -leakcheck, is for the STACKVM, not the COGVM. Is this an issue?
Thanks,
Rob
#if STACKVM
  else if (!strcmp(argv[0], "-eden")) {
    extern sqInt desiredEdenBytes;
    desiredEdenBytes = strtobkm(argv[1]);
    return 2; }
  else if (!strcmp(argv[0], "-leakcheck")) {
    extern sqInt checkForLeaks;
    checkForLeaks = atoi(argv[1]);
    return 2; }
  else if (!strcmp(argv[0], "-stackpages")) {
    extern sqInt desiredNumStackPages;
    desiredNumStackPages = atoi(argv[1]);
    return 2; }
  else if (!strcmp(argv[0], "-breaksel")) {
    extern void setBreakSelector(char *);
    setBreakSelector(argv[1]);
    return 2; }
  else if (!strcmp(argv[0], "-noheartbeat")) {
    extern sqInt suppressHeartbeatFlag;
    suppressHeartbeatFlag = 1;
    return 1; }
#endif /* STACKVM */
From: Rob Withers
Sent: Monday, July 19, 2010 9:19 PM
To: Eliot Miranda
Cc: Squeak VM Dev
Subject: [Vm-dev] Re: Cog segmentation fault on Linux
--------------------------------------------------------------------------------
Hi Eliot,
Got home from my new job and started looking into this. It turns out that this morning I found that I had a button bar that was stepping and part of the step was a Smalltalk garbageCollect to force collection before checking for instances. It may be something I don't need to do anymore, however it helps expose this seg fault. Both stack dumps were in the garbageCollect. I removed the button bar, uploaded the image, and ran it. CPU% dropped from 33% to 2%. I let it run all day. At some point it exited, for an unknown reason, as it was gone when I returned tonight.
I have reinstated the button bar, to help this bug occur, and uploaded it to the server.
Now I just need to enable -leakcheck. From sqUnixMain.c it looks like it takes an argument. What is that argument?
Thanks,
Rob
From: Eliot Miranda
Sent: Monday, July 19, 2010 1:47 PM
To: Rob Withers
Cc: Squeak Virtual Machine Development Discussion
Subject: Re: Cog segmentation fault on Linux
Hi Rob,
On Mon, Jul 19, 2010 at 3:10 AM, Rob Withers <reefedjib at yahoo.com> wrote:
Eliot,
I am getting a segmentation fault running Cog headless on linux. Here is the stack dump. Below is a second stack dump that looks different.
While the heap corruption might be a bug in Cog it might also be heap corruption from external code (e.g. objects passed through FFI calls to external code that overwrites those objects' bounds).
There's a leak checker in Cog (see the -leakcheck argument in platforms/unix/vm/sqUnixMain.c) that can help you localise this. It's best to distrust your code before you distrust the VM, simply because thinking it's the VM can blind-side you to potential bugs in your own code or other parts of the system. The goal here is a reproducible case. If you get a reproducible case that doesn't use any external code then the bug is in the VM.
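As an illustration of the kind of external-code corruption described above, here is a minimal C sketch in which a stand-in for buggy external code writes past the bounds of a buffer, and a trailing guard word reveals the overrun. This is a generic guard-word technique, not how Cog's leak checker works; GUARD_WORD, guardedAlloc, guardIntact, and externalFill are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* ASSUMPTION: arbitrary marker value, chosen only for illustration. */
#define GUARD_WORD 0xDEADBEEFu

/* Allocate payloadBytes of zeroed payload followed by a guard word.
 * Returns NULL if the allocation fails. */
unsigned char *guardedAlloc(size_t payloadBytes)
{
    uint32_t guard = GUARD_WORD;
    unsigned char *p = malloc(payloadBytes + sizeof guard);

    if (p == NULL)
        return NULL;
    memset(p, 0, payloadBytes);
    memcpy(p + payloadBytes, &guard, sizeof guard);
    return p;
}

/* Return 1 if the guard word after the payload is untouched, else 0. */
int guardIntact(const unsigned char *p, size_t payloadBytes)
{
    uint32_t guard;

    assert(p != NULL);
    memcpy(&guard, p + payloadBytes, sizeof guard);
    return guard == GUARD_WORD;
}

/* Stand-in for external code: writes count bytes without knowing
 * (or checking) how large the destination object really is. */
void externalFill(unsigned char *dst, size_t count)
{
    memset(dst, 0xFF, count);
}
```

Filling exactly the payload size leaves the guard intact, while filling one byte more damages it. The damage goes unnoticed until something checks the guard, which is consistent with corruption only surfacing later, for example during a garbage collect.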
HTH
Eliot
HTH,
Rob
FIRST STACK DUMP
vawhigso at vawhigs.org [~/public_html/squeakelib/Cog]# kill -USR1 30247
vawhigso at vawhigs.org [~/public_html/squeakelib/Cog]#
Received user signal, printing active stack:
0xff940ab8 I SmalltalkImage>garbageCollect -1207282732: a(n) SmalltalkImage
0xff940ad0 M Introducer class>areVatsRunning -1144872564: a(n) Introducer class
0xff940ae8 M PluggableButtonMorph>getModelState -1134952004: a(n) PluggableButtonMorph
0xff940b00 M PluggableButtonMorph>update: -1134952004: a(n) PluggableButtonMorph
0xff940b1c M StepMessage(MessageSend)>value -1134947088: a(n) StepMessage
0xff940b38 M StepMessage(MorphicAlarm)>value: -1134947088: a(n) StepMessage
0xff940b64 M WorldState>runLocalStepMethodsIn: -1215450852: a(n) WorldState
0xff940b90 M WorldState>runStepMethodsIn: -1215450852: a(n) WorldState
0xff940bac M PasteUpMorph>runStepMethods -1215450600: a(n) PasteUpMorph
0xff940bc8 M WorldState>doOneCycleNowFor: -1215450852: a(n) WorldState
0xff940be4 M WorldState>doOneCycleFor: -1215450852: a(n) WorldState
0xff940c00 M PasteUpMorph>doOneCycle -1215450600: a(n) PasteUpMorph
0xff940c20 I [] in Project class>spawnNewProcess -1138060792: a(n) Project class
-1133485544 s [] in
Segmentation fault
Smalltalk stack dump:
0xff940ab8 I SmalltalkImage>garbageCollect -1207282732: a(n) SmalltalkImage
0xff940ad0 M Introducer class>areVatsRunning -1144872564: a(n) Introducer class
0xff940ae8 M PluggableButtonMorph>getModelState -1134952004: a(n) PluggableButtonMorph
0xff940b00 M PluggableButtonMorph>update: -1134952004: a(n) PluggableButtonMorph
0xff940b1c M StepMessage(MessageSend)>value -1134947088: a(n) StepMessage
0xff940b38 M StepMessage(MorphicAlarm)>value: -1134947088: a(n) StepMessage
0xff940b64 M WorldState>runLocalStepMethodsIn: -1215450852: a(n) WorldState
0xff940b90 M WorldState>runStepMethodsIn: -1215450852: a(n) WorldState
0xff940bac M PasteUpMorph>runStepMethods -1215450600: a(n) PasteUpMorph
0xff940bc8 M WorldState>doOneCycleNowFor: -1215450852: a(n) WorldState
0xff940be4 M WorldState>doOneCycleFor: -1215450852: a(n) WorldState
0xff940c00 M PasteUpMorph>doOneCycle -1215450600: a(n) PasteUpMorph
0xff940c20 I [] in Project class>spawnNewProcess -1138060792: a(n) Project class
SECOND STACK DUMP
vawhigso at vawhigs.org [~/public_html/squeakelib/Cog]# kill -USR1 7340
vawhigso at vawhigs.org [~/public_html/squeakelib/Cog]#
Received user signal, printing active stack:
0xffaf3398 I SmalltalkImage>garbageCollect -1207897132: a(n) SmalltalkImage
0xffaf33b0 M Introducer class>areVatsRunning -1145486964: a(n) Introducer class
0xffaf33c8 M PluggableButtonMorph>getModelState -1135564980: a(n) PluggableButtonMorph
0xffaf33e0 M PluggableButtonMorph>update: -1135564980: a(n) PluggableButtonMorph
0xffaf33fc M StepMessage(MessageSend)>value -1135561416: a(n) StepMessage
0xffaf3418 M StepMessage(MorphicAlarm)>value: -1135561416: a(n) StepMessage
0xffaf3444 M WorldState>runLocalStepMethodsIn: -1216065252: a(n) WorldState
0xffaf3470 M WorldState>runStepMethodsIn: -1216065252: a(n) WorldState
0xffaf348c M PasteUpMorph>runStepMethods -1216065000: a(n) PasteUpMorph
0xffaf34a8 M WorldState>doOneCycleNowFor: -1216065252: a(n) WorldState
0xffaf34c4 M WorldState>doOneCycleFor: -1216065252: a(n) WorldState
0xffaf34e0 M PasteUpMorph>doOneCycle -1216065000: a(n) PasteUpMorph
0xffaf3500 I [] in Project class>spawnNewProcess -1138675192: a(n) Project class
-1134099944 s [] in BlockClosure>newProcess
Received user signal, printing all processes:
Process 0xbc670278 priority 40
0xffaf3398 I SmalltalkImage>garbageCollect -1207897132: a(n) SmalltalkImage
0xffaf33b0 M Introducer class>areVatsRunning -1145486964: a(n) Introducer class
0xffaf33c8 M PluggableButtonMorph>getModelState -1135564980: a(n) PluggableButtonMorph
0xffaf33e0 M PluggableButtonMorph>update: -1135564980: a(n) PluggableButtonMorph
0xffaf33fc M StepMessage(MessageSend)>value -1135561416: a(n) StepMessage
0xffaf3418 M StepMessage(MorphicAlarm)>value: -1135561416: a(n) StepMessage
0xffaf3444 M WorldState>runLocalStepMethodsIn: -1216065252: a(n) WorldState
0xffaf3470 M WorldState>runStepMethodsIn: -1216065252: a(n) WorldState
0xffaf348c M PasteUpMorph>runStepMethods -1216065000: a(n) PasteUpMorph
0xffaf34a8 M WorldState>doOneCycleNowFor: -1216065252: a(n) WorldState
0xffaf34c4 M WorldState>doOneCycleFor: -1216065252: a(n) WorldState
0xffaf34e0 M PasteUpMorph>doOneCycle -1216065000: a(n) PasteUpMorph
0xffaf3500 I [] in Project class>spawnNewProcess -1138675192: a(n) Project class
-1134099944 s [] in BlockClosure>newProcess
Process 0xbc97fc44 priority 50
0xffafa4c0 M WeakArray class>finalizationProcess -1210174624: a(n) WeakArray class
0xffafa4e0 I [] in WeakArray class>restartFinalizationProcess -1210174624: a(n) WeakArray class
0xffafa500 I [] in BlockClosure>newProcess -1130890396: a(n) BlockClosure
Process 0xb8125518 priority 80
widowed caller frame
EventualProcess 0xbbbd6e10 priority 60
-1134095516 s [] in Delay>wait
-1134049404 s BlockClosure>ifCurtailed:
-1134095648 s Delay>wait
-1134049312 s [] in VatTPManager class>finalizationLoop
-1145210696 s BlockClosure>repeat
-1145213336 s VatTPManager class>finalizationLoop
-1145213520 s [] in VatTPManager class>?
EventualProcess 0xbc5c9174 priority 30
-1134783048 s SharedQueue>next
-1134783140 s [] in Vat>processSends
-1134751716 s BlockClosure>ifCurtailed:
-1134783276 s Vat>processSends
-1134783984 s [] in EventualProcess>setupContext
Process 0xbc86c01c priority 60
0xffaf44c0 I RFBEventSensor(InputSensor)>userInterruptWatcher -1130865472: a(n) RFBEventSensor
0xffaf44e0 I [] in RFBEventSensor(InputSensor)>installInterruptWatcher -1130865472: a(n) RFBEventSensor
0xffaf4500 I [] in BlockClosure>newProcess -1132019908: a(n) BlockClosure
Process 0xbc86c1dc priority 60
widowed caller frame [followed by unprintable garbage bytes]
Process 0xbc86c3c8 priority 60
0xffaf74c0 I SmalltalkImage>lowSpaceWatcher -1207897132: a(n) SmalltalkImage
0xffaf74e0 I [] in SmalltalkImage>installLowSpaceWatcher -1207897132: a(n) SmalltalkImage
0xffaf7500 I [] in BlockClosure>newProcess -1132018968: a(n) BlockClosure
Process 0xbc985ef0 priority 60
widowed caller frame [followed by unprintable garbage bytes]
Segmentation fault
Can't dump Smalltalk stack. Not in VM thread
Most recent primitives
wait
signal
millisecondClockValue
wait
signal
at:put:
at:put:
at:put:
at:put:
at:put:
at:put:
at:put:
at:put:
perform:with:
basicNew:
basicNew
value:
millisecondClockValue
basicNew
basicNew
new:
at:put:
at:put:
at:put:
basicNew
basicNew
basicNew
basicNew:
at:put:
replaceFrom:to:with:startingAt:
replaceFrom:to:with:startingAt:
basicNew:
at:put:
replaceFrom:to:with:startingAt:
replaceFrom:to:with:startingAt:
species
basicNew:
replaceFrom:to:with:startingAt:
compare:with:collated:
at:put:
at:put:
at:put:
at:put:
at:put:
at:put:
at:put:
at:put:
perform:withArguments:
perform:
species
basicNew:
basicAt:put:
basicAt:put:
basicAt:put:
basicAt:put:
basicAt:put:
basicAt:put:
basicAt:put:
basicAt:put:
basicAt:put:
species
basicNew:
basicReplaceFrom:to:with:startingAt:
species
basicNew:
basicAt:put:
basicAt:put:
species
basicNew:
basicReplaceFrom:to:with:startingAt:
species
basicNew:
basicAt:put:
species
basicNew:
basicReplaceFrom:to:with:startingAt:
new:
basicNew
at:put:
at:put:
at:put:
new:
basicNew
at:put:
at:put:
at:put:
new:
basicNew
at:put:
at:put:
at:put:
new:
basicNew
at:put:
at:put:
at:put:
primitiveGarbageCollect
millisecondClockValue
signal
at:put:
at:put:
at:put:
at:put:
at:put:
at:put:
suspend
primitiveResume
at:put:
at:put:
at:put:
at:put:
suspend
primitiveResume
at:put:
at:put:
primSignal:atMilliseconds:
millisecondClockValue
wait
millisecondClockValue
millisecondClockValue
wait
signal
at:put:
at:put:
millisecondClockValue
primSignal:atMilliseconds:
millisecondClockValue
wait
value
wait
signal
wait
value
signal
millisecondClockValue
primSignal:atMilliseconds:
millisecondClockValue
wait
signal
primSocketConnectionStatus:
millisecondClockValue
basicNew:
byteAt:put:
byteAt:put:
species
basicNew:
replaceFrom:to:with:startingAt:
replaceFrom:to:with:startingAt:
species
basicNew:
replaceFrom:to:with:startingAt:
replaceFrom:to:with:startingAt:
basicNew
findNextHandlerContextStarting
tempAt:
tempAt:
tempAt:put:
valueNoContextSwitch
tempAt:
valueWithArguments:
findNextUnwindContextUpTo:
tempAt:
tempAt:put:
tempAt:
terminateTo:
value
tempAt:put:
findNextUnwindContextUpTo:
terminateTo:
primSocketConnectionStatus:
value
value
millisecondClockValue
primSocketConnectionStatus:
millisecondClockValue
millisecondClockValue
basicNew
valueNoContextSwitch
millisecondClockValue
wait
signal
at:put:
at:put:
at:put:
millisecondClockValue
primSignal:atMilliseconds:
millisecondClockValue
wait
signal
wait
basicNew
new:
someInstance
nextInstance
at:put:
species
new:
replaceFrom:to:with:startingAt:
at:put:
at:put:
at:put:
at:put:
at:put:
at:put:
at:put:
at:put:
perform:withArguments:
perform:
species
basicNew:
basicAt:put:
basicAt:put:
basicAt:put:
basicAt:put:
basicAt:put:
basicAt:put:
basicAt:put:
basicAt:put:
basicAt:put:
species
basicNew:
basicReplaceFrom:to:with:startingAt:
species
basicNew:
basicAt:put:
basicAt:put:
species
basicNew:
basicReplaceFrom:to:with:startingAt:
species
basicNew:
basicAt:put:
species
basicNew:
basicReplaceFrom:to:with:startingAt:
new:
basicNew
at:put:
at:put:
at:put:
new:
basicNew
at:put:
at:put:
at:put:
new:
basicNew
at:put:
at:put:
at:put:
new:
basicNew
at:put:
at:put:
at:put:
primitiveGarbageCollect