Hi Eliot,
 
(I forgot to CC the mailing list - added)
 
I made some progress.
 
First, I found that the argument to -leakcheck is an integer that gets masked to determine whether to leak-check an incremental GC or a full GC. I ran the VM with '-leakcheck 7'.
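
As a minimal sketch of what masking the argument could look like: the bit assignments and the names LEAK_CHECK_FULL_GC, LEAK_CHECK_INCREMENTAL_GC, shouldCheckOnFullGC and shouldCheckOnIncrementalGC below are assumptions for illustration only, not taken from the VM source, and the local checkForLeaks is a stand-in for the VM's sqInt global of the same name.

```c
#include <stdbool.h>

/* Hypothetical bit assignments, for illustration only.
   The real values live in the VM sources and may differ. */
enum {
    LEAK_CHECK_FULL_GC        = 1 << 0,
    LEAK_CHECK_INCREMENTAL_GC = 1 << 1
};

/* Stand-in for the VM's checkForLeaks global (declared sqInt there). */
static long checkForLeaks = 0;

/* True when the (assumed) full-GC bit is set in the mask. */
static bool shouldCheckOnFullGC(void)
{
    return (checkForLeaks & LEAK_CHECK_FULL_GC) != 0;
}

/* True when the (assumed) incremental-GC bit is set in the mask. */
static bool shouldCheckOnIncrementalGC(void)
{
    return (checkForLeaks & LEAK_CHECK_INCREMENTAL_GC) != 0;
}
```

Under these assumed bit values, '-leakcheck 7' sets every low bit, so both kinds of GC would be checked, while '-leakcheck 2' would select only the incremental check.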
 
Second, I added a -leakcheck section to the COGVM section:
 
#if COGVM
      else if (!strcmp(argv[0], "-leakcheck")) {
  extern sqInt checkForLeaks;
  checkForLeaks = atoi(argv[1]); 
  return 2; }

I compiled and ran it. I am unsure where the leak checker's output goes. If it goes to stdout or stderr, I forget the magic incantation to redirect these to files. I think it is '2> stderr.txt 1> stdout.txt' for /bin/sh. Is that right?
 
When I run it (the new image with the stepping button bar, which takes 30% CPU), it runs. When I send 'kill -USR1 <pid>', it seg faults every time. This may or may not be the original seg fault; it may be caused by the leak checker.
 
The only thing I do that calls out of the image is socket I/O. The VM may or may not be in the middle of a socket call when it seg faults. I will turn off all socket activity and see whether it still seg faults.
 
Am I activating the leak checker correctly?
 
Regards,
Rob

From: Rob Withers
Sent: Monday, July 19, 2010 9:22 PM
To: Eliot Miranda
Subject: Re: [Vm-dev] Re: Cog segmentation fault on Linux

Hey Eliot,
 
It looks like this command line argument, -leakcheck, is for the STACKVM, not the COGVM.   Is this an issue?
 
Thanks,
Rob
 
#if STACKVM
      else if (!strcmp(argv[0], "-eden")) {
  extern sqInt desiredEdenBytes;
  desiredEdenBytes = strtobkm(argv[1]);
  return 2; }
      else if (!strcmp(argv[0], "-leakcheck")) {
  extern sqInt checkForLeaks;
  checkForLeaks = atoi(argv[1]); 
  return 2; }
      else if (!strcmp(argv[0], "-stackpages")) {
  extern sqInt desiredNumStackPages;
  desiredNumStackPages = atoi(argv[1]);
  return 2; }
      else if (!strcmp(argv[0], "-breaksel")) {
  extern void setBreakSelector(char *);
  setBreakSelector(argv[1]);
  return 2; }
      else if (!strcmp(argv[0], "-noheartbeat")) {
  extern sqInt suppressHeartbeatFlag;
  suppressHeartbeatFlag = 1;
  return 1; }
#endif /* STACKVM */
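
The handlers above appear to return the number of argv entries they consume: 2 for an option that takes a value, 1 for a bare flag such as -noheartbeat. The excerpt reads argv[1] without checking that it exists, and atoi cannot report a malformed number. A minimal sketch of a more defensive handler follows. The function name parseLeakCheckArg, its signature, its choice to return 0 on failure, and the stand-in variable checkForLeaksSketch are all hypothetical and are not part of sqUnixMain.c.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for the VM's checkForLeaks global (hypothetical name). */
static long checkForLeaksSketch = 0;

/* Returns the number of argv entries consumed: 2 on success,
   0 if the option is not -leakcheck, has no value, or the value
   is not a plain decimal integer. */
static int parseLeakCheckArg(int argc, char **argv)
{
    char *end;
    long value;

    if (argc < 2 || strcmp(argv[0], "-leakcheck") != 0)
        return 0;

    value = strtol(argv[1], &end, 10);
    if (end == argv[1] || *end != '\0')
        return 0;   /* no digits at all, or trailing junk */

    checkForLeaksSketch = value;
    return 2;       /* consumed the option and its value */
}
```

Called with { "-leakcheck", "7" } this stores 7 and returns 2; called with a missing or non-numeric value it returns 0 and leaves the stored value untouched.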

From: Rob Withers
Sent: Monday, July 19, 2010 9:19 PM
To: Eliot Miranda
Cc: Squeak VM Dev
Subject: [Vm-dev] Re: Cog segmentation fault on Linux


Hi Eliot,
 
I got home from my new job and started looking into this. This morning I found that I had a button bar that was stepping, and part of each step was a 'Smalltalk garbageCollect' to force collection before checking for instances. I may not need to do that anymore, but it helps expose this seg fault. Both stack dumps were in garbageCollect. I removed the button bar, uploaded the image, and ran it. CPU usage dropped from 33% to 2%. I let it run all day. At some point it exited for an unknown reason; it was gone when I returned tonight.
 
I have reinstated the button bar, to help trigger this bug, and uploaded the image to the server.
 
Now I just need to enable -leakcheck.  From sqUnixMain.c it looks like it takes an argument.  What is that argument?
 
Thanks,
Rob
 

From: Eliot Miranda
Sent: Monday, July 19, 2010 1:47 PM
To: Rob Withers
Cc: Squeak Virtual Machine Development Discussion
Subject: Re: Cog segmentation fault on Linux

Hi Rob,

On Mon, Jul 19, 2010 at 3:10 AM, Rob Withers <reefedjib@yahoo.com> wrote:
Eliot,

I am getting a segmentation fault running Cog headless on Linux.  Here is the stack dump.  Below is a second stack dump that looks different.

While the heap corruption might be a bug in Cog, it might also be caused by external code (e.g. objects passed through FFI calls to external code that writes beyond those objects' bounds).

There's a leak checker in Cog (see the -leakcheck argument in platforms/unix/vm/sqUnixMain.c) that can help you localise this.  It's best to distrust your code before you distrust the VM, simply because assuming it's the VM can blind you to potential bugs in your own code or other parts of the system.  The goal here is a reproducible case.  If you get a reproducible case that doesn't use any external code then the bug is in the VM.

HTH
Eliot


HTH,
Rob

FIRST STACK DUMP

vawhigso@vawhigs.org [~/public_html/squeakelib/Cog]# kill -USR1 30247
vawhigso@vawhigs.org [~/public_html/squeakelib/Cog]#
Received user signal, printing active stack:

0xff940ab8 I SmalltalkImage>garbageCollect -1207282732: a(n) SmalltalkImage
0xff940ad0 M Introducer class>areVatsRunning -1144872564: a(n) Introducer class
0xff940ae8 M PluggableButtonMorph>getModelState -1134952004: a(n) PluggableButtonMorph
0xff940b00 M PluggableButtonMorph>update: -1134952004: a(n) PluggableButtonMorph
0xff940b1c M StepMessage(MessageSend)>value -1134947088: a(n) StepMessage
0xff940b38 M StepMessage(MorphicAlarm)>value: -1134947088: a(n) StepMessage
0xff940b64 M WorldState>runLocalStepMethodsIn: -1215450852: a(n) WorldState
0xff940b90 M WorldState>runStepMethodsIn: -1215450852: a(n) WorldState
0xff940bac M PasteUpMorph>runStepMethods -1215450600: a(n) PasteUpMorph
0xff940bc8 M WorldState>doOneCycleNowFor: -1215450852: a(n) WorldState
0xff940be4 M WorldState>doOneCycleFor: -1215450852: a(n) WorldState
0xff940c00 M PasteUpMorph>doOneCycle -1215450600: a(n) PasteUpMorph
0xff940c20 I [] in Project class>spawnNewProcess -1138060792: a(n) Project class
-1133485544 s [] in
Segmentation fault



Smalltalk stack dump:
0xff940ab8 I SmalltalkImage>garbageCollect -1207282732: a(n) SmalltalkImage
0xff940ad0 M Introducer class>areVatsRunning -1144872564: a(n) Introducer class
0xff940ae8 M PluggableButtonMorph>getModelState -1134952004: a(n) PluggableButtonMorph
0xff940b00 M PluggableButtonMorph>update: -1134952004: a(n) PluggableButtonMorph
0xff940b1c M StepMessage(MessageSend)>value -1134947088: a(n) StepMessage
0xff940b38 M StepMessage(MorphicAlarm)>value: -1134947088: a(n) StepMessage
0xff940b64 M WorldState>runLocalStepMethodsIn: -1215450852: a(n) WorldState
0xff940b90 M WorldState>runStepMethodsIn: -1215450852: a(n) WorldState
0xff940bac M PasteUpMorph>runStepMethods -1215450600: a(n) PasteUpMorph
0xff940bc8 M WorldState>doOneCycleNowFor: -1215450852: a(n) WorldState
0xff940be4 M WorldState>doOneCycleFor: -1215450852: a(n) WorldState
0xff940c00 M PasteUpMorph>doOneCycle -1215450600: a(n) PasteUpMorph
0xff940c20 I [] in Project class>spawnNewProcess -1138060792: a(n) Project class


SECOND STACK DUMP

vawhigso@vawhigs.org [~/public_html/squeakelib/Cog]# kill -USR1 7340

Received user signal, printing active stack:

vawhigso@vawhigs.org [~/public_html/squeakelib/Cog]# 0xffaf3398 I SmalltalkImage>garbageCollect -1207897132: a(n) SmalltalkImage
0xffaf33b0 M Introducer class>areVatsRunning -1145486964: a(n) Introducer class
0xffaf33c8 M PluggableButtonMorph>getModelState -1135564980: a(n) PluggableButtonMorph
0xffaf33e0 M PluggableButtonMorph>update: -1135564980: a(n) PluggableButtonMorph
0xffaf33fc M StepMessage(MessageSend)>value -1135561416: a(n) StepMessage
0xffaf3418 M StepMessage(MorphicAlarm)>value: -1135561416: a(n) StepMessage
0xffaf3444 M WorldState>runLocalStepMethodsIn: -1216065252: a(n) WorldState
0xffaf3470 M WorldState>runStepMethodsIn: -1216065252: a(n) WorldState
0xffaf348c M PasteUpMorph>runStepMethods -1216065000: a(n) PasteUpMorph
0xffaf34a8 M WorldState>doOneCycleNowFor: -1216065252: a(n) WorldState
0xffaf34c4 M WorldState>doOneCycleFor: -1216065252: a(n) WorldState
0xffaf34e0 M PasteUpMorph>doOneCycle -1216065000: a(n) PasteUpMorph
0xffaf3500 I [] in Project class>spawnNewProcess -1138675192: a(n) Project class
-1134099944 s [] in BlockClosure>newProcess

Received user signal, printing all processes:

Process 0xbc670278 priority 40
0xffaf3398 I SmalltalkImage>garbageCollect -1207897132: a(n) SmalltalkImage
0xffaf33b0 M Introducer class>areVatsRunning -1145486964: a(n) Introducer class
0xffaf33c8 M PluggableButtonMorph>getModelState -1135564980: a(n) PluggableButtonMorph
0xffaf33e0 M PluggableButtonMorph>update: -1135564980: a(n) PluggableButtonMorph
0xffaf33fc M StepMessage(MessageSend)>value -1135561416: a(n) StepMessage
0xffaf3418 M StepMessage(MorphicAlarm)>value: -1135561416: a(n) StepMessage
0xffaf3444 M WorldState>runLocalStepMethodsIn: -1216065252: a(n) WorldState
0xffaf3470 M WorldState>runStepMethodsIn: -1216065252: a(n) WorldState
0xffaf348c M PasteUpMorph>runStepMethods -1216065000: a(n) PasteUpMorph
0xffaf34a8 M WorldState>doOneCycleNowFor: -1216065252: a(n) WorldState
0xffaf34c4 M WorldState>doOneCycleFor: -1216065252: a(n) WorldState
0xffaf34e0 M PasteUpMorph>doOneCycle -1216065000: a(n) PasteUpMorph
0xffaf3500 I [] in Project class>spawnNewProcess -1138675192: a(n) Project class
-1134099944 s [] in BlockClosure>newProcess

Process 0xbc97fc44 priority 50
0xffafa4c0 M WeakArray class>finalizationProcess -1210174624: a(n) WeakArray class
0xffafa4e0 I [] in WeakArray class>restartFinalizationProcess -1210174624: a(n) WeakArray class
0xffafa500 I [] in BlockClosure>newProcess -1130890396: a(n) BlockClosure

Process 0xb8125518 priority 80
widowed caller frame

EventualProcess 0xbbbd6e10 priority 60
-1134095516 s [] in Delay>wait
-1134049404 s BlockClosure>ifCurtailed:
-1134095648 s Delay>wait
-1134049312 s [] in VatTPManager class>finalizationLoop
-1145210696 s BlockClosure>repeat
-1145213336 s VatTPManager class>finalizationLoop
-1145213520 s [] in VatTPManager class>?

EventualProcess 0xbc5c9174 priority 30
-1134783048 s SharedQueue>next
-1134783140 s [] in Vat>processSends
-1134751716 s BlockClosure>ifCurtailed:
-1134783276 s Vat>processSends
-1134783984 s [] in EventualProcess>setupContext

Process 0xbc86c01c priority 60
0xffaf44c0 I RFBEventSensor(InputSensor)>userInterruptWatcher -1130865472: a(n) RFBEventSensor
0xffaf44e0 I [] in RFBEventSensor(InputSensor)>installInterruptWatcher -1130865472: a(n) RFBEventSensor
0xffaf4500 I [] in BlockClosure>newProcess -1132019908: a(n) BlockClosure

Process 0xbc86c1dc priority 60
widowed caller frame [unprintable bytes omitted]

Process 0xbc86c3c8 priority 60
0xffaf74c0 I SmalltalkImage>lowSpaceWatcher -1207897132: a(n) SmalltalkImage
0xffaf74e0 I [] in SmalltalkImage>installLowSpaceWatcher -1207897132: a(n) SmalltalkImage
0xffaf7500 I [] in BlockClosure>newProcess -1132018968: a(n) BlockClosure

Process 0xbc985ef0 priority 60
widowed caller frame [unprintable bytes omitted]

Segmentation fault


Can't dump Smalltalk stack. Not in VM thread

Most recent primitives
wait
signal
millisecondClockValue
wait
signal
at:put:
at:put:
at:put:
at:put:
at:put:
at:put:
at:put:
at:put:
perform:with:
basicNew:
basicNew
value:
millisecondClockValue
basicNew
basicNew
new:
at:put:
at:put:
at:put:
basicNew
basicNew
basicNew
basicNew:
at:put:
replaceFrom:to:with:startingAt:
replaceFrom:to:with:startingAt:
basicNew:
at:put:
replaceFrom:to:with:startingAt:
replaceFrom:to:with:startingAt:
species
basicNew:
replaceFrom:to:with:startingAt:
compare:with:collated:
at:put:
at:put:
at:put:
at:put:
at:put:
at:put:
at:put:
at:put:
perform:withArguments:
perform:
species
basicNew:
basicAt:put:
basicAt:put:
basicAt:put:
basicAt:put:
basicAt:put:
basicAt:put:
basicAt:put:
basicAt:put:
basicAt:put:
species
basicNew:
basicReplaceFrom:to:with:startingAt:
species
basicNew:
basicAt:put:
basicAt:put:
species
basicNew:
basicReplaceFrom:to:with:startingAt:
species
basicNew:
basicAt:put:
species
basicNew:
basicReplaceFrom:to:with:startingAt:
new:
basicNew
at:put:
at:put:
at:put:
new:
basicNew
at:put:
at:put:
at:put:
new:
basicNew
at:put:
at:put:
at:put:
new:
basicNew
at:put:
at:put:
at:put:
primitiveGarbageCollect
millisecondClockValue
signal
at:put:
at:put:
at:put:
at:put:
at:put:
at:put:
suspend
primitiveResume
at:put:
at:put:
at:put:
at:put:
suspend
primitiveResume
at:put:
at:put:
primSignal:atMilliseconds:
millisecondClockValue
wait
millisecondClockValue
millisecondClockValue
wait
signal
at:put:
at:put:
millisecondClockValue
primSignal:atMilliseconds:
millisecondClockValue
wait
value
wait
signal
wait
value
signal
millisecondClockValue
primSignal:atMilliseconds:
millisecondClockValue
wait
signal
primSocketConnectionStatus:
millisecondClockValue
basicNew:
byteAt:put:
byteAt:put:
species
basicNew:
replaceFrom:to:with:startingAt:
replaceFrom:to:with:startingAt:
species
basicNew:
replaceFrom:to:with:startingAt:
replaceFrom:to:with:startingAt:
basicNew
findNextHandlerContextStarting
tempAt:
tempAt:
tempAt:put:
valueNoContextSwitch
tempAt:
valueWithArguments:
findNextUnwindContextUpTo:
tempAt:
tempAt:put:
tempAt:
terminateTo:
value
tempAt:put:
findNextUnwindContextUpTo:
terminateTo:
primSocketConnectionStatus:
value
value
millisecondClockValue
primSocketConnectionStatus:
millisecondClockValue
millisecondClockValue
basicNew
valueNoContextSwitch
millisecondClockValue
wait
signal
at:put:
at:put:
at:put:
millisecondClockValue
primSignal:atMilliseconds:
millisecondClockValue
wait
signal
wait
basicNew
new:
someInstance
nextInstance
at:put:
species
new:
replaceFrom:to:with:startingAt:
at:put:
at:put:
at:put:
at:put:
at:put:
at:put:
at:put:
at:put:
perform:withArguments:
perform:
species
basicNew:
basicAt:put:
basicAt:put:
basicAt:put:
basicAt:put:
basicAt:put:
basicAt:put:
basicAt:put:
basicAt:put:
basicAt:put:
species
basicNew:
basicReplaceFrom:to:with:startingAt:
species
basicNew:
basicAt:put:
basicAt:put:
species
basicNew:
basicReplaceFrom:to:with:startingAt:
species
basicNew:
basicAt:put:
species
basicNew:
basicReplaceFrom:to:with:startingAt:
new:
basicNew
at:put:
at:put:
at:put:
new:
basicNew
at:put:
at:put:
at:put:
new:
basicNew
at:put:
at:put:
at:put:
new:
basicNew
at:put:
at:put:
at:put:
primitiveGarbageCollect