[Vm-dev] Re: Cog segmentation fault on Linux

Rob Withers reefedjib at yahoo.com
Wed Jul 21 10:58:07 UTC 2010


Eliot,

I am trying to narrow down what may be causing this.  I took my looping GC image and shut down more processes where I could, including the eventTickler. My script is:

    Sensor shutDown.
    VatTPManager stop.
    [[Smalltalk garbageCollect] repeat] fork.

My Processes are:

    "timerEventLoop Process - priority 80"
    "lowSpaceWatcher Process - priority 60"
    "finalization Process - priority 50"
    "UIProcess - priority 40"
    "user Process - priority 40 - [[Smalltalk garbageCollect] repeat] fork."
    "idle Process - priority 10"

As always, the active stack on SegFault is the user Process doing a Smalltalk garbageCollect.

At this point I decided to see what gdb would tell me.  I ran the following command:

    gdb --args lib/squeak/3.9-7/squeak -leakcheck 7 -vm-display-null -vm-sound-null echat-server-off.image garbageCollect.sq

It loaded symbols.  I then issued the 'run' command.  It runs squeak, but the resulting process doesn't accumulate CPU time, as if it isn't really running.  I tried sending a USR1 to that pid, but it doesn't output anything.  I issued the run command again, but gdb seems to think it is running.  ps aux agrees, as there is an entry for squeak.  It just isn't doing anything.  Am I using gdb wrong?  Is the stack paused?  Here are the results of 'bt':

(gdb) bt
#0  0xf7ffd430 in __kernel_vsyscall ()
#1  0x4b3c22f6 in nanosleep () from /lib/libpthread.so.0
#2  0x0805c958 in tickerSleepCycle (ignored=0x0)
    at /home1/vawhigso/public_html/squeakelib/Cog/platforms/unix/vm/sqUnixHeartbeat.c:375
#3  0x4b3ba832 in start_thread () from /lib/libpthread.so.0
#4  0x4b315e0e in clone () from /lib/libc.so.6
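
The backtrace above covers only one thread: the heartbeat ticker (tickerSleepCycle, started through start_thread), so it does not show what the VM's main thread is doing. A minimal sketch of asking gdb for every thread, assuming a stock gdb; the command-file name bt-all.gdb is an arbitrary choice, and gdb may stop on other signals before any seg fault:

```shell
#!/bin/sh
# Write a gdb command file that runs the program and, once it stops
# (for example on SIGSEGV), lists the threads and prints all backtraces.
# The file name bt-all.gdb is hypothetical, chosen for this sketch.
cat > bt-all.gdb <<'EOF'
run
info threads
thread apply all bt
EOF

# Print the invocation instead of running it, so the sketch has no side
# effects. Paths and arguments are the ones used earlier in this message.
echo "gdb -batch -x bt-all.gdb --args lib/squeak/3.9-7/squeak -leakcheck 7 -vm-display-null -vm-sound-null echat-server-off.image garbageCollect.sq"
```

In an interactive session the same two commands, 'info threads' and 'thread apply all bt', can be typed at the (gdb) prompt after the program has stopped.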

Thanks for any help,
Rob



From: Rob Withers 
Sent: Tuesday, July 20, 2010 6:14 AM
To: Eliot Miranda 
Cc: Squeak VM Dev 
Subject: Re: [Vm-dev] Re: Cog segmentation fault on Linux




--------------------------------------------------------------------------------


Hey Eliot,

Here is what I have found.  I never saw any output from the leak checker.  I was able to generate seg faults in the original echat-server.image, which is doing socket stuff, AND in a looping GC image.  In the original echat-server image, I have a listening socket, a "Vat" which has installed a subclass of Process and is looping, and the RFB server running.  In the looping GC image, I turned off my listening socket, the Vat is not running, and I stopped the RFB server.  I run headless and supply a script to run.  I supplied the following script:

[Smalltalk garbageCollect] repeat.

It took 8 attempts, but it eventually seg faulted.

I have attached the logfiles for both the echat-server scenario and the looping GC scenario.   Search for #SIGUSR1 for each process dump section.  Search for #SEGFAULT to find the section at the bottom that seg faulted.  Search for #PREVSTACK to find Processes in the SEGFAULT sections that have garbage in them and what the corresponding stack in a previous healthy section was doing.

Note that of these corrupted Processes, #PREVSTACK (Delay class>handleTimerEvent) and #PREVSTACK (EventSensor>eventTickler) are bad in both scenarios.
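
A minimal sketch of locating those markers with grep. The log file name sample.log and its contents are stand-ins invented for the sketch; the real logfiles are the attachments:

```shell
#!/bin/sh
# Print every marker line in a log file, prefixed with its line number.
# Usage: find_markers LOGFILE
find_markers() {
    grep -n -e '#SIGUSR1' -e '#SEGFAULT' -e '#PREVSTACK' "$1"
}

# Tiny stand-in log so the sketch runs on its own (hypothetical contents).
cat > sample.log <<'EOF'
#SIGUSR1
Process 0x1 priority 40
#SEGFAULT
#PREVSTACK (Delay class>handleTimerEvent)
EOF

find_markers sample.log
```

The line numbers make it easy to jump between a #SEGFAULT section and the matching #PREVSTACK entries in an editor.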

HTH,
Rob


From: Rob Withers 
Sent: Tuesday, July 20, 2010 4:51 AM
To: Eliot Miranda 
Cc: Squeak VM Dev 
Subject: Re: [Vm-dev] Re: Cog segmentation fault on Linux




--------------------------------------------------------------------------------


Hi Eliot,

(I forgot to CC the mailing list - added)

I made a few things happen.

First, I found that the argument to -leakcheck is an integer that gets masked to determine whether to leak check an incremental or full GC.   I made the call with '-leakcheck 7'.

Second, I added a -leakcheck section to the COGVM section:

#if COGVM
      else if (!strcmp(argv[0], "-leakcheck")) {
        extern sqInt checkForLeaks;
        checkForLeaks = atoi(argv[1]);
        return 2; }
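
To illustrate the masking of the -leakcheck integer, here is a small sketch in shell arithmetic. The bit assignments (1 for full GC, 2 for incremental GC) are an assumption made for illustration; the real values are whatever the VM source defines for checkForLeaks:

```shell
#!/bin/sh
# ASSUMPTION: bit 1 = check after full GC, bit 2 = check after incremental GC.
# These values are illustrative only; consult the VM source for the real ones.
LEAK_FULL_GC=1
LEAK_INCREMENTAL_GC=2

# Print which (assumed) checks a given -leakcheck value would enable.
leakcheck_modes() {
    modes=""
    [ $(( $1 & LEAK_FULL_GC )) -ne 0 ] && modes="$modes full"
    [ $(( $1 & LEAK_INCREMENTAL_GC )) -ne 0 ] && modes="$modes incremental"
    echo "${modes# }"
}

leakcheck_modes 7
```

Under that assumption, a value of 7 sets both bits, which is why it is a convenient "check everything" argument.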

I compiled and ran it.  I am unsure where any output from the leak checker goes.  If it is to stdout or stderr, I forget the magic incantation to redirect these to files.  I think it is '2> stderr.txt 1> stdout.txt' for /bin/sh.  Is that right?
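
For /bin/sh that redirection syntax is valid. A minimal sketch, using a stand-in function named emit (hypothetical, in place of the VM) that writes one line to each stream:

```shell
#!/bin/sh
# Stand-in for the VM: writes one line to stdout and one to stderr.
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

# Send stdout and stderr to separate files (POSIX sh syntax).
emit 1> stdout.txt 2> stderr.txt

# Or capture both streams in a single file.
emit > combined.txt 2>&1
```

The single-file form keeps leak-checker output next to the stack dumps it relates to, whichever stream each one is written to.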

When I run it, it runs (this is the new image with the stepping button bar, which takes 30% CPU).  When I send 'kill -USR1 <pid>', it is guaranteed to seg fault.  This may or may not be the original seg fault; it may be the leak checker.

The only stuff I am doing that calls out of the image is socket stuff.  This may or may not be in the middle of a call when it seg faults.  I will work to turn off all the socket activity and see if it still seg faults.

Am I activating the leakchecker ok?

Regards,
Rob


From: Rob Withers 
Sent: Monday, July 19, 2010 9:22 PM
To: Eliot Miranda 
Subject: Re: [Vm-dev] Re: Cog segmentation fault on Linux


Hey Eliot,

It looks like this command line argument, -leakcheck, is for the STACKVM, not the COGVM.   Is this an issue?

Thanks,
Rob

#if STACKVM
      else if (!strcmp(argv[0], "-eden")) {
        extern sqInt desiredEdenBytes;
        desiredEdenBytes = strtobkm(argv[1]);
        return 2; }
      else if (!strcmp(argv[0], "-leakcheck")) {
        extern sqInt checkForLeaks;
        checkForLeaks = atoi(argv[1]);
        return 2; }
      else if (!strcmp(argv[0], "-stackpages")) {
        extern sqInt desiredNumStackPages;
        desiredNumStackPages = atoi(argv[1]);
        return 2; }
      else if (!strcmp(argv[0], "-breaksel")) {
        extern void setBreakSelector(char *);
        setBreakSelector(argv[1]);
        return 2; }
      else if (!strcmp(argv[0], "-noheartbeat")) {
        extern sqInt suppressHeartbeatFlag;
        suppressHeartbeatFlag = 1;
        return 1; }
#endif /* STACKVM */



From: Rob Withers 
Sent: Monday, July 19, 2010 9:19 PM
To: Eliot Miranda 
Cc: Squeak VM Dev 
Subject: [Vm-dev] Re: Cog segmentation fault on Linux




--------------------------------------------------------------------------------


Hi Eliot,

Got home from my new job and started looking into this.  This morning I found that I had a button bar that was stepping, and part of the step was a Smalltalk garbageCollect to force collection before checking for instances.  It may be something I don't need to do anymore; however, it helps expose this seg fault.  Both stack dumps were in the garbageCollect.  I removed the button bar, uploaded the image, and ran it.  CPU% dropped from 33% to 2%.  I let it run all day.  At some point it exited for an unknown reason, as it was gone when I returned tonight.

I have reinstated the button bar, to help this bug occur, and uploaded it to the server.  

Now I just need to enable -leakcheck.  From sqUnixMain.c it looks like it takes an argument.  What is that argument?

Thanks,
Rob



From: Eliot Miranda 
Sent: Monday, July 19, 2010 1:47 PM
To: Rob Withers 
Cc: Squeak Virtual Machine Development Discussion 
Subject: Re: Cog segmentation fault on Linux


Hi Rob,


On Mon, Jul 19, 2010 at 3:10 AM, Rob Withers <reefedjib at yahoo.com> wrote:

  Eliot,

  I am getting a segmentation fault running Cog headless on Linux.  Here is the stack dump.  Below is a second stack dump that looks different.



While the heap corruption might be a bug in Cog, it might also be heap corruption from external code (e.g. objects passed through FFI calls to external code that overwrites those objects' bounds).


There's a leak checker in Cog (see the -leakcheck argument in platforms/unix/vm/sqUnixMain.c) that can help you localise this.  It's best to distrust your code before you distrust the VM, simply because thinking it's the VM can blind-side you to potential bugs in your own code or other parts of the system.  The goal here is a reproducible case.  If you get a reproducible case that doesn't use any external code then the bug is in the VM.


HTH
Eliot




  HTH,
  Rob

  FIRST STACK DUMP

  vawhigso at vawhigs.org [~/public_html/squeakelib/Cog]# kill -USR1 30247
  vawhigso at vawhigs.org [~/public_html/squeakelib/Cog]#
  Received user signal, printing active stack:

  0xff940ab8 I SmalltalkImage>garbageCollect -1207282732: a(n) SmalltalkImage
  0xff940ad0 M Introducer class>areVatsRunning -1144872564: a(n) Introducer class
  0xff940ae8 M PluggableButtonMorph>getModelState -1134952004: a(n) PluggableButtonMorph
  0xff940b00 M PluggableButtonMorph>update: -1134952004: a(n) PluggableButtonMorph
  0xff940b1c M StepMessage(MessageSend)>value -1134947088: a(n) StepMessage
  0xff940b38 M StepMessage(MorphicAlarm)>value: -1134947088: a(n) StepMessage
  0xff940b64 M WorldState>runLocalStepMethodsIn: -1215450852: a(n) WorldState
  0xff940b90 M WorldState>runStepMethodsIn: -1215450852: a(n) WorldState
  0xff940bac M PasteUpMorph>runStepMethods -1215450600: a(n) PasteUpMorph
  0xff940bc8 M WorldState>doOneCycleNowFor: -1215450852: a(n) WorldState
  0xff940be4 M WorldState>doOneCycleFor: -1215450852: a(n) WorldState
  0xff940c00 M PasteUpMorph>doOneCycle -1215450600: a(n) PasteUpMorph
  0xff940c20 I [] in Project class>spawnNewProcess -1138060792: a(n) Project class
  -1133485544 s [] in
  Segmentation fault



  Smalltalk stack dump:
  0xff940ab8 I SmalltalkImage>garbageCollect -1207282732: a(n) SmalltalkImage
  0xff940ad0 M Introducer class>areVatsRunning -1144872564: a(n) Introducer class
  0xff940ae8 M PluggableButtonMorph>getModelState -1134952004: a(n) PluggableButtonMorph
  0xff940b00 M PluggableButtonMorph>update: -1134952004: a(n) PluggableButtonMorph
  0xff940b1c M StepMessage(MessageSend)>value -1134947088: a(n) StepMessage
  0xff940b38 M StepMessage(MorphicAlarm)>value: -1134947088: a(n) StepMessage
  0xff940b64 M WorldState>runLocalStepMethodsIn: -1215450852: a(n) WorldState
  0xff940b90 M WorldState>runStepMethodsIn: -1215450852: a(n) WorldState
  0xff940bac M PasteUpMorph>runStepMethods -1215450600: a(n) PasteUpMorph
  0xff940bc8 M WorldState>doOneCycleNowFor: -1215450852: a(n) WorldState
  0xff940be4 M WorldState>doOneCycleFor: -1215450852: a(n) WorldState
  0xff940c00 M PasteUpMorph>doOneCycle -1215450600: a(n) PasteUpMorph
  0xff940c20 I [] in Project class>spawnNewProcess -1138060792: a(n) Project class


  SECOND STACK DUMP

  vawhigso at vawhigs.org [~/public_html/squeakelib/Cog]# kill -USR1 7340

  Received user signal, printing active stack:

  vawhigso at vawhigs.org [~/public_html/squeakelib/Cog]# 0xffaf3398 I SmalltalkImage>garbageCollect -1207897132: a(n) SmalltalkImage
  0xffaf33b0 M Introducer class>areVatsRunning -1145486964: a(n) Introducer class
  0xffaf33c8 M PluggableButtonMorph>getModelState -1135564980: a(n) PluggableButtonMorph
  0xffaf33e0 M PluggableButtonMorph>update: -1135564980: a(n) PluggableButtonMorph
  0xffaf33fc M StepMessage(MessageSend)>value -1135561416: a(n) StepMessage
  0xffaf3418 M StepMessage(MorphicAlarm)>value: -1135561416: a(n) StepMessage
  0xffaf3444 M WorldState>runLocalStepMethodsIn: -1216065252: a(n) WorldState
  0xffaf3470 M WorldState>runStepMethodsIn: -1216065252: a(n) WorldState
  0xffaf348c M PasteUpMorph>runStepMethods -1216065000: a(n) PasteUpMorph
  0xffaf34a8 M WorldState>doOneCycleNowFor: -1216065252: a(n) WorldState
  0xffaf34c4 M WorldState>doOneCycleFor: -1216065252: a(n) WorldState
  0xffaf34e0 M PasteUpMorph>doOneCycle -1216065000: a(n) PasteUpMorph
  0xffaf3500 I [] in Project class>spawnNewProcess -1138675192: a(n) Project class
  -1134099944 s [] in BlockClosure>newProcess

  Received user signal, printing all processes:

  Process 0xbc670278 priority 40
  0xffaf3398 I SmalltalkImage>garbageCollect -1207897132: a(n) SmalltalkImage
  0xffaf33b0 M Introducer class>areVatsRunning -1145486964: a(n) Introducer class
  0xffaf33c8 M PluggableButtonMorph>getModelState -1135564980: a(n) PluggableButtonMorph
  0xffaf33e0 M PluggableButtonMorph>update: -1135564980: a(n) PluggableButtonMorph
  0xffaf33fc M StepMessage(MessageSend)>value -1135561416: a(n) StepMessage
  0xffaf3418 M StepMessage(MorphicAlarm)>value: -1135561416: a(n) StepMessage
  0xffaf3444 M WorldState>runLocalStepMethodsIn: -1216065252: a(n) WorldState
  0xffaf3470 M WorldState>runStepMethodsIn: -1216065252: a(n) WorldState
  0xffaf348c M PasteUpMorph>runStepMethods -1216065000: a(n) PasteUpMorph
  0xffaf34a8 M WorldState>doOneCycleNowFor: -1216065252: a(n) WorldState
  0xffaf34c4 M WorldState>doOneCycleFor: -1216065252: a(n) WorldState
  0xffaf34e0 M PasteUpMorph>doOneCycle -1216065000: a(n) PasteUpMorph
  0xffaf3500 I [] in Project class>spawnNewProcess -1138675192: a(n) Project class
  -1134099944 s [] in BlockClosure>newProcess

  Process 0xbc97fc44 priority 50
  0xffafa4c0 M WeakArray class>finalizationProcess -1210174624: a(n) WeakArray class
  0xffafa4e0 I [] in WeakArray class>restartFinalizationProcess -1210174624: a(n) WeakArray class
  0xffafa500 I [] in BlockClosure>newProcess -1130890396: a(n) BlockClosure

  Process 0xb8125518 priority 80
  widowed caller frame

  EventualProcess 0xbbbd6e10 priority 60
  -1134095516 s [] in Delay>wait
  -1134049404 s BlockClosure>ifCurtailed:
  -1134095648 s Delay>wait
  -1134049312 s [] in VatTPManager class>finalizationLoop
  -1145210696 s BlockClosure>repeat
  -1145213336 s VatTPManager class>finalizationLoop
  -1145213520 s [] in VatTPManager class>?

  EventualProcess 0xbc5c9174 priority 30
  -1134783048 s SharedQueue>next
  -1134783140 s [] in Vat>processSends
  -1134751716 s BlockClosure>ifCurtailed:
  -1134783276 s Vat>processSends
  -1134783984 s [] in EventualProcess>setupContext

  Process 0xbc86c01c priority 60
  0xffaf44c0 I RFBEventSensor(InputSensor)>userInterruptWatcher -1130865472: a(n) RFBEventSensor
  0xffaf44e0 I [] in RFBEventSensor(InputSensor)>installInterruptWatcher -1130865472: a(n) RFBEventSensor
  0xffaf4500 I [] in BlockClosure>newProcess -1132019908: a(n) BlockClosure

  Process 0xbc86c1dc priority 60
  widowed caller frame 8TÅSĸ"Ä»r·Zžð·ð·Zžð·ļ§r·TTÅSĸōr·ĻZžZžÃ,ž4Zžir·tTÅSĸT·Ã,žüYžÃ,žÄ?\žð··TÅSĸĶ·sZžðZžð·8·s·Ä?TÅSĸ(õq·$ZžĪ*îZžüYžÃ,žÄ?\ž ]Ä'·

  Process 0xbc86c3c8 priority 60
  0xffaf74c0 I SmalltalkImage>lowSpaceWatcher -1207897132: a(n) SmalltalkImage
  0xffaf74e0 I [] in SmalltalkImage>installLowSpaceWatcher -1207897132: a(n) SmalltalkImage
  0xffaf7500 I [] in BlockClosure>newProcess -1132018968: a(n) BlockClosure

  Process 0xbc985ef0 priority 60
  widowed caller frame HÃ"ÅSĸúq·Ä?\žð·ð·Ä?\žð· úq·lÃ"ÅSĸ(õq·Ã"\žÄ?\žD\žļ·

  Segmentation fault


  Can't dump Smalltalk stack. Not in VM thread

  Most recent primitives
  wait
  signal
  millisecondClockValue
  wait
  signal
  at:put:
  at:put:
  at:put:
  at:put:
  at:put:
  at:put:
  at:put:
  at:put:
  perform:with:
  basicNew:
  basicNew
  value:
  millisecondClockValue
  basicNew
  basicNew
  new:
  at:put:
  at:put:
  at:put:
  basicNew
  basicNew
  basicNew
  basicNew:
  at:put:
  replaceFrom:to:with:startingAt:
  replaceFrom:to:with:startingAt:
  basicNew:
  at:put:
  replaceFrom:to:with:startingAt:
  replaceFrom:to:with:startingAt:
  species
  basicNew:
  replaceFrom:to:with:startingAt:
  compare:with:collated:
  at:put:
  at:put:
  at:put:
  at:put:
  at:put:
  at:put:
  at:put:
  at:put:
  perform:withArguments:
  perform:
  species
  basicNew:
  basicAt:put:
  basicAt:put:
  basicAt:put:
  basicAt:put:
  basicAt:put:
  basicAt:put:
  basicAt:put:
  basicAt:put:
  basicAt:put:
  species
  basicNew:
  basicReplaceFrom:to:with:startingAt:
  species
  basicNew:
  basicAt:put:
  basicAt:put:
  species
  basicNew:
  basicReplaceFrom:to:with:startingAt:
  species
  basicNew:
  basicAt:put:
  species
  basicNew:
  basicReplaceFrom:to:with:startingAt:
  new:
  basicNew
  at:put:
  at:put:
  at:put:
  new:
  basicNew
  at:put:
  at:put:
  at:put:
  new:
  basicNew
  at:put:
  at:put:
  at:put:
  new:
  basicNew
  at:put:
  at:put:
  at:put:
  primitiveGarbageCollect
  millisecondClockValue
  signal
  at:put:
  at:put:
  at:put:
  at:put:
  at:put:
  at:put:
  suspend
  primitiveResume
  at:put:
  at:put:
  at:put:
  at:put:
  suspend
  primitiveResume
  at:put:
  at:put:
  primSignal:atMilliseconds:
  millisecondClockValue
  wait
  millisecondClockValue
  millisecondClockValue
  wait
  signal
  at:put:
  at:put:
  millisecondClockValue
  primSignal:atMilliseconds:
  millisecondClockValue
  wait
  value
  wait
  signal
  wait
  value
  signal
  millisecondClockValue
  primSignal:atMilliseconds:
  millisecondClockValue
  wait
  signal
  primSocketConnectionStatus:
  millisecondClockValue
  basicNew:
  byteAt:put:
  byteAt:put:
  species
  basicNew:
  replaceFrom:to:with:startingAt:
  replaceFrom:to:with:startingAt:
  species
  basicNew:
  replaceFrom:to:with:startingAt:
  replaceFrom:to:with:startingAt:
  basicNew
  findNextHandlerContextStarting
  tempAt:
  tempAt:
  tempAt:put:
  valueNoContextSwitch
  tempAt:
  valueWithArguments:
  findNextUnwindContextUpTo:
  tempAt:
  tempAt:put:
  tempAt:
  terminateTo:
  value
  tempAt:put:
  findNextUnwindContextUpTo:
  terminateTo:
  primSocketConnectionStatus:
  value
  value
  millisecondClockValue
  primSocketConnectionStatus:
  millisecondClockValue
  millisecondClockValue
  basicNew
  valueNoContextSwitch
  millisecondClockValue
  wait
  signal
  at:put:
  at:put:
  at:put:
  millisecondClockValue
  primSignal:atMilliseconds:
  millisecondClockValue
  wait
  signal
  wait
  basicNew
  new:
  someInstance
  nextInstance
  at:put:
  species
  new:
  replaceFrom:to:with:startingAt:
  at:put:
  at:put:
  at:put:
  at:put:
  at:put:
  at:put:
  at:put:
  at:put:
  perform:withArguments:
  perform:
  species
  basicNew:
  basicAt:put:
  basicAt:put:
  basicAt:put:
  basicAt:put:
  basicAt:put:
  basicAt:put:
  basicAt:put:
  basicAt:put:
  basicAt:put:
  species
  basicNew:
  basicReplaceFrom:to:with:startingAt:
  species
  basicNew:
  basicAt:put:
  basicAt:put:
  species
  basicNew:
  basicReplaceFrom:to:with:startingAt:
  species
  basicNew:
  basicAt:put:
  species
  basicNew:
  basicReplaceFrom:to:with:startingAt:
  new:
  basicNew
  at:put:
  at:put:
  at:put:
  new:
  basicNew
  at:put:
  at:put:
  at:put:
  new:
  basicNew
  at:put:
  at:put:
  at:put:
  new:
  basicNew
  at:put:
  at:put:
  at:put:
  primitiveGarbageCollect


