An analysis of interrupt behavior (was: Re: 6293LowSpaceWatcherFix-dtl considered harmful)

David T. Lewis lewis at mail.msen.com
Sat Apr 2 18:36:47 UTC 2005


On Fri, Apr 01, 2005 at 06:44:35AM -0500, David T. Lewis wrote:
> Thanks Andreas,
> 
> I will run some tests this weekend and see if I can confirm which
> process is being interrupted and why. One additional data point I
> can add is that the condition is 100% repeatable. That is, with
> the combination of Unix VM + fixed memory allocation + Morphic
> + original #interruptName method + runaway recursion test, the
> result will always be a VM crash. This should make it much easer
> to debug.
> 
> One additional hypothesis: If you were to disable the "grow
> object memory" capability within a Win32 VM, I would expect to
> see the same VM crash behavior.
> 
> I'll post any results by this Sunday.

As promised, here are some test results for the low memory handler problem.
All tests were run on a Squeak 3.7 image with Unix VM set for fixed size
object memory to force the out-of-memory condition. I used OSProcess and
output tracing to the console in order to see what is happening in the
interrupt handlers. The bad fix for #interruptName: is *not* applied in
any of these tests, so this reflects behavior of the system prior to
applying the bad fix. Some of the results are rather self-evident, but
are included for completeness. For reference, I've also attached a change
set with the hacks that I used for debugging.

Bottom line for the impatient: Andreas' analysis is correct, modulo a
few details pertaining to MVC.

In Morphic, if I terminate the event tickler and start a dummyTickler
process instead, I get a runaway overflow (VM crash) just as if the event
tickler were running. If there is no event tickler, then no crash occurs.

  dummyTickler _ [
    [true] whileTrue: [(Delay forMilliseconds: 500) wait]
  ] forkAt:  Processor lowIOPriority.
  Smalltalk createStackOverflow

If I do the same thing, but give the dummyTickler a longer time delay,
then there is no failure. Presumably the dummyTickler was not ready to
be scheduled, hence did not interfere with low space handling. This
further supports the hypothesis that the event tickler is being
incorrectly treated as the runaway process.

  dummyTickler _ [
    [true] whileTrue: [(Delay forMilliseconds: 50000) wait]
  ] forkAt:  Processor lowIOPriority.
  Smalltalk createStackOverflow

In MVC, the following does *not* result in a VM crash. The low space
watcher works fine despite having this process running. Also, if the
event tickler process is terminated, this still works properly in MVC.
In other words, in MVC we *always* handle the low space semaphore
correctly (but see below), regardless of whether the event tickler or
the dummyTickler (or both) is running.

  dummyTickler _ [
    [true] whileTrue: [(Delay forMilliseconds: 500) wait]
  ] forkAt:  Processor lowIOPriority.
  Smalltalk createStackOverflow

Note however, in MVC if the stack overflow occurs in a background process,
things go horribly wrong in ways I will not even try to describe.

  [Smalltalk createStackOverflow] fork

I used OSProcess output tracing to identify the prempted process at the
time of low space interrupt handling. In *both* Morphic and MVC, the event
tickler is the preempted process (when the event tickler is running of
course). See console trace logs below for the actual output.

This seems to bring us right back to the implementation of #interruptName:.
There is an MVC implementation in ControlManager>>interruptName:, and
a Morphic implementation in Project class>>interruptName:. The MVC
implementation apparently handles the low space interrupt correctly,
and the Morphic implementation does not.

What is different? The MVC implemention does not make any reference to
the preempted process when figuring out which process to suspend. Instead
it always suspends the ScheduledControllers activeControllerProcess process
(see below, possibly this is why in MVC we cannot interrupt a high priority
background process).

All of this supports Andreas' hypothesis that, when one process triggers
the low memory semaphore, the higher priority event tickler process has
become runnable by the time garbage collection completes and the semaphore
has been signalled.

So this still leaves us with the problem of how to figure out which process
actually was active at the time the low space interrupt was generated.

Just for completeness, in Morphic, the following *does not* hang the image,
but in MVC it *does* hang the image.

  [[true] whileTrue:[]] forkAt: Processor userSchedulingPriority+1.

So MVC apparently does not know how to handle either the low memory
semaphore or the user interrupt semaphore if the offending process is
running in background, separate from the main UI controller scheduling.

Following are copies of the console output obtained when running
#createStackOverflow in both Morphic and MVC. In both cases, the
#createStackOverflow method has been modified to produce console output,
and to force termination if the method has been called more than 10
times after the low space semaphore was signalled.
For each line in the trace output, the first number identifies the
active process (the second number identifies the message receiver).

In Morphic the console output is:

  lewis at dtlewis:~/squeak/squeak3.7> squeak -memory 7m squeak.7
  1319:3840:UndefinedObject>>DoIt:about to run createStackOverflow from a workspace
  430:3864:SystemDictionary>>lowSpaceWatcher:low space semaphore signal received, preempted process is
   [] in EventSensor>>eventTickler {[delay wait.  delta := Time millisecondClockValue - lastEventPoll.  (delta <...]}
   BlockContext>>on:do:
   [] in EventSensor>>installEventTickler {[self eventTickler]}
   [] in BlockContext>>newProcess {[self value.  Processor terminateActive]}
  
  430:3864:SystemDictionary>>lowSpaceWatcher:about to display low space notifier
  430:2445:Project class>>interruptName::entering #interruptName, preempted process is
   [] in EventSensor>>eventTickler {[delay wait.  delta := Time millisecondClockValue - lastEventPoll.  (delta <...]}
   BlockContext>>on:do:
   [] in EventSensor>>installEventTickler {[self eventTickler]}
   [] in BlockContext>>newProcess {[self value.  Processor terminateActive]}
  
  1319:3864:SystemDictionary>>createStackOverflow::low space signal already received, continuing with recursion depth: 0
  1319:3864:SystemDictionary>>createStackOverflow::low space signal already received, continuing with recursion depth: 1
  1319:3864:SystemDictionary>>createStackOverflow::low space signal already received, continuing with recursion depth: 2
  1319:3864:SystemDictionary>>createStackOverflow::low space signal already received, continuing with recursion depth: 3
  1319:3864:SystemDictionary>>createStackOverflow::low space signal already received, continuing with recursion depth: 4
  1319:3864:SystemDictionary>>createStackOverflow::low space signal already received, continuing with recursion depth: 5
  1319:3864:SystemDictionary>>createStackOverflow::low space signal already received, continuing with recursion depth: 6
  1319:3864:SystemDictionary>>createStackOverflow::low space signal already received, continuing with recursion depth: 7
  1319:3864:SystemDictionary>>createStackOverflow::low space signal already received, continuing with recursion depth: 8
  1319:3864:SystemDictionary>>createStackOverflow::low space signal already received, continuing with recursion depth: 9
  1319:3864:SystemDictionary>>createStackOverflow::low space signal already received, continuing with recursion depth: 10
  1319:3864:SystemDictionary>>createStackOverflow::terminate recursion at depth: 10
  3886:3864:SystemDictionary>>lowSpaceWatcher:restarted low space watcher
  3886:3864:SystemDictionary>>lowSpaceWatcher:install low space semaphore
  3886:3864:SystemDictionary>>lowSpaceWatcher:enable low space interrupts
  3886:3864:SystemDictionary>>lowSpaceWatcher:wait on low space semaphore
  2954:3864:SystemDictionary>>lowSpaceWatcher:restarted low space watcher
  2954:3864:SystemDictionary>>lowSpaceWatcher:install low space semaphore
  2954:3864:SystemDictionary>>lowSpaceWatcher:enable low space interrupts
  2954:3864:SystemDictionary>>lowSpaceWatcher:wait on low space semaphore


In MVC the console output is:

  lewis at dtlewis:~/squeak/squeak3.7> squeak -memory 7m squeak.7
  1898:3840:UndefinedObject>>DoIt:about to run createStackOverflow from a workspace
  2069:3864:SystemDictionary>>lowSpaceWatcher:low space semaphore signal received, preempted process is
   [] in EventSensor>>eventTickler {[delay wait.  delta := Time millisecondClockValue - lastEventPoll.  (delta <...]}
   BlockContext>>on:do:
   [] in EventSensor>>installEventTickler {[self eventTickler]}
   [] in BlockContext>>newProcess {[self value.  Processor terminateActive]}
  
  2069:3864:SystemDictionary>>lowSpaceWatcher:about to display low space notifier
  2069:347:ControlManager>>interruptName::entering #interruptName, preempted process is
   [] in EventSensor>>eventTickler {[delay wait.  delta := Time millisecondClockValue - lastEventPoll.  (delta <...]}
   BlockContext>>on:do:
   [] in EventSensor>>installEventTickler {[self eventTickler]}
   [] in BlockContext>>newProcess {[self value.  Processor terminateActive]}
  
  1794:3864:SystemDictionary>>lowSpaceWatcher:restarted low space watcher
  1794:3864:SystemDictionary>>lowSpaceWatcher:install low space semaphore
  1794:3864:SystemDictionary>>lowSpaceWatcher:enable low space interrupts
  1794:3864:SystemDictionary>>lowSpaceWatcher:wait on low space semaphore

Dave

-------------- next part --------------
'From Squeak3.7 of ''4 September 2004'' [latest update: #5989] on 2 April 2005 at 2:23:50 pm'!
"Change Set:		CheckStackOverflowBug-dtl
Date:			2 April 2005
Author:			David T. Lewis

This change set contains the modifications that I used to do some debugging of low memory notification. It is of no use otherwise, but I'm providing it in case anyone needs to reproduce my results.
"!


!ControlManager methodsFor: 'scheduling' stamp: 'dtl 4/2/2005 13:26'!
interruptName: labelString
	"Create a Notifier on the active scheduling process with the given label. Make the Notifier the active controller."
	| suspendingList newActiveController |
OSProcess trace: 'entering #interruptName, preempted process is '.
OSProcess debugMessage: (Processor preemptedProcess longPrintString copyReplaceTokens: String cr with: String lf).
OSProcess trace: 'suspendingList is ', activeControllerProcess suspendingList asString.
	(suspendingList _ activeControllerProcess suspendingList) == nil
		ifTrue: [act!
 iveControllerProcess == Processor activeProcess
					ifTrue: [
OSProcess trace: 'about to suspend ', activeControllerProcess hash asString, ' ', activeControllerProcess asString.
						activeControllerProcess suspend]]
		ifFalse: [
OSProcess trace: 'about to remove from suspendingList ', activeControllerProcess hash asString, ' ', activeControllerProcess asString.
				suspendingList remove: activeControllerProcess ifAbsent:[].
				activeControllerProcess offList].

	activeController ~~ nil ifTrue: [
		"Carefully de-emphasis the current window."
		activeController view topView deEmphasizeForDebugger].

	newActiveController _
		(Debugger
			openInterrupt: labelString
			onProcess: activeControllerProcess) controller.
	newActiveController centerCursorInView.
	self activeController: newActiveController.
! !


!OSProcess class methodsFor: 'debugging' stamp: 'dtl 4/2/2005 12:32'!
debugMessage: aString
	"Print aString on standard output. The debug message is prefixed with the
	ide!
 ntity of the process in which the method is being evaluated, a!
 nd the
	
identity of the object which received the message. Useful for debugging
	timing or deadlock problems."

	[self thisOSProcess stdOut
		"The process in which the traced message is being evaluated"
		nextPutAll: Processor activeProcess hash printString, ':';
		"The identity of the object being traced"
		nextPutAll: thisContext sender sender sender receiver hash printString, ':';
		"The debug message"
		nextPutAll: aString asString; nextPut: Character lf; flush]
			on: Error
			do: []! !

!OSProcess class methodsFor: 'debugging' stamp: 'dtl 4/2/2005 12:32'!
trace
	"Print the sender's context on standard output. The debug message is
	prefixed with the identity of the process in which the method is being
	evaluated, and the identity of the object which received the message.
	Useful for debugging timing or deadlock problems."

	[self thisOSProcess stdOut
		"The process in which the traced message is being evaluated"
		nextPutAll: Processor activeProcess hash printString, ':';
		"Th!
 e identity of the object being traced"
		nextPutAll: thisContext sender sender sender receiver hash printString, ':';
		"The method context describing the method being evaluated"
		nextPutAll: thisContext sender sender sender printString;
		nextPut: Character lf;
		flush]
			on: Error
			do: []! !

!OSProcess class methodsFor: 'debugging' stamp: 'dtl 4/2/2005 12:30'!
trace: debugMessageString
	"Print trace information followed by a debug message"

	[self thisOSProcess stdOut
		"The process in which the traced message is being evaluated"
		nextPutAll: Processor activeProcess hash printString, ':';
		"The identity of the object being traced"
		nextPutAll: thisContext sender sender sender receiver hash printString, ':';
		"The method context describing the method being evaluated"
		nextPutAll: thisContext sender sender sender printString, ':';
		nextPutAll: debugMessageString;
		nextPut: Character lf;
		flush]
			on: Error
			do: []! !


!Project class methodsFor: 'utilities' !
 stamp: 'dtl 4/2/2005 10:36'!
interruptName: labelString
	"Crea!
 te a Not
ifier on the active scheduling process with the given label."
	| preemptedProcess projectProcess suspendingList |
	Smalltalk isMorphic ifFalse:
		[^ ScheduledControllers interruptName: labelString].
OSProcess trace: 'entering #interruptName, preempted process is '.
OSProcess debugMessage: (Processor preemptedProcess longPrintString copyReplaceTokens: String cr with: String lf).
	ActiveHand ifNotNil:[ActiveHand interrupted].
	ActiveWorld _ World. "reinstall active globals"
	ActiveHand _ World primaryHand.
	ActiveHand interrupted. "make sure this one's interrupted too"
	ActiveEvent _ nil.

	projectProcess _ self uiProcess.	"we still need the accessor for a while"
	preemptedProcess _ Processor preemptedProcess.
	"Only debug preempted process if its priority is >= projectProcess' priority"
	preemptedProcess priority < projectProcess priority ifTrue:[
		(suspendingList _ projectProcess suspendingList) == nil
			ifTrue: [projectProcess == Processor activeProcess
						ifTrue: [pro!
 jectProcess suspend]]
			ifFalse: [suspendingList remove: projectProcess ifAbsent: [].
					projectProcess offList].
		preemptedProcess _ projectProcess.
	] ifFalse:[
		preemptedProcess suspend offList.
	].
	Debugger openInterrupt: labelString onProcess: preemptedProcess
! !


!SystemDictionary methodsFor: 'memory space' stamp: 'dtl 5/16/2004 20:16'!
createStackOverflow
	"For testing the low space handler..."
	"Smalltalk installLowSpaceWatcher; createStackOverflow"

	(Smalltalk at: #OutOfMemoryFlag ifAbsentPut: false)
		ifTrue: [self createStackOverflow: nil]
		ifFalse: [self createStackOverflow]
! !

!SystemDictionary methodsFor: 'memory space' stamp: 'dtl 5/20/2004 19:36'!
createStackOverflow: depth
	"For testing the low space handler..."
	"Smalltalk installLowSpaceWatcher; createStackOverflow"

	| d |
	d _ depth ifNil: [0].
	OSProcess trace: 'low space signal already received, continuing with recursion depth: ', d asString.
	(d >= 10)
		ifTrue:
			[OSProcess trace: 'term!
 inate recursion at depth: ', d asString.
			self error: 'runaw!
 ay stack
 overflow']
		ifFalse:
			[self createStackOverflow: d + 1].  "infinite recursion"! !

!SystemDictionary methodsFor: 'memory space' stamp: 'dtl 4/2/2005 11:01'!
lowSpaceWatcher
	"Wait until the low space semaphore is signalled, then take appropriate actions."

	| free |
OSProcess trace: 'restarted low space watcher'.
	self garbageCollectMost <= self lowSpaceThreshold ifTrue: [
		self garbageCollect <= self lowSpaceThreshold ifTrue: [
			"free space must be above threshold before starting low space watcher"
			^ Beeper beep]].

	LowSpaceSemaphore _ Semaphore new.
OSProcess trace: 'install low space semaphore'.
	self primLowSpaceSemaphore: LowSpaceSemaphore.
OSProcess trace: 'enable low space interrupts'.
	self primSignalAtBytesLeft: self lowSpaceThreshold.  "enable low space interrupts"
OSProcess trace: 'wait on low space semaphore'.
Smalltalk at: #OutOfMemoryFlag put: false.
	LowSpaceSemaphore wait.  "wait for a low space condition..."
Smalltalk at: #OutOfMemoryFlag put: tru!
 e.
OSProcess trace: 'low space semaphore signal received, preempted process is '.
OSProcess debugMessage: (Processor preemptedProcess longPrintString copyReplaceTokens: String cr with: String lf).

	self primSignalAtBytesLeft: 0.  "disable low space interrupts"
	self primLowSpaceSemaphore: nil.
	LowSpaceProcess _ nil.
	"Note: user now unprotected until the low space watcher is re-installed"

	self memoryHogs isEmpty ifFalse: [
		free := self bytesLeft.
		self memoryHogs do: [ :hog | hog freeSomeSpace ].
		self bytesLeft > free ifTrue: [ ^ self installLowSpaceWatcher ]].
OSProcess trace: 'about to display low space notifier'.
	Smalltalk isMorphic
			ifTrue: [CurrentProjectRefactoring currentInterruptName: 'Space is low']
			ifFalse: [ScheduledControllers interruptName: 'Space is low']! !


More information about the Squeak-dev mailing list