Process>>terminate woes

Andreas Raab andreas.raab at gmx.de
Wed Dec 5 00:40:07 UTC 2007


Hi -

[cc: vm-dev which I accidentally left out earlier]

Attached my proposed fixes for myList manipulation. The first CS 
(PrimSuspend-ar) changes primitiveSuspend to enable atomic removal from 
myList for the non-active process. The second (SuspendFixes-ar) has the 
main modifications for the in-image part (this may not work for all 
Squeak versions - I used a Croquet image as the basis, YMMV).

Feedback is welcome, in particular from the usual suspects on vm-dev.

Cheers,
   - Andreas

Andreas Raab wrote:
> Hi -
> 
> I had an eventful (which is euphemistic for @!^# up) morning caused by 
> Process>>terminate. In our last round of delay and semaphore discussions 
> I had noticed that there is a possibility of having a race condition in 
> Process>>terminate but dismissed it as being of an application problem 
> (e.g., if you send #terminate make sure you have only one place where 
> you send it).
> 
> This morning proved conclusively that this is a race condition which can 
> affect *every* user of the system. It is caused by Process>>terminate 
> which says:
> 
>     myList remove: self ifAbsent: [].
> 
> The reason this is so problematic is that the modification of myList is 
> not atomic and that because of the non-atomic modification there is a 
> possibility of the VM manipulating the very same list concurrently due 
> to an external event (like a network interrupt). When this happens in 
> "just the right way" the effect is that any number of processes at the 
> same priority will "fall off" of the scheduled list. In the image that I 
> was looking at earlier we had the following situation:
> * ~40 processes were not running
> * The processes had their myList be an empty linked list
> * The processes were internally linked (via nextLink)
> * The processes were all at the same priority
> Given that most of the processes were unrelated other than having the 
> same priority I think the evidence is pretty clear.
> 
> The question is now: How can we fix it? My proposal would be to simply 
> change primitiveSuspend such that for a non-active process it will 
> primitively take the process off its suspendingList. This makes suspend 
> a little more general and (by returning the previous suspendingList) it 
> will also guard us against any following cleanup (like the Semaphore 
> situations earlier).
> 
> Unfortunately, this *will* require VM changes but I don't think it can 
> be helped at this point since the VM will be manipulating these lists 
> atomically anyway. The good news though is that we can have reasonable 
> fallback code which does just exactly what we do today as a fallback to 
> primitiveSuspend.
> 
> Any comments? Alternatives? Suggestions?
> 
> 
> Cheers,
>   - Andreas
> 
> 

-------------- next part --------------
'From Croquet1.0beta of 11 April 2006 [latest update: #1] on 4 December 2007 at 4:32:26 pm'!

!Interpreter methodsFor: 'processes' stamp: 'ar 12/4/2007 16:21'!
removeProcess: aProcess fromList: aList 
	"Remove a given process from a linked list. May fail if aProcess is not on the list."
	| firstLink lastLink nextLink tempLink |
	firstLink := self fetchPointer: FirstLinkIndex ofObject: aList.
	lastLink := self fetchPointer: LastLinkIndex ofObject: aList.
	aProcess  == firstLink ifTrue:[
		nextLink := self fetchPointer: NextLinkIndex ofObject: aProcess .
		self storePointer: FirstLinkIndex ofObject: aList withValue: nextLink.
		aProcess  == lastLink ifTrue:[
			self storePointer: LastLinkIndex ofObject: aList withValue: self nilObject.
		].
	] ifFalse:[
		tempLink := firstLink.
		[tempLink == self nilObject ifTrue:[^self success: false]. "fail"
		nextLink := self fetchPointer: NextLinkIndex ofObject: tempLink.
		nextLink == aProcess] whileFalse:[
			tempLink := self fetchPointer: NextLinkIndex ofObject: tempLink.
		].
		nextLink := self fetchPointer: NextLinkIndex ofObject: aProcess.
		self storePointer: NextLinkIndex ofObject: tempLink withValue: nextLink.
		aProcess  == lastLink ifTrue:[
			self storePointer: LastLinkIndex ofObject: aList withValue: tempLink.
		].
	].
	self storePointer: NextLinkIndex ofObject: aProcess withValue: self nilObject.
! !

!Interpreter methodsFor: 'process primitives' stamp: 'ar 12/4/2007 16:31'!
primitiveSuspend
	"Primitive. Suspend the receiver, aProcess such that it can be executed again
	by sending #resume. If the given process is not currently running, take it off
	its corresponding list. The primitive returns the list the receiver was previously on."
	| process activeProc myList |
	process := self stackTop.
	activeProc := self fetchPointer: ActiveProcessIndex
						 ofObject: self schedulerPointer.
	process == activeProc ifTrue:[
		self pop: 1.
		self push: nilObj.
		self transferTo: self wakeHighestPriority.
	] ifFalse:[
		myList := self fetchPointer: MyListIndex ofObject: process.
		"XXXX Fixme. We should really check whether myList is a kind of LinkedList or not
		but we can't easily so just do a quick check for nil which is the most common case."
		myList == self nilObject ifTrue:[^self primitiveFail].
		self removeProcess: process fromList: myList.
		successFlag ifTrue:[
			self storePointer: MyListIndex ofObject: process withValue: self nilObject.
			self pop: 1.
			self push: myList.
		].
	].! !

-------------- next part --------------
'From Croquet1.0beta of 11 April 2006 [latest update: #1] on 4 December 2007 at 4:33:43 pm'!

!ControlManager methodsFor: 'scheduling' stamp: 'ar 12/4/2007 15:44'!
interruptName: labelString
	"Create a Notifier on the active scheduling process with the given label. Make the Notifier the active controller."
	| newActiveController |
	activeControllerProcess suspend.

	activeController ~~ nil ifTrue: [
		"Carefully de-emphasis the current window."
		activeController view topView deEmphasizeForDebugger].

	newActiveController :=
		(ToolSet
			interrupt: activeControllerProcess
			label: labelString) controller.
	newActiveController centerCursorInView.
	self activeController: newActiveController.
! !


!Process methodsFor: 'changing process state' stamp: 'ar 12/4/2007 15:49'!
primitiveSuspend
	"Primitive. Stop the process that self represents in such a way 
	that it can be restarted at a later time (by sending #resume).
	If the receiver is not the active process it will take it off its suspendingList.
	Essential. See Object documentation whatIsAPrimitive."
	| oldList |
	<primitive: 88>
	"This is fallback code for VMs which only support the old primitiveSuspend which 
	would not accept processes that are waiting to be run."
	myList ifNil:[self error: 'Process already suspended'].
	oldList := myList.
	myList := nil.
	oldList remove: self ifAbsent:[].
	^oldList! !

!Process methodsFor: 'changing process state' stamp: 'ar 12/4/2007 15:30'!
suspend
	"Stop the process that the receiver represents in such a way 
	that it can be restarted at a later time (by sending the receiver the 
	message resume). If the receiver represents the activeProcess, suspend it. 
	Otherwise remove the receiver from the list of waiting processes."
	self primitiveSuspend.
! !

!Process methodsFor: 'changing process state' stamp: 'ar 12/4/2007 15:55'!
terminate 
	"Stop the process that the receiver represents forever.  Unwind to execute pending ensure:/ifCurtailed: blocks before terminating."

	| ctxt unwindBlock oldList |
	self isActiveProcess ifTrue: [
		ctxt := thisContext.
		[	ctxt := ctxt findNextUnwindContextUpTo: nil.
			ctxt isNil
		] whileFalse: [
			unwindBlock := ctxt tempAt: 1.
			unwindBlock ifNotNil: [
				ctxt tempAt: 1 put: nil.
				thisContext terminateTo: ctxt.
				unwindBlock value].
		].
		thisContext terminateTo: nil.
		self primitiveSuspend.
	] ifFalse: [
		self isSuspended ifFalse:[oldList := self primitiveSuspend].
		suspendedContext ifNotNil: [
			"Figure out if we are terminating the process while waiting in Semaphore>>critical:
			In this case, pop the suspendedContext so that we leave the ensure: block inside
			Semaphore>>critical: without signaling the semaphore."
			(oldList class == Semaphore and:[
				suspendedContext method == (Semaphore compiledMethodAt: #critical:)]) ifTrue:[
					suspendedContext := suspendedContext home.
			].
			ctxt := self popTo: suspendedContext bottomContext.
			ctxt == suspendedContext bottomContext ifFalse: [
				self debug: ctxt title: 'Unwind error during termination']].
	].
! !


!Project class methodsFor: 'utilities' stamp: 'ar 12/4/2007 15:47'!
interruptName: labelString
	"Create a Notifier on the active scheduling process with the given label."
	| projectProcess |
	Smalltalk isMorphic ifFalse:
		[^ ScheduledControllers interruptName: labelString].
	ActiveHand ifNotNil:[ActiveHand interrupted].
	ActiveWorld := World. "reinstall active globals"
	ActiveHand := World primaryHand.
	ActiveHand interrupted. "make sure this one's interrupted too"
	ActiveEvent := nil.

	projectProcess := self uiProcess.	"we still need the accessor for a while"
	projectProcess suspend.
	ToolSet interrupt: projectProcess label: labelString! !

Process removeSelector: #offList!


More information about the Squeak-dev mailing list