[squeak-dev] Solving multiple termination bugs

Mon Apr 5 17:00:27 UTC 2021

Hi All,

here's a summary of bugs in the current termination procedure:

1. #isTerminated and #isSuspended fail to report correctly - in [1]

2. active process termination bug causing an image crash - in [2]

3. nested unwind bug: skipping some ensure blocks - in [3]

4. likely a bug: a failure to complete nested unwind blocks halfway thru
execution - also in [3]

5. Christoph's discovery: a failure to correctly close the debugger in case
of error or non-local return in unwind - in [4]

6. inconsistent semantics of unwinding protected blocks during active vs.
suspended process termination 
(not reported yet - different sets of unwind blocks are executed depending
on whether a process terminates itself or whether some other process
initiates its termination; the proposed solution unifies active and
suspended process termination and is consistent with Visual Works semantics
(and goes beyond))

[1]
http://forum.world.st/The-Inbox-Kernel-jar-1376-mcz-td5127335.html#a5127336
[2]
http://forum.world.st/A-bug-in-active-process-termination-crashing-image-td5128186.html
[3]
http://forum.world.st/Another-bug-in-Process-terminate-in-unwinding-contexts-tp5128171p5128178.html
[4]
http://forum.world.st/Bug-in-Process-gt-gt-terminate-Returning-from-unwind-contexts-td5127570.html#a5128095

I'd like to propose a solution - a rewrite of #terminate - addressing all of
the above issues. Main points of the solution are:

(i) why differentiate active and suspended process termination? I propose to
unify the two: to suspend the active process and terminate it as a suspended
process. As a result the semantics/outcome of the unwind algorithm will be
the same in both cases. Fixes 2 and 6.

(ii) the above leads to a susbstantial simplification of #isTerminated - a
process is terminated when at the last instruction of its bottom context.
Fixes 1.

(iii) unwind algorithm for incomplete unwind blocks is extended to execute
the outer-most instead of the inner-most incomplete unwind context. Fixes 3
and 4.

(iv) indirect termination via #popTo is replaced with a analogue of
#unwindTo: using Eliot's ingenious #runUntilErrorOrReturnFrom: the same way
as in (iii). I'd very much appreciate if someone commented this part because
I haven't found any clue why the indirect termination has implemented using
#popTo and not #runUntilErrorOrReturnFrom. Fixes 5.

I enclose a set of examples used to compare the current Squeak semantics
with the proposed one (and with VW) that can be used to build test cases. 

Here's a commented code. I'll be happy to provide detailed step-by-step
guidance if you find this whole idea interesting and worth implementing. I'm
convinced at least parts of the proposal should be integrated as simple
fixes of the current bugs. Thank you for your patience if you're still
reading :) Any feedback extremely welcome. A changeset is enclosed for your
convenience here:  Fix_terminate_v2.cs
<http://forum.world.st/file/t372955/Fix_terminate_v2.cs>  

Process >> terminate 
	"Stop the process that the receiver represents so that it answers true to
#isTerminated.
	 Unwind to execute pending and unfinished ensure:/ifCurtailed: blocks
before terminating.
	 If the process is in the middle of a critical: critical section, release
it properly."

	| ctxt unwindBlock oldList outerMost |
	self isActiveProcess ifTrue: [
		"If terminating the active process, suspend it first and terminate it as a
suspended process.
		Nested #terminate messages could derail the termination so let's enclose
it in #ensure."
		[[] ensure: [self terminate]] fork.
		^self suspend].

	"Always suspend the process first so it doesn't accidentally get woken up.
	 N.B. If oldList is a LinkedList then the process is runnable. If it is a
Semaphore/Mutex et al
	 then the process is blocked, and if it is nil then the process is already
suspended."
	oldList := self suspend.
	suspendedContext ifNotNil:
		["Release any method marked with the <criticalSection> pragma.
		  The argument is whether the process is runnable."
		 self releaseCriticalSection: (oldList isNil or: [oldList class ==
LinkedList]).

		"If terminating a process halfways through an unwind, try to complete that
unwind block first;
		if there are multiple such nested unwind blocks, try to complete the
outer-most one; the inner
		blocks will be completed in the process."
		ctxt := suspendedContext.
		[(ctxt := ctxt findNextUnwindContextUpTo: nil) isNil] whileFalse: 
			"Contexts under evaluation have already set their complete (tempAt: 2) to
true."
			[(ctxt tempAt:2) ifNotNil: [outerMost := ctxt]].
		outerMost ifNotNil: [
			"This is the outer-most unwind context currently under evaluation;
			let's find an inner context executing outerMost's argument block (tempAt:
1)"
			(suspendedContext findContextSuchThat: [:ctx | 
				ctx closure == (outerMost tempAt: 1)]) ifNotNil: [:inner | 
					"Let's finish the unfinished unwind context only (i.e. up to inner) and
return here"
					suspendedContext runUntilErrorOrReturnFrom: inner.
					"Update the receiver's suspendedContext (the previous step reset its
sender to nil)"
					suspendedContext := outerMost]]. 

		"Now all unwind blocks caught halfway through have been completed; 
		let's execute the ones still pending. Note: #findNextUnwindContextUpTo:
starts
		searching from the receiver's sender but the receiver itself may be an
unwind context."
		ctxt := suspendedContext.
		ctxt isUnwindContext ifFalse: [ctxt := ctxt findNextUnwindContextUpTo:
nil].
		[ctxt isNil] whileFalse: [
			(ctxt tempAt: 2) ifNil: [
				ctxt tempAt: 2 put: true.
				unwindBlock := ctxt tempAt: 1.
				"Create a context for the unwind block and execute it on the unwind
block's stack. 
				Note: using #value instead of #runUntilErrorOrReturnFrom: would lead to
executing 
				the unwind on the wrong stack preventing the correct execution of
non-local returns."
				suspendedContext := unwindBlock asContextWithSender: ctxt.
				suspendedContext runUntilErrorOrReturnFrom: suspendedContext].
			ctxt := ctxt findNextUnwindContextUpTo: nil].

		"Set the context to its endPC and its sender to nil for the benefit of
isTerminated."
		ctxt := suspendedContext.
		ctxt terminateTo: nil.
		ctxt pc: ctxt endPC]

Process >> isTerminated
	"Answer if the receiver is terminated. A process is considered terminated
	if the suspendedContext is the bottomContext and the pc is at the endPC"

	self isActiveProcess ifTrue: [^ false].
	^suspendedContext isNil or: [
		suspendedContext isBottomContext and: [
			suspendedContext isDead or: [suspendedContext atEnd]]]

Process >> isSuspended
	"A process is suspended if it has non-nil suspendedContext (e.g. new or 
	previously suspended with the suspend primitive) and is not terminated or
	waiting in a scheduler or a semaphore queue (i.e. is not runnable or
blocked)."

	^myList isNil
		and: [suspendedContext notNil
		and: [self isTerminated not]]

Debugger >> windowIsClosing
	"My window is being closed; clean up. Restart the low space watcher.
	If the debugged process is a do-it, it may do a non-local return escaping
	termination so we need to insert a #terminate context under the do-it 
	to make sure the debugged UI process will terminate."

	| doItContext terminateContext |
	(interruptedProcess == nil or: [interruptedProcess suspendedContext isNil])
ifTrue: [^ self].

	"find a do-it context; answer nil if it doesn't exist in the sender chain"
	doItContext := interruptedProcess suspendedContext 
		findContextSuchThat: [:ctx | 
			ctx methodClass = UndefinedObject and: [
			ctx selector = #DoIt and: [
			ctx closure isNil]]].
	doItContext ifNotNil: [ "it exists so let's insert a #terminate context"
		terminateContext := Context
			sender: doItContext sender
			receiver: interruptedProcess 
			method: (Process>>#terminate) 
			arguments: {}.
		doItContext privSender: terminateContext ].

	interruptedProcess terminate.
	interruptedProcess := nil.

	contextStack := nil.
	receiverInspector := nil.
	contextVariablesInspector := nil.
	Smalltalk installLowSpaceWatcher.  "restart low space handler"

A few examples to illustrate the idea (many more enclosed here 
Some_examples_to_examine_terminate_bugs.txt
<http://forum.world.st/file/t372955/Some_examples_to_examine_terminate_bugs.txt>  
):

Ex. 1:

| p |
"Suspend process inside ensure block and make sure x1 x2 and x3 are printed.
Currently only x1 is printed."
p := [
	[
		[ ] ensure: [
			[ ] ensure: [
				Processor activeProcess suspend. 
				Transcript show: 'x1']. 
			Transcript show: 'x2']
	] ensure: [
		Transcript show: 'x3']
] fork.
Processor yield.
p terminate

---> x1   (instead of x1 x2 x3)

Ex. 2:

| p |
Currently only x3 is printed."
p := [
	[
		[ ] ensure: [
			[ ] ensure: [
				Processor activeProcess terminate. 
				Transcript show: 'x1']. 
	"terminate active process inside ensure block and make sure x1 x2 and x3
are printed.
		Transcript show: 'x2']
	] ensure: [
		Transcript show: 'x3']
] fork.
Processor yield

---> x3   (instead of x1 x2 x3)

Ex. 3:

"unwind after error inside ensure block and make sure x1 x2 and x3 are
printed.
[
	[ ] ensure: [
		[ ] ensure: [
			self error. 
			Transcript show: 'x1']. 
		Transcript show: 'x2']
] ensure: [
	^Transcript show: 'x3']

---> x3   (instead of x1 x2 x3)

Note: nested errors during unwind follow #runUntilErrorOrReturnFrom: logic,
i.e. return exception and let user decide...

-----
^[^ Jaromir
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

[squeak-dev] Solving multiple termination bugs - summary & proposal