Hi Chris,
On Wed, Aug 1, 2018 at 6:55 PM, Chris Muller asqueaker@gmail.com wrote:
My image just locked after it tried to unwind normally from a network timeout. I had to kill it from the OS.
Eliot would you guide me how to get my trunk image rolled back to a state without this fix so I can try to get back to where I was to see if its related?
Should I just load Kernel-eem.1184, an earlier one, or something else?
1182 is the last one before any of the critical: termination changes. 1183 has the critical: termination changes but doesn't change Semaphore>>critical: to the simple version. 1184 has the simpler Semaphore>>critical:
I would like to have a look at this image. My changes look good to me and others. One possible issue is that the send of releaseCriticalSection: in Process>>terminate considers a double attempt at terminating a process (for example sending terminate within an unwind block that is being run during terminate) an attempt to terminate a live process. That could be a mistake, and instead the send should only consider the process live if suspend answers a LinkedList. So you could experiment with changing the line
self releaseCriticalSection: (oldList isNil or: [oldList class == LinkedList]).
to self releaseCriticalSection: oldList class == LinkedList.
But if it is possible to reproduce this lookup then I want to have a look and try and see what is going on.
On Fri, Jul 27, 2018 at 1:08 PM, commits@source.squeak.org wrote:
A new version of Kernel was added to project The Inbox: http://source.squeak.org/inbox/Kernel-eem.1185.mcz
==================== Summary ====================
Name: Kernel-eem.1185 Author: eem Time: 27 July 2018, 11:08:12.641836 am UUID: 7296ad00-708d-4cef-b6bc-ceb24d897a70 Ancestors: Kernel-eem.1184
In releaseCriticalSection: use isUnwindContext instead of the more
fragile context selector == #ensure:. Now it will handle ifCurtailed: too. Thanks Tobias!
Reformat Process>>#terminate. It was baaad.
=============== Diff against Kernel-eem.1184 ===============
Item was changed: ----- Method: Process>>releaseCriticalSection: (in category
'private') -----
releaseCriticalSection: runnable "Figure out if we are terminating a process that is in the
ensure: block of a critical section.
In this case, if the block has made progress, pop the
suspendedContext so that we leave the
ensure: block inside the critical: without signaling the
semaphore/exiting the primitive section,
since presumably this has already happened. But if it hasn't
made progress but is beyond the
wait (which we can tell by the oldList being one of the
runnable lists, i.e. a LinkedList, not a
wait (which we can tell my the oldList being one of the
runnable lists, i.e. a LinkedList, not a
Semaphore or Mutex, et al), then the ensure: block needs to be
run."
| selectorJustSent | (suspendedContext method pragmaAt: #criticalSection) ifNil:
[^self].
selectorJustSent := suspendedContext selectorJustSentOrSelf. "Receiver and/or argument blocks of ensure: in
Semaphore>>critical: or Mutex>>#critical:"
suspendedContext isClosureContext ifTrue:
[suspendedContext sender isUnwindContext ifTrue:
[suspendedContext sender selector == #ensure: ifTrue: [| notWaitingButMadeNoProgress | "Avoid running the ensure: block twice, popping
it if it has already been run. If runnable
but at the wait, leave it in place. N.B. No
need to check if the block receiver of ensure: has
not started to run (via suspendedContext pc =
suspendedContext startpc) because ensure:
uses valueNoContextSwitch, and so there is no
suspension point before the wait."
notWaitingButMadeNoProgress := runnable and: [selectorJustSent == #wait and: [suspendedContext sender
selectorJustSentOrSelf == #valueNoContextSwitch]].
notWaitingButMadeNoProgress ifFalse: [suspendedContext := suspendedContext
home]].
^self]. "Either Semaphore>>critical: or Mutex>>#critical:. Is the
process still blocked? If so, nothing further to do."
runnable ifFalse: [^self]. "If still at the wait the ensure: block has not been activated,
so signal to restore."
selectorJustSent == #wait ifTrue: [suspendedContext receiver signal]. "If still at the lock primitive and the lock primitive just
acquired ownership (indicated by it answering false)
then the ensure block has not been activated, so explicitly
primitiveExitCriticalSection to unlock."
(selectorJustSent == #primitiveEnterCriticalSection or: [selectorJustSent == #primitiveTestAndSetOwnershipOfCriticalSection])
ifTrue:
[(suspendedContext stackPtr > 0 and: [suspendedContext top == false]) ifTrue: [suspendedContext receiver
primitiveExitCriticalSection]]!
Item was changed: ----- Method: Process>>terminate (in category 'changing process
state') -----
terminate "Stop the process that the receiver represents forever. Unwind to execute pending ensure:/ifCurtailed: blocks before
terminating.
If the process is in the middle of a critical: critical
section, release it properly."
| ctxt unwindBlock oldList |
self isActiveProcess ifTrue:
[ctxt := thisContext.
[ctxt := ctxt findNextUnwindContextUpTo: nil.
ctxt ~~ nil] whileTrue:
[(ctxt tempAt: 2) ifNil:
["N.B. Unlike Context>>unwindTo: we do
not set complete (tempAt: 2) to true."
unwindBlock := ctxt tempAt: 1.
thisContext terminateTo: ctxt.
unwindBlock value]].
self isActiveProcess ifTrue: [
ctxt := thisContext.
[ ctxt := ctxt findNextUnwindContextUpTo: nil.
ctxt isNil
] whileFalse: [
(ctxt tempAt: 2) ifNil:[
ctxt tempAt: 2 put: nil.
unwindBlock := ctxt tempAt: 1.
thisContext terminateTo: ctxt.
unwindBlock value].
]. thisContext terminateTo: nil. self suspend.
"If the process is resumed this will provoke a
cannotReturn: error.
Would self debug: thisContext title: 'Resuming a
terminated process' be better?"
^self].
] ifFalse:[
"Always suspend the process first so it doesn't
accidentally get woken up.
N.B. If oldList is a LinkedList then the process is
runnable. If it is a Semaphore/Mutex et al
then the process is blocked, and if it is nil then the
process is already suspended."
oldList := self suspend.
suspendedContext ifNotNil:
["Release any method marked with the
<criticalSection> pragma.
The argument is whether the process is
runnable."
self releaseCriticalSection: (oldList isNil or:
[oldList class == LinkedList]).
"Always suspend the process first so it doesn't accidentally get
woken up.
N.B. If oldList is a LinkedList then the process is runnable.
If it is a Semaphore/Mutex et al
then the process is blocked, and if it is nil then the process
is already suspended."
oldList := self suspend.
suspendedContext ifNotNil:
["Release any method marked with the <criticalSection>
pragma.
The argument is whether the process is runnable."
self releaseCriticalSection: (oldList isNil or:
[oldList class == LinkedList]).
"If terminating a process halfways through an
unwind, try to complete that unwind block first."
(suspendedContext findNextUnwindContextUpTo:
nil) ifNotNil:
[:outer|
(suspendedContext
findContextSuchThat:[:c| c closure == (outer tempAt: 1)]) ifNotNil:
[:inner| "This is an unwind
block currently under evaluation"
suspendedContext
runUntilErrorOrReturnFrom: inner]].
"If terminating a process halfways through an unwind,
try to complete that unwind block first."
(suspendedContext findNextUnwindContextUpTo: nil)
ifNotNil:
[:outer|
(suspendedContext findContextSuchThat:[:c| c
closure == (outer tempAt: 1)]) ifNotNil:
[:inner| "This is an unwind block
currently under evaluation"
suspendedContext
runUntilErrorOrReturnFrom: inner]].
ctxt := self popTo: suspendedContext bottomContext.
ctxt == suspendedContext bottomContext ifFalse:
[self debug: ctxt title: 'Unwind error during
termination'].
"Set the context to its endPC for the benefit of
isTerminated."
ctxt pc: ctxt endPC]!
ctxt := self popTo: suspendedContext
bottomContext.
ctxt == suspendedContext bottomContext ifFalse:
[self debug: ctxt title: 'Unwind error
during termination'].
"Set the context to its endPC for the benefit of
isTerminated."
ctxt pc: ctxt endPC]]!