My image just locked after it tried to unwind normally from a network timeout. I had to kill it from the OS.
Eliot would you guide me how to get my trunk image rolled back to a state without this fix so I can try to get back to where I was to see if its related?
Should I just load Kernel-eem.1184, an earlier one, or something else?
On Fri, Jul 27, 2018 at 1:08 PM, commits@source.squeak.org wrote:
A new version of Kernel was added to project The Inbox: http://source.squeak.org/inbox/Kernel-eem.1185.mcz
==================== Summary ====================
Name: Kernel-eem.1185 Author: eem Time: 27 July 2018, 11:08:12.641836 am UUID: 7296ad00-708d-4cef-b6bc-ceb24d897a70 Ancestors: Kernel-eem.1184
In releaseCriticalSection: use isUnwindContext instead of the more fragile context selector == #ensure:. Now it will handle ifCurtailed: too. Thanks Tobias!
Reformat Process>>#terminate. It was baaad.
=============== Diff against Kernel-eem.1184 ===============
Item was changed: ----- Method: Process>>releaseCriticalSection: (in category 'private') ----- releaseCriticalSection: runnable "Figure out if we are terminating a process that is in the ensure: block of a critical section. In this case, if the block has made progress, pop the suspendedContext so that we leave the ensure: block inside the critical: without signaling the semaphore/exiting the primitive section, since presumably this has already happened. But if it hasn't made progress but is beyond the
wait (which we can tell by the oldList being one of the runnable lists, i.e. a LinkedList, not a
wait (which we can tell my the oldList being one of the runnable lists, i.e. a LinkedList, not a Semaphore or Mutex, et al), then the ensure: block needs to be run." | selectorJustSent | (suspendedContext method pragmaAt: #criticalSection) ifNil: [^self]. selectorJustSent := suspendedContext selectorJustSentOrSelf. "Receiver and/or argument blocks of ensure: in Semaphore>>critical: or Mutex>>#critical:" suspendedContext isClosureContext ifTrue:
[suspendedContext sender isUnwindContext ifTrue:
[suspendedContext sender selector == #ensure: ifTrue: [| notWaitingButMadeNoProgress | "Avoid running the ensure: block twice, popping it if it has already been run. If runnable but at the wait, leave it in place. N.B. No need to check if the block receiver of ensure: has not started to run (via suspendedContext pc = suspendedContext startpc) because ensure: uses valueNoContextSwitch, and so there is no suspension point before the wait." notWaitingButMadeNoProgress := runnable and: [selectorJustSent == #wait and: [suspendedContext sender selectorJustSentOrSelf == #valueNoContextSwitch]]. notWaitingButMadeNoProgress ifFalse: [suspendedContext := suspendedContext home]]. ^self]. "Either Semaphore>>critical: or Mutex>>#critical:. Is the process still blocked? If so, nothing further to do." runnable ifFalse: [^self]. "If still at the wait the ensure: block has not been activated, so signal to restore." selectorJustSent == #wait ifTrue: [suspendedContext receiver signal]. "If still at the lock primitive and the lock primitive just acquired ownership (indicated by it answering false) then the ensure block has not been activated, so explicitly primitiveExitCriticalSection to unlock." (selectorJustSent == #primitiveEnterCriticalSection or: [selectorJustSent == #primitiveTestAndSetOwnershipOfCriticalSection]) ifTrue: [(suspendedContext stackPtr > 0 and: [suspendedContext top == false]) ifTrue: [suspendedContext receiver primitiveExitCriticalSection]]!
Item was changed: ----- Method: Process>>terminate (in category 'changing process state') ----- terminate "Stop the process that the receiver represents forever. Unwind to execute pending ensure:/ifCurtailed: blocks before terminating. If the process is in the middle of a critical: critical section, release it properly."
| ctxt unwindBlock oldList |
self isActiveProcess ifTrue:
[ctxt := thisContext.
[ctxt := ctxt findNextUnwindContextUpTo: nil.
ctxt ~~ nil] whileTrue:
[(ctxt tempAt: 2) ifNil:
["N.B. Unlike Context>>unwindTo: we do not set complete (tempAt: 2) to true."
unwindBlock := ctxt tempAt: 1.
thisContext terminateTo: ctxt.
unwindBlock value]].
self isActiveProcess ifTrue: [
ctxt := thisContext.
[ ctxt := ctxt findNextUnwindContextUpTo: nil.
ctxt isNil
] whileFalse: [
(ctxt tempAt: 2) ifNil:[
ctxt tempAt: 2 put: nil.
unwindBlock := ctxt tempAt: 1.
thisContext terminateTo: ctxt.
unwindBlock value].
]. thisContext terminateTo: nil. self suspend.
"If the process is resumed this will provoke a cannotReturn: error.
Would self debug: thisContext title: 'Resuming a terminated process' be better?"
^self].
] ifFalse:[
"Always suspend the process first so it doesn't accidentally get woken up.
N.B. If oldList is a LinkedList then the process is runnable. If it is a Semaphore/Mutex et al
then the process is blocked, and if it is nil then the process is already suspended."
oldList := self suspend.
suspendedContext ifNotNil:
["Release any method marked with the <criticalSection> pragma.
The argument is whether the process is runnable."
self releaseCriticalSection: (oldList isNil or: [oldList class == LinkedList]).
"Always suspend the process first so it doesn't accidentally get woken up.
N.B. If oldList is a LinkedList then the process is runnable. If it is a Semaphore/Mutex et al
then the process is blocked, and if it is nil then the process is already suspended."
oldList := self suspend.
suspendedContext ifNotNil:
["Release any method marked with the <criticalSection> pragma.
The argument is whether the process is runnable."
self releaseCriticalSection: (oldList isNil or: [oldList class == LinkedList]).
"If terminating a process halfways through an unwind, try to complete that unwind block first."
(suspendedContext findNextUnwindContextUpTo: nil) ifNotNil:
[:outer|
(suspendedContext findContextSuchThat:[:c| c closure == (outer tempAt: 1)]) ifNotNil:
[:inner| "This is an unwind block currently under evaluation"
suspendedContext runUntilErrorOrReturnFrom: inner]].
"If terminating a process halfways through an unwind, try to complete that unwind block first."
(suspendedContext findNextUnwindContextUpTo: nil) ifNotNil:
[:outer|
(suspendedContext findContextSuchThat:[:c| c closure == (outer tempAt: 1)]) ifNotNil:
[:inner| "This is an unwind block currently under evaluation"
suspendedContext runUntilErrorOrReturnFrom: inner]].
ctxt := self popTo: suspendedContext bottomContext.
ctxt == suspendedContext bottomContext ifFalse:
[self debug: ctxt title: 'Unwind error during termination'].
"Set the context to its endPC for the benefit of isTerminated."
ctxt pc: ctxt endPC]!
ctxt := self popTo: suspendedContext bottomContext.
ctxt == suspendedContext bottomContext ifFalse:
[self debug: ctxt title: 'Unwind error during termination'].
"Set the context to its endPC for the benefit of isTerminated."
ctxt pc: ctxt endPC]]!
Hi Chris,
On Wed, Aug 1, 2018 at 6:55 PM, Chris Muller asqueaker@gmail.com wrote:
My image just locked after it tried to unwind normally from a network timeout. I had to kill it from the OS.
Eliot would you guide me how to get my trunk image rolled back to a state without this fix so I can try to get back to where I was to see if its related?
Should I just load Kernel-eem.1184, an earlier one, or something else?
1182 is the last one before any of the critical: termination changes. 1183 has the critical: termination changes but doesn't change Semaphore>>critical: to the simple version. 1184 has the simpler Semaphore>>critical:
I would like to have a look at this image. My changes look good to me and others. One possible issue is that the send of releaseCriticalSection: in Process>>terminate considers a double attempt at terminating a process (for example sending terminate within an unwind block that is being run during terminate) an attempt to terminate a live process. That could be a mistake, and instead the send should only consider the process live if suspend answers a LinkedList. So you could experiment with changing the line
self releaseCriticalSection: (oldList isNil or: [oldList class == LinkedList]).
to self releaseCriticalSection: oldList class == LinkedList.
But if it is possible to reproduce this lookup then I want to have a look and try and see what is going on.
On Fri, Jul 27, 2018 at 1:08 PM, commits@source.squeak.org wrote:
A new version of Kernel was added to project The Inbox: http://source.squeak.org/inbox/Kernel-eem.1185.mcz
==================== Summary ====================
Name: Kernel-eem.1185 Author: eem Time: 27 July 2018, 11:08:12.641836 am UUID: 7296ad00-708d-4cef-b6bc-ceb24d897a70 Ancestors: Kernel-eem.1184
In releaseCriticalSection: use isUnwindContext instead of the more
fragile context selector == #ensure:. Now it will handle ifCurtailed: too. Thanks Tobias!
Reformat Process>>#terminate. It was baaad.
=============== Diff against Kernel-eem.1184 ===============
Item was changed: ----- Method: Process>>releaseCriticalSection: (in category
'private') -----
releaseCriticalSection: runnable "Figure out if we are terminating a process that is in the
ensure: block of a critical section.
In this case, if the block has made progress, pop the
suspendedContext so that we leave the
ensure: block inside the critical: without signaling the
semaphore/exiting the primitive section,
since presumably this has already happened. But if it hasn't
made progress but is beyond the
wait (which we can tell by the oldList being one of the
runnable lists, i.e. a LinkedList, not a
wait (which we can tell my the oldList being one of the
runnable lists, i.e. a LinkedList, not a
Semaphore or Mutex, et al), then the ensure: block needs to be
run."
| selectorJustSent | (suspendedContext method pragmaAt: #criticalSection) ifNil:
[^self].
selectorJustSent := suspendedContext selectorJustSentOrSelf. "Receiver and/or argument blocks of ensure: in
Semaphore>>critical: or Mutex>>#critical:"
suspendedContext isClosureContext ifTrue:
[suspendedContext sender isUnwindContext ifTrue:
[suspendedContext sender selector == #ensure: ifTrue: [| notWaitingButMadeNoProgress | "Avoid running the ensure: block twice, popping
it if it has already been run. If runnable
but at the wait, leave it in place. N.B. No
need to check if the block receiver of ensure: has
not started to run (via suspendedContext pc =
suspendedContext startpc) because ensure:
uses valueNoContextSwitch, and so there is no
suspension point before the wait."
notWaitingButMadeNoProgress := runnable and: [selectorJustSent == #wait and: [suspendedContext sender
selectorJustSentOrSelf == #valueNoContextSwitch]].
notWaitingButMadeNoProgress ifFalse: [suspendedContext := suspendedContext
home]].
^self]. "Either Semaphore>>critical: or Mutex>>#critical:. Is the
process still blocked? If so, nothing further to do."
runnable ifFalse: [^self]. "If still at the wait the ensure: block has not been activated,
so signal to restore."
selectorJustSent == #wait ifTrue: [suspendedContext receiver signal]. "If still at the lock primitive and the lock primitive just
acquired ownership (indicated by it answering false)
then the ensure block has not been activated, so explicitly
primitiveExitCriticalSection to unlock."
(selectorJustSent == #primitiveEnterCriticalSection or: [selectorJustSent == #primitiveTestAndSetOwnershipOfCriticalSection])
ifTrue:
[(suspendedContext stackPtr > 0 and: [suspendedContext top == false]) ifTrue: [suspendedContext receiver
primitiveExitCriticalSection]]!
Item was changed: ----- Method: Process>>terminate (in category 'changing process
state') -----
terminate "Stop the process that the receiver represents forever. Unwind to execute pending ensure:/ifCurtailed: blocks before
terminating.
If the process is in the middle of a critical: critical
section, release it properly."
| ctxt unwindBlock oldList |
self isActiveProcess ifTrue:
[ctxt := thisContext.
[ctxt := ctxt findNextUnwindContextUpTo: nil.
ctxt ~~ nil] whileTrue:
[(ctxt tempAt: 2) ifNil:
["N.B. Unlike Context>>unwindTo: we do
not set complete (tempAt: 2) to true."
unwindBlock := ctxt tempAt: 1.
thisContext terminateTo: ctxt.
unwindBlock value]].
self isActiveProcess ifTrue: [
ctxt := thisContext.
[ ctxt := ctxt findNextUnwindContextUpTo: nil.
ctxt isNil
] whileFalse: [
(ctxt tempAt: 2) ifNil:[
ctxt tempAt: 2 put: nil.
unwindBlock := ctxt tempAt: 1.
thisContext terminateTo: ctxt.
unwindBlock value].
]. thisContext terminateTo: nil. self suspend.
"If the process is resumed this will provoke a
cannotReturn: error.
Would self debug: thisContext title: 'Resuming a
terminated process' be better?"
^self].
] ifFalse:[
"Always suspend the process first so it doesn't
accidentally get woken up.
N.B. If oldList is a LinkedList then the process is
runnable. If it is a Semaphore/Mutex et al
then the process is blocked, and if it is nil then the
process is already suspended."
oldList := self suspend.
suspendedContext ifNotNil:
["Release any method marked with the
<criticalSection> pragma.
The argument is whether the process is
runnable."
self releaseCriticalSection: (oldList isNil or:
[oldList class == LinkedList]).
"Always suspend the process first so it doesn't accidentally get
woken up.
N.B. If oldList is a LinkedList then the process is runnable.
If it is a Semaphore/Mutex et al
then the process is blocked, and if it is nil then the process
is already suspended."
oldList := self suspend.
suspendedContext ifNotNil:
["Release any method marked with the <criticalSection>
pragma.
The argument is whether the process is runnable."
self releaseCriticalSection: (oldList isNil or:
[oldList class == LinkedList]).
"If terminating a process halfways through an
unwind, try to complete that unwind block first."
(suspendedContext findNextUnwindContextUpTo:
nil) ifNotNil:
[:outer|
(suspendedContext
findContextSuchThat:[:c| c closure == (outer tempAt: 1)]) ifNotNil:
[:inner| "This is an unwind
block currently under evaluation"
suspendedContext
runUntilErrorOrReturnFrom: inner]].
"If terminating a process halfways through an unwind,
try to complete that unwind block first."
(suspendedContext findNextUnwindContextUpTo: nil)
ifNotNil:
[:outer|
(suspendedContext findContextSuchThat:[:c| c
closure == (outer tempAt: 1)]) ifNotNil:
[:inner| "This is an unwind block
currently under evaluation"
suspendedContext
runUntilErrorOrReturnFrom: inner]].
ctxt := self popTo: suspendedContext bottomContext.
ctxt == suspendedContext bottomContext ifFalse:
[self debug: ctxt title: 'Unwind error during
termination'].
"Set the context to its endPC for the benefit of
isTerminated."
ctxt pc: ctxt endPC]!
ctxt := self popTo: suspendedContext
bottomContext.
ctxt == suspendedContext bottomContext ifFalse:
[self debug: ctxt title: 'Unwind error
during termination'].
"Set the context to its endPC for the benefit of
isTerminated."
ctxt pc: ctxt endPC]]!
Hi Chris,
Did you try to send SIGUSR1 to this image before killing it? (assuming it didn't happen on Windows)
Levente
On Wed, 1 Aug 2018, Chris Muller wrote:
My image just locked after it tried to unwind normally from a network timeout. I had to kill it from the OS.
Eliot would you guide me how to get my trunk image rolled back to a state without this fix so I can try to get back to where I was to see if its related?
Should I just load Kernel-eem.1184, an earlier one, or something else?
On Fri, Jul 27, 2018 at 1:08 PM, commits@source.squeak.org wrote:
A new version of Kernel was added to project The Inbox: http://source.squeak.org/inbox/Kernel-eem.1185.mcz
==================== Summary ====================
Name: Kernel-eem.1185 Author: eem Time: 27 July 2018, 11:08:12.641836 am UUID: 7296ad00-708d-4cef-b6bc-ceb24d897a70 Ancestors: Kernel-eem.1184
In releaseCriticalSection: use isUnwindContext instead of the more fragile context selector == #ensure:. Now it will handle ifCurtailed: too. Thanks Tobias!
Reformat Process>>#terminate. It was baaad.
=============== Diff against Kernel-eem.1184 ===============
Item was changed: ----- Method: Process>>releaseCriticalSection: (in category 'private') ----- releaseCriticalSection: runnable "Figure out if we are terminating a process that is in the ensure: block of a critical section. In this case, if the block has made progress, pop the suspendedContext so that we leave the ensure: block inside the critical: without signaling the semaphore/exiting the primitive section, since presumably this has already happened. But if it hasn't made progress but is beyond the
wait (which we can tell by the oldList being one of the runnable lists, i.e. a LinkedList, not a
wait (which we can tell my the oldList being one of the runnable lists, i.e. a LinkedList, not a Semaphore or Mutex, et al), then the ensure: block needs to be run." | selectorJustSent | (suspendedContext method pragmaAt: #criticalSection) ifNil: [^self]. selectorJustSent := suspendedContext selectorJustSentOrSelf. "Receiver and/or argument blocks of ensure: in Semaphore>>critical: or Mutex>>#critical:" suspendedContext isClosureContext ifTrue:
[suspendedContext sender isUnwindContext ifTrue:
[suspendedContext sender selector == #ensure: ifTrue: [| notWaitingButMadeNoProgress | "Avoid running the ensure: block twice, popping it if it has already been run. If runnable but at the wait, leave it in place. N.B. No need to check if the block receiver of ensure: has not started to run (via suspendedContext pc = suspendedContext startpc) because ensure: uses valueNoContextSwitch, and so there is no suspension point before the wait." notWaitingButMadeNoProgress := runnable and: [selectorJustSent == #wait and: [suspendedContext sender selectorJustSentOrSelf == #valueNoContextSwitch]]. notWaitingButMadeNoProgress ifFalse: [suspendedContext := suspendedContext home]]. ^self]. "Either Semaphore>>critical: or Mutex>>#critical:. Is the process still blocked? If so, nothing further to do." runnable ifFalse: [^self]. "If still at the wait the ensure: block has not been activated, so signal to restore." selectorJustSent == #wait ifTrue: [suspendedContext receiver signal]. "If still at the lock primitive and the lock primitive just acquired ownership (indicated by it answering false) then the ensure block has not been activated, so explicitly primitiveExitCriticalSection to unlock." (selectorJustSent == #primitiveEnterCriticalSection or: [selectorJustSent == #primitiveTestAndSetOwnershipOfCriticalSection]) ifTrue: [(suspendedContext stackPtr > 0 and: [suspendedContext top == false]) ifTrue: [suspendedContext receiver primitiveExitCriticalSection]]!
Item was changed: ----- Method: Process>>terminate (in category 'changing process state') ----- terminate "Stop the process that the receiver represents forever. Unwind to execute pending ensure:/ifCurtailed: blocks before terminating. If the process is in the middle of a critical: critical section, release it properly."
| ctxt unwindBlock oldList |
self isActiveProcess ifTrue:
[ctxt := thisContext.
[ctxt := ctxt findNextUnwindContextUpTo: nil.
ctxt ~~ nil] whileTrue:
[(ctxt tempAt: 2) ifNil:
["N.B. Unlike Context>>unwindTo: we do not set complete (tempAt: 2) to true."
unwindBlock := ctxt tempAt: 1.
thisContext terminateTo: ctxt.
unwindBlock value]].
self isActiveProcess ifTrue: [
ctxt := thisContext.
[ ctxt := ctxt findNextUnwindContextUpTo: nil.
ctxt isNil
] whileFalse: [
(ctxt tempAt: 2) ifNil:[
ctxt tempAt: 2 put: nil.
unwindBlock := ctxt tempAt: 1.
thisContext terminateTo: ctxt.
unwindBlock value].
]. thisContext terminateTo: nil. self suspend.
"If the process is resumed this will provoke a cannotReturn: error.
Would self debug: thisContext title: 'Resuming a terminated process' be better?"
^self].
] ifFalse:[
"Always suspend the process first so it doesn't accidentally get woken up.
N.B. If oldList is a LinkedList then the process is runnable. If it is a Semaphore/Mutex et al
then the process is blocked, and if it is nil then the process is already suspended."
oldList := self suspend.
suspendedContext ifNotNil:
["Release any method marked with the <criticalSection> pragma.
The argument is whether the process is runnable."
self releaseCriticalSection: (oldList isNil or: [oldList class == LinkedList]).
"Always suspend the process first so it doesn't accidentally get woken up.
N.B. If oldList is a LinkedList then the process is runnable. If it is a Semaphore/Mutex et al
then the process is blocked, and if it is nil then the process is already suspended."
oldList := self suspend.
suspendedContext ifNotNil:
["Release any method marked with the <criticalSection> pragma.
The argument is whether the process is runnable."
self releaseCriticalSection: (oldList isNil or: [oldList class == LinkedList]).
"If terminating a process halfways through an unwind, try to complete that unwind block first."
(suspendedContext findNextUnwindContextUpTo: nil) ifNotNil:
[:outer|
(suspendedContext findContextSuchThat:[:c| c closure == (outer tempAt: 1)]) ifNotNil:
[:inner| "This is an unwind block currently under evaluation"
suspendedContext runUntilErrorOrReturnFrom: inner]].
"If terminating a process halfways through an unwind, try to complete that unwind block first."
(suspendedContext findNextUnwindContextUpTo: nil) ifNotNil:
[:outer|
(suspendedContext findContextSuchThat:[:c| c closure == (outer tempAt: 1)]) ifNotNil:
[:inner| "This is an unwind block currently under evaluation"
suspendedContext runUntilErrorOrReturnFrom: inner]].
ctxt := self popTo: suspendedContext bottomContext.
ctxt == suspendedContext bottomContext ifFalse:
[self debug: ctxt title: 'Unwind error during termination'].
"Set the context to its endPC for the benefit of isTerminated."
ctxt pc: ctxt endPC]!
ctxt := self popTo: suspendedContext bottomContext.
ctxt == suspendedContext bottomContext ifFalse:
[self debug: ctxt title: 'Unwind error during termination'].
"Set the context to its endPC for the benefit of isTerminated."
ctxt pc: ctxt endPC]]!
Hi guys,
Thanks, Eliot, and please accept my apologies -- when I later rebooted the image it still only had 1182, and so I'm not 100% certain I had updated that image or simply hadn't saved after I updated. I am going to update and save everything and keep moving forward normally to see it happens again. The 201807260206 was flawless for me in the first big stress test, this time I'll be sure to save my updated trunk from the get go.
Thanks for that reminder about kill SIGUSR1, Levente. I don't use post-it notes but that's so useful I probably should get one for that.. :)
Best, Chris
On Thu, Aug 2, 2018 at 6:52 AM, Levente Uzonyi leves@caesar.elte.hu wrote:
Hi Chris,
Did you try to send SIGUSR1 to this image before killing it? (assuming it didn't happen on Windows)
Levente
On Wed, 1 Aug 2018, Chris Muller wrote:
My image just locked after it tried to unwind normally from a network timeout. I had to kill it from the OS.
Eliot would you guide me how to get my trunk image rolled back to a state without this fix so I can try to get back to where I was to see if its related?
Should I just load Kernel-eem.1184, an earlier one, or something else?
On Fri, Jul 27, 2018 at 1:08 PM, commits@source.squeak.org wrote:
A new version of Kernel was added to project The Inbox: http://source.squeak.org/inbox/Kernel-eem.1185.mcz
==================== Summary ====================
Name: Kernel-eem.1185 Author: eem Time: 27 July 2018, 11:08:12.641836 am UUID: 7296ad00-708d-4cef-b6bc-ceb24d897a70 Ancestors: Kernel-eem.1184
In releaseCriticalSection: use isUnwindContext instead of the more fragile context selector == #ensure:. Now it will handle ifCurtailed: too. Thanks Tobias!
Reformat Process>>#terminate. It was baaad.
=============== Diff against Kernel-eem.1184 ===============
Item was changed:
----- Method: Process>>releaseCriticalSection: (in category 'private')
releaseCriticalSection: runnable "Figure out if we are terminating a process that is in the ensure: block of a critical section. In this case, if the block has made progress, pop the suspendedContext so that we leave the ensure: block inside the critical: without signaling the semaphore/exiting the primitive section, since presumably this has already happened. But if it hasn't made progress but is beyond the
wait (which we can tell by the oldList being one of the runnable
lists, i.e. a LinkedList, not a
wait (which we can tell my the oldList being one of the runnable
lists, i.e. a LinkedList, not a Semaphore or Mutex, et al), then the ensure: block needs to be run." | selectorJustSent | (suspendedContext method pragmaAt: #criticalSection) ifNil: [^self]. selectorJustSent := suspendedContext selectorJustSentOrSelf.
"Receiver and/or argument blocks of ensure: in
Semaphore>>critical: or Mutex>>#critical:" suspendedContext isClosureContext ifTrue:
[suspendedContext sender isUnwindContext ifTrue:
[suspendedContext sender selector == #ensure: ifTrue: [| notWaitingButMadeNoProgress | "Avoid running the ensure: block twice, popping
it if it has already been run. If runnable but at the wait, leave it in place. N.B. No need to check if the block receiver of ensure: has not started to run (via suspendedContext pc = suspendedContext startpc) because ensure: uses valueNoContextSwitch, and so there is no suspension point before the wait." notWaitingButMadeNoProgress := runnable and: [selectorJustSent == #wait and: [suspendedContext sender selectorJustSentOrSelf == #valueNoContextSwitch]]. notWaitingButMadeNoProgress ifFalse: [suspendedContext := suspendedContext home]]. ^self].
"Either Semaphore>>critical: or Mutex>>#critical:. Is the
process still blocked? If so, nothing further to do." runnable ifFalse: [^self].
"If still at the wait the ensure: block has not been activated,
so signal to restore." selectorJustSent == #wait ifTrue: [suspendedContext receiver signal].
"If still at the lock primitive and the lock primitive just
acquired ownership (indicated by it answering false) then the ensure block has not been activated, so explicitly primitiveExitCriticalSection to unlock." (selectorJustSent == #primitiveEnterCriticalSection or: [selectorJustSent == #primitiveTestAndSetOwnershipOfCriticalSection]) ifTrue: [(suspendedContext stackPtr > 0 and: [suspendedContext top == false]) ifTrue: [suspendedContext receiver primitiveExitCriticalSection]]!
Item was changed:
----- Method: Process>>terminate (in category 'changing process state')
terminate "Stop the process that the receiver represents forever. Unwind to execute pending ensure:/ifCurtailed: blocks before terminating. If the process is in the middle of a critical: critical section, release it properly."
| ctxt unwindBlock oldList |
self isActiveProcess ifTrue:
[ctxt := thisContext.
[ctxt := ctxt findNextUnwindContextUpTo: nil.
ctxt ~~ nil] whileTrue:
[(ctxt tempAt: 2) ifNil:
["N.B. Unlike Context>>unwindTo: we do
not set complete (tempAt: 2) to true."
unwindBlock := ctxt tempAt: 1.
thisContext terminateTo: ctxt.
unwindBlock value]].
self isActiveProcess ifTrue: [
ctxt := thisContext.
[ ctxt := ctxt findNextUnwindContextUpTo: nil.
ctxt isNil
] whileFalse: [
(ctxt tempAt: 2) ifNil:[
ctxt tempAt: 2 put: nil.
unwindBlock := ctxt tempAt: 1.
thisContext terminateTo: ctxt.
unwindBlock value].
]. thisContext terminateTo: nil. self suspend.
"If the process is resumed this will provoke a
cannotReturn: error.
Would self debug: thisContext title: 'Resuming a
terminated process' be better?"
^self].
] ifFalse:[
"Always suspend the process first so it doesn't
accidentally get woken up.
N.B. If oldList is a LinkedList then the process is
runnable. If it is a Semaphore/Mutex et al
then the process is blocked, and if it is nil then the
process is already suspended."
oldList := self suspend.
suspendedContext ifNotNil:
["Release any method marked with the
<criticalSection> pragma.
The argument is whether the process is
runnable."
self releaseCriticalSection: (oldList isNil or:
[oldList class == LinkedList]).
"Always suspend the process first so it doesn't accidentally get
woken up.
N.B. If oldList is a LinkedList then the process is runnable. If
it is a Semaphore/Mutex et al
then the process is blocked, and if it is nil then the process
is already suspended."
oldList := self suspend.
suspendedContext ifNotNil:
["Release any method marked with the <criticalSection>
pragma.
The argument is whether the process is runnable."
self releaseCriticalSection: (oldList isNil or: [oldList
class == LinkedList]).
"If terminating a process halfways through an
unwind, try to complete that unwind block first."
(suspendedContext findNextUnwindContextUpTo: nil)
ifNotNil:
[:outer|
(suspendedContext
findContextSuchThat:[:c| c closure == (outer tempAt: 1)]) ifNotNil:
[:inner| "This is an unwind block
currently under evaluation"
suspendedContext
runUntilErrorOrReturnFrom: inner]].
"If terminating a process halfways through an unwind, try
to complete that unwind block first."
(suspendedContext findNextUnwindContextUpTo: nil)
ifNotNil:
[:outer|
(suspendedContext findContextSuchThat:[:c| c
closure == (outer tempAt: 1)]) ifNotNil:
[:inner| "This is an unwind block
currently under evaluation"
suspendedContext
runUntilErrorOrReturnFrom: inner]].
ctxt := self popTo: suspendedContext bottomContext.
ctxt == suspendedContext bottomContext ifFalse:
[self debug: ctxt title: 'Unwind error during
termination'].
"Set the context to its endPC for the benefit of
isTerminated."
ctxt pc: ctxt endPC]!
ctxt := self popTo: suspendedContext
bottomContext.
ctxt == suspendedContext bottomContext ifFalse:
[self debug: ctxt title: 'Unwind error
during termination'].
"Set the context to its endPC for the benefit of
isTerminated."
ctxt pc: ctxt endPC]]!
Hi Chris,
On Thu, Aug 2, 2018 at 4:03 PM, Chris Muller asqueaker@gmail.com wrote:
Hi guys,
Thanks, Eliot, and please accept my apologies -- when I later rebooted the image it still only had 1182, and so I'm not 100% certain I had updated that image or simply hadn't saved after I updated.
If you look at the changes file you should be able to see whether the changes from 1183, 1184 et al are present on the changes file or not. The package name is logged on the changes file as a comment doit, e.g. "Kernel"!
If people think it's a good idea we could also log update beginnings and endings. It would be nice to see "========== Update completed: 18150 -> 18163 =========="! logged on the changes file.
I am going to update and save everything and keep moving forward normally to see it happens again. The 201807260206 was flawless for me in the first big stress test, this time I'll be sure to save my updated trunk from the get go.
Cool. Thanks. And if there is a lock up, I'd love to look at the image with you.
Thanks for that reminder about kill SIGUSR1, Levente. I don't use post-it notes but that's so useful I probably should get one for that.. :)
Best, Chris
On Thu, Aug 2, 2018 at 6:52 AM, Levente Uzonyi leves@caesar.elte.hu wrote:
Hi Chris,
Did you try to send SIGUSR1 to this image before killing it? (assuming it didn't happen on Windows)
Levente
On Wed, 1 Aug 2018, Chris Muller wrote:
My image just locked after it tried to unwind normally from a network timeout. I had to kill it from the OS.
Eliot would you guide me how to get my trunk image rolled back to a state without this fix so I can try to get back to where I was to see if its related?
Should I just load Kernel-eem.1184, an earlier one, or something else?
On Fri, Jul 27, 2018 at 1:08 PM, commits@source.squeak.org wrote:
A new version of Kernel was added to project The Inbox: http://source.squeak.org/inbox/Kernel-eem.1185.mcz
==================== Summary ====================
Name: Kernel-eem.1185 Author: eem Time: 27 July 2018, 11:08:12.641836 am UUID: 7296ad00-708d-4cef-b6bc-ceb24d897a70 Ancestors: Kernel-eem.1184
In releaseCriticalSection: use isUnwindContext instead of the more fragile context selector == #ensure:. Now it will handle ifCurtailed:
too.
Thanks Tobias!
Reformat Process>>#terminate. It was baaad.
=============== Diff against Kernel-eem.1184 ===============
Item was changed: ----- Method: Process>>releaseCriticalSection: (in category
'private')
releaseCriticalSection: runnable "Figure out if we are terminating a process that is in the ensure: block of a critical section. In this case, if the block has made progress, pop the suspendedContext so that we leave the ensure: block inside the critical: without signaling the semaphore/exiting the primitive section, since presumably this has already happened. But if it hasn't made progress but is beyond the
wait (which we can tell by the oldList being one of the
runnable
lists, i.e. a LinkedList, not a
wait (which we can tell my the oldList being one of the
runnable
lists, i.e. a LinkedList, not a Semaphore or Mutex, et al), then the ensure: block needs to be run." | selectorJustSent | (suspendedContext method pragmaAt: #criticalSection) ifNil: [^self]. selectorJustSent := suspendedContext selectorJustSentOrSelf.
"Receiver and/or argument blocks of ensure: in
Semaphore>>critical: or Mutex>>#critical:" suspendedContext isClosureContext ifTrue:
[suspendedContext sender isUnwindContext ifTrue:
[suspendedContext sender selector == #ensure: ifTrue: [| notWaitingButMadeNoProgress | "Avoid running the ensure: block twice, popping
it if it has already been run. If runnable but at the wait, leave it in place. N.B. No
need
to check if the block receiver of ensure: has not started to run (via suspendedContext pc = suspendedContext startpc) because ensure: uses valueNoContextSwitch, and so there is no suspension point before the wait." notWaitingButMadeNoProgress := runnable and: [selectorJustSent == #wait and: [suspendedContext sender selectorJustSentOrSelf == #valueNoContextSwitch]]. notWaitingButMadeNoProgress ifFalse: [suspendedContext := suspendedContext home]]. ^self].
"Either Semaphore>>critical: or Mutex>>#critical:. Is the
process still blocked? If so, nothing further to do." runnable ifFalse: [^self].
"If still at the wait the ensure: block has not been activated,
so signal to restore." selectorJustSent == #wait ifTrue: [suspendedContext receiver signal].
"If still at the lock primitive and the lock primitive just
acquired ownership (indicated by it answering false) then the ensure block has not been activated, so explicitly primitiveExitCriticalSection to unlock." (selectorJustSent == #primitiveEnterCriticalSection or: [selectorJustSent == #primitiveTestAndSetOwnershipOfCriticalSection]) ifTrue: [(suspendedContext stackPtr > 0 and: [suspendedContext top == false]) ifTrue: [suspendedContext receiver primitiveExitCriticalSection]]!
Item was changed: ----- Method: Process>>terminate (in category 'changing process
state')
terminate "Stop the process that the receiver represents forever. Unwind to execute pending ensure:/ifCurtailed: blocks before terminating. If the process is in the middle of a critical: critical
section,
release it properly."
| ctxt unwindBlock oldList |
self isActiveProcess ifTrue:
[ctxt := thisContext.
[ctxt := ctxt findNextUnwindContextUpTo: nil.
ctxt ~~ nil] whileTrue:
[(ctxt tempAt: 2) ifNil:
["N.B. Unlike Context>>unwindTo: we do
not set complete (tempAt: 2) to true."
unwindBlock := ctxt tempAt: 1.
thisContext terminateTo: ctxt.
unwindBlock value]].
self isActiveProcess ifTrue: [
ctxt := thisContext.
[ ctxt := ctxt findNextUnwindContextUpTo: nil.
ctxt isNil
] whileFalse: [
(ctxt tempAt: 2) ifNil:[
ctxt tempAt: 2 put: nil.
unwindBlock := ctxt tempAt: 1.
thisContext terminateTo: ctxt.
unwindBlock value].
]. thisContext terminateTo: nil. self suspend.
"If the process is resumed this will provoke a
cannotReturn: error.
Would self debug: thisContext title: 'Resuming a
terminated process' be better?"
^self].
] ifFalse:[
"Always suspend the process first so it doesn't
accidentally get woken up.
N.B. If oldList is a LinkedList then the process is
runnable. If it is a Semaphore/Mutex et al
then the process is blocked, and if it is nil then the
process is already suspended."
oldList := self suspend.
suspendedContext ifNotNil:
["Release any method marked with the
<criticalSection> pragma.
The argument is whether the process is
runnable."
self releaseCriticalSection: (oldList isNil
or:
[oldList class == LinkedList]).
"Always suspend the process first so it doesn't accidentally
get
woken up.
N.B. If oldList is a LinkedList then the process is runnable.
If
it is a Semaphore/Mutex et al
then the process is blocked, and if it is nil then the process
is already suspended."
oldList := self suspend.
suspendedContext ifNotNil:
["Release any method marked with the <criticalSection>
pragma.
The argument is whether the process is runnable."
self releaseCriticalSection: (oldList isNil or:
[oldList
class == LinkedList]).
"If terminating a process halfways through an
unwind, try to complete that unwind block first."
(suspendedContext findNextUnwindContextUpTo:
nil)
ifNotNil:
[:outer|
(suspendedContext
findContextSuchThat:[:c| c closure == (outer tempAt: 1)]) ifNotNil:
[:inner| "This is an unwind
block
currently under evaluation"
suspendedContext
runUntilErrorOrReturnFrom: inner]].
"If terminating a process halfways through an unwind,
try
to complete that unwind block first."
(suspendedContext findNextUnwindContextUpTo: nil)
ifNotNil:
[:outer|
(suspendedContext findContextSuchThat:[:c| c
closure == (outer tempAt: 1)]) ifNotNil:
[:inner| "This is an unwind block
currently under evaluation"
suspendedContext
runUntilErrorOrReturnFrom: inner]].
ctxt := self popTo: suspendedContext bottomContext.
ctxt == suspendedContext bottomContext ifFalse:
[self debug: ctxt title: 'Unwind error during
termination'].
"Set the context to its endPC for the benefit of
isTerminated."
ctxt pc: ctxt endPC]!
ctxt := self popTo: suspendedContext
bottomContext.
ctxt == suspendedContext bottomContext ifFalse:
[self debug: ctxt title: 'Unwind error
during termination'].
"Set the context to its endPC for the benefit
of
isTerminated."
ctxt pc: ctxt endPC]]!
If people think it's a good idea we could also log update beginnings and endings. It would be nice to see "========== Update completed: 18150 -> 18163 =========="! logged on the changes file.
+1
-C
-- Craig Latta Black Page Digital Amsterdam :: San Francisco craig@blackpagedigital.com +31 6 2757 7177 (SMS ok) + 1 415 287 3547 (no SMS)
squeak-dev@lists.squeakfoundation.org