[Vm-dev] Delay machinery (was Re: [Pharo-dev] Suspending a Process)

Eliot Miranda eliot.miranda at gmail.com
Mon Jul 28 05:20:11 UTC 2014


On Sun, Jul 27, 2014 at 1:14 PM, Nicolai Hess <nicolaihess at web.de> wrote:

>
>
>
>
> 2014-07-25 20:05 GMT+02:00 Eliot Miranda <eliot.miranda at gmail.com>:
>
>>
>> Hi Nicolai, Hi Ben,
>>
>>
>> On Fri, Jul 25, 2014 at 10:55 AM, Nicolai Hess <nicolaihess at web.de>
>> wrote:
>>
>>>
>>>
>>> Hi Ben,
>>>
>>> I am on Windows too :(
>>> So, the fixes does not work (not always) on winddows too. But at least
>>> they make it less probable to occure, but it still happens.
>>> The most distracting thing is, after the first ui lock, pressing
>>> alt+dot, closing the debuggers, pressing alt+dot ....
>>> and trying to close the very first debugger, after that, it all works.
>>> The UI is responsive again and suspending the process does
>>> not block the ui anymore.
>>> It "looks like" supsending the process reactivates another process that
>>> blocks the UI. And as soon as I terminate this
>>> process (alt+dot, close debugger ...) all works.
>>> But I really don't know.
>>>
>>
>> if you can run a unix machine (in a VM?) then remember that kill -USR1
>> pid will cause the VM to print out a stack backtrace of all processes in
>> the image.  That can be very useful in debuggng lockups like this.
>>
>> HTH
>>
>
> Ok, but I don't know if this helps, at least it does not look very helpful
> to me:)
>

Then you're not reading it properly.  It clearly shows you have a deadlock:

Process 0xb9148c20 priority 40
0xff7907b8 M [] in Semaphore>critical: 0xb82f8ef4: a(n) Semaphore
0xff7907d8 M BlockClosure>ensure: 0xb91502e0: a(n) BlockClosure
0xff7907f8 M Semaphore>critical: 0xb82f8ef4: a(n) Semaphore
0xff790814 M Delay>schedule 0xb91501e4: a(n) Delay
0xff79082c M Delay>wait 0xb91501e4: a(n) Delay
0xff790850 I [] in BackgroundWorkDisplayMorph>initialize 0xb91488b0: a(n)
BackgroundWorkDisplayMorph
0xff790870 I [] in BlockClosure>newProcess 0xb9148b40: a(n) BlockClosure

Process 0xb7902630 priority 40
0xff764784 M [] in Semaphore>critical: 0xb82f8ef4: a(n) Semaphore
0xff7647a4 M BlockClosure>ensure: 0xb916b7a4: a(n) BlockClosure
0xff7647c4 M Semaphore>critical: 0xb82f8ef4: a(n) Semaphore
0xff7647e0 M Delay>schedule 0xb916b6a8: a(n) Delay
0xff7647f8 M Delay>wait 0xb916b6a8: a(n) Delay
0xff764818 M WorldState>interCyclePause: 0xb75e8fd8: a(n) WorldState
0xff764834 M WorldState>doOneCycleFor: 0xb75e8fd8: a(n) WorldState
0xff764850 M WorldMorph>doOneCycle 0xb75e8fa4: a(n) WorldMorph
0xff764870 I [] in MorphicUIManager()>? 0xb770ac38: a(n) MorphicUIManager
0xb78cb554 s [] in BlockClosure()>?

SIGUSR1 Mon Jul 28 01:06:16 2014
>
>
> pharo VM version: 3.9-7 #1 Tue May  6 08:30:23 UTC 2014 gcc 4.8.2
> [Production ITHB VM]
> Built from: NBCoInterpreter NativeBoost-CogPlugin-GuillermoPolito.19 uuid:
> acc98e51-2fba-4841-a965-2975997bba66 May  6 2014
> With: NBCogit NativeBoost-CogPlugin-GuillermoPolito.19 uuid:
> acc98e51-2fba-4841-a965-2975997bba66 May  6 2014
> Revision: https://github.com/pharo-project/pharo-vm.git Commit:
> ef5832e6f70e5b24e8b9b1f4b8509a62b6c88040 Date: 2014-01-26 15:34:28 +0100
> By: Esteban Lorenzano <estebanlm at gmail.com> Jenkins build #14794
> Build host: Linux chindi08 2.6.24-32-xen #1 SMP Mon Dec 3 16:12:25 UTC
> 2012 i686 i686 i686 GNU/Linux
> plugin path: /usr/lib/pharo-vm/ [default: /usr/lib/pharo-vm/]
>
>
> C stack backtrace:
> /usr/lib/pharo-vm/pharo-vm[0x809ad23]
> /usr/lib/pharo-vm/pharo-vm[0x809af6e]
> [0xf7784410]
> [0xf7784425]
> /lib/i386-linux-gnu/libc.so.6(__select+0x2d)[0xf763691d]
> /usr/lib/pharo-vm/pharo-vm(aioPoll+0x13d)[0x809748d]
> /usr/lib/pharo-vm/vm-display-X11.so(+0xdc85)[0xf71d2c85]
>
> /usr/lib/pharo-vm/pharo-vm(ioRelinquishProcessorForMicroseconds+0x17)[0x8099b57]
> /usr/lib/pharo-vm/pharo-vm[0x8070685]
> [0xb6f8dbc3]
> [0xb6f89700]
> [0xb7a2650e]
> [0xb6f895c0]
>
>
> All Smalltalk process stacks (active first):
> Process 0xb88376fc priority 10
> 0xff76c830 M ProcessorScheduler class>idleProcess 0xb7306b08: a(n)
> ProcessorScheduler class
> 0xff76c850 I [] in ProcessorScheduler class>startUp 0xb7306b08: a(n)
> ProcessorScheduler class
> 0xff76c870 I [] in BlockClosure>newProcess 0xb8837620: a(n) BlockClosure
>
> Process 0xb8838c78 priority 50
> 0xff768830 I WeakArray class>finalizationProcess 0xb7306cd8: a(n)
> WeakArray class
> 0xff768850 I [] in WeakArray class>restartFinalizationProcess 0xb7306cd8:
> a(n) WeakArray class
> 0xff768870 I [] in BlockClosure>newProcess 0xb8838b9c: a(n) BlockClosure
>
> Process 0xb9148c20 priority 40
> 0xff7907b8 M [] in Semaphore>critical: 0xb82f8ef4: a(n) Semaphore
> 0xff7907d8 M BlockClosure>ensure: 0xb91502e0: a(n) BlockClosure
> 0xff7907f8 M Semaphore>critical: 0xb82f8ef4: a(n) Semaphore
> 0xff790814 M Delay>schedule 0xb91501e4: a(n) Delay
> 0xff79082c M Delay>wait 0xb91501e4: a(n) Delay
> 0xff790850 I [] in BackgroundWorkDisplayMorph>initialize 0xb91488b0: a(n)
> BackgroundWorkDisplayMorph
> 0xff790870 I [] in BlockClosure>newProcess 0xb9148b40: a(n) BlockClosure
>
> Process 0xb7902630 priority 40
> 0xff764784 M [] in Semaphore>critical: 0xb82f8ef4: a(n) Semaphore
> 0xff7647a4 M BlockClosure>ensure: 0xb916b7a4: a(n) BlockClosure
> 0xff7647c4 M Semaphore>critical: 0xb82f8ef4: a(n) Semaphore
> 0xff7647e0 M Delay>schedule 0xb916b6a8: a(n) Delay
> 0xff7647f8 M Delay>wait 0xb916b6a8: a(n) Delay
> 0xff764818 M WorldState>interCyclePause: 0xb75e8fd8: a(n) WorldState
> 0xff764834 M WorldState>doOneCycleFor: 0xb75e8fd8: a(n) WorldState
> 0xff764850 M WorldMorph>doOneCycle 0xb75e8fa4: a(n) WorldMorph
> 0xff764870 I [] in MorphicUIManager()>? 0xb770ac38: a(n) MorphicUIManager
> 0xb78cb554 s [] in BlockClosure()>?
>
> Process 0xb82f9078 priority 80
> 0xff765858 M Delay class>handleTimerEvent 0xb8684d08: a(n) Delay class
> 0xff765870 M Delay class()>? 0xb8684d08: a(n) Delay class
> 0xb8623474 s [] in Delay class()>?
> 0xb82f9018 s [] in BlockClosure>newProcess
>
> Process 0xb883735c priority 60
> 0xff76680c M InputEventFetcher>waitForInput 0xb72f059c: a(n)
> InputEventFetcher
> 0xff766830 M InputEventFetcher>eventLoop 0xb72f059c: a(n) InputEventFetcher
> 0xff766850 I [] in InputEventFetcher>installEventLoop 0xb72f059c: a(n)
> InputEventFetcher
> 0xff766870 I [] in BlockClosure>newProcess 0xb8837280: a(n) BlockClosure
>
> Process 0xb8837534 priority 60
> 0xb8837568 s SmalltalkImage>lowSpaceWatcher
> 0xb9127478 s [] in SmalltalkImage>installLowSpaceWatcher
> 0xb88374d4 s [] in BlockClosure>newProcess
>
> Most recent primitives
> relinquishProcessorForMicroseconds:
> relinquishProcessorForMicroseconds:
> relinquishProcessorForMicroseconds:
> relinquishProcessorForMicroseconds:
> ~ 200 times
>
>
>
>>
>> Nicolai
>>>
>>>
>>>
>>> 2014-07-25 16:56 GMT+02:00 Ben Coman <btc at openinworld.com>:
>>>
>>>>
>>>>  Over the last few days I have been looking deeper into the image
>>>> locking when suspending a process. It is an interesting rabbit hole [1]
>>>> that leads to pondering the Delay machinery, that leads to some VM
>>>> questions.
>>>>
>>>> When  pressing the interrupt key it seems to always opens the debugger
>>>> with the following call stack.
>>>> Semaphore>>critical:   'self wait'
>>>> BlockClosure>>ensure:     'self valueNoContextSwitch'
>>>> Semaphore>>critical:      'ensure: [ caught ifTrue: [self signal]]
>>>> Delay>>schedule         'AccessProtect critical: ['
>>>> Delay>>wait              'self schedule'
>>>> WorldState>>interCyclePause:
>>>>
>>>> I notice...
>>>>     Delay class >> initialize
>>>>         TimingSemaphore := (Smalltalk specialObjectsArray at: 30).
>>>> and...
>>>>     Delay class >> startTimerEventLoop
>>>>         TimingSemaphore := Semaphore new.
>>>> which seems incongruous that TimingSemaphore is set in differently.  So
>>>> while I presume this critical stuff all works fine, just in an exotic way,
>>>> my entropy-guarding-neuron would just like confirm this is so.
>>>>
>>>> --------------
>>>>
>>>> In Delay class >> handleTimerEvent the comment says...
>>>>     "Handle a timer event....
>>>>           -a timer signal (not explicitly specified)"
>>>> ...is that event perhaps a 'tick' generated periodically by the VM via
>>>> that item from specialObjectArray ?  Or is there some other mechanism ?
>>>>
>>>> --------------
>>>>
>>>> [1] http://www.urbandictionary.com/define.php?term=Rabbit+Hole
>>>> cheers -ben
>>>>
>>>>
>>>> P.S. I've left the following for some initial context as I change the
>>>> subject.  btw Nicolai, I confirm that my proposed fixes only work on
>>>> Windows, not Mavericks (and I haven't checked Linux).
>>>>
>>>> Nicolai Hess wrote:
>>>>
>>>>
>>>> Hi ben, thank you for looking at this.
>>>>
>>>> 2014-07-22 20:17 GMT+02:00 <btc at openinworld.com>:
>>>>
>>>>>  I thought this might be interesting to learn, so I've gave it a go.
>>>>> I  had some success at the end, but I'll give a progressive report.
>>>>>
>>>>> First I thought I'd try moving the update of StringMorph outside the
>>>>> worker-process using a Morph's #step method as follows...
>>>>>
>>>>> Morph subclass: #BackgroundWorkDisplayMorph
>>>>>     instanceVariableNames: 'interProcessString stringMorph'
>>>>>     classVariableNames: ''
>>>>>     category: 'BenPlay'
>>>>>     "---------"
>>>>>
>>>>> BackgroundWorkDisplayMorph>>initializeMorph
>>>>>     self color: Color red.
>>>>>     stringMorph := StringMorph new.
>>>>>     self addMorphBack: stringMorph.
>>>>>     self extent:(300 at 50).
>>>>>     "---------"
>>>>>
>>>>> BackgroundWorkDisplayMorph>>newWorkerProcess
>>>>>     ^[
>>>>>         | work |
>>>>>         work := 0.
>>>>>         [     20 milliSeconds asDelay wait.
>>>>>             work := work + 1.
>>>>>             interProcessString := work asString.
>>>>>         ] repeat.
>>>>>     ] newProcess.
>>>>>     "---------"
>>>>>
>>>>> BackgroundWorkDisplayMorph>>step
>>>>>     stringMorph contents: interProcessString.
>>>>>     "---------"
>>>>>
>>>>> BackgroundWorkDisplayMorph>>stepTime
>>>>>     ^50
>>>>>     "---------"
>>>>>
>>>>> BackgroundWorkDisplayMorph>>initialize
>>>>>     | workerProcess running |
>>>>>     super initialize.
>>>>>     self initializeMorph.
>>>>>
>>>>>     workerProcess := self newWorkerProcess.
>>>>>     running := false.
>>>>>
>>>>>     self on: #mouseUp send: #value to:
>>>>>     [      (running := running not)
>>>>>             ifTrue: [  workerProcess resume. self color: Color green.
>>>>> ]
>>>>>             ifFalse: [ workerProcess suspend. self color: Color red. ]
>>>>>     ]
>>>>>     "---------"
>>>>>
>>>>>
>>>>>
>>>>> But evaluating "BackgroundWorkDisplayMorph new openInWorld"  found
>>>>> this exhibited the same problematic behavior you reported... Clicking on
>>>>> the morph worked a few times and then froze the UI until Cmd-. pressed a
>>>>> few times.
>>>>>
>>>>
>>>>> However I found the following never locked the GUI.
>>>>>
>>>>> BackgroundWorkDisplayMorph>>initialize
>>>>>     "BackgroundWorkDisplayMorph new openInWorld"
>>>>>     | workerProcess running |
>>>>>     super initialize.
>>>>>     self initializeMorph.
>>>>>
>>>>>     workerProcess := self newWorkerProcess.
>>>>>     running := false.
>>>>>
>>>>>     [ [      (running := running not)
>>>>>             ifTrue: [  workerProcess resume. self color: Color green  ]
>>>>>             ifFalse: [ workerProcess suspend. self color: Color red ].
>>>>>         10 milliSeconds asDelay wait.
>>>>>     ] repeat ] fork.
>>>>>     "---------"
>>>>>
>>>>>
>>>>  This locks the UI as well. Not every timet hough. I did this 5 times,
>>>> every time in a freshly loaded image and it happens two times.
>>>>
>>>>
>>>>
>>>>> So the problem seemed to not be with #suspend/#resume or with the
>>>>> shared variable /interProcessString/.  Indeed, since in the worker thread
>>>>> /interProcessString/ is atomically assigned a copy via #asString, and the
>>>>> String never updated, I think there is no need to surround use of it with a
>>>>> critical section.
>>>>>
>>>>> The solution then was to move the "#resume/#suspend" away from the
>>>>> "#on: #mouseUp send: #value to:" as follows...
>>>>>
>>>>> BackgroundWorkDisplayMorph>>initialize
>>>>>     "BackgroundWorkDisplayMorph new openInWorld"
>>>>>     | workerProcess running lastRunning |
>>>>>     super initialize.
>>>>>     self initializeMorph.
>>>>>
>>>>>     workerProcess := self newWorkerProcess.
>>>>>     lastRunning := running := false.
>>>>>
>>>>>     [ [    lastRunning = running ifFalse:
>>>>>         [    running
>>>>>                 ifTrue: [  workerProcess resume  ]
>>>>>                 ifFalse: [ workerProcess suspend ].
>>>>>             lastRunning := running.
>>>>>         ].
>>>>>         10 milliSeconds asDelay wait.
>>>>>     ] repeat ] fork.
>>>>>
>>>>>     self on: #mouseUp send: #value to:
>>>>>     [      (running := running not)
>>>>>             ifTrue: [  self color: Color green.  ]
>>>>>             ifFalse: [ self color: Color red. ]
>>>>>     ]
>>>>>     "---------"
>>>>>
>>>>
>>>>  And this too :(
>>>>
>>>>
>>>>
>>>>>
>>>>> And finally remove the busy loop.
>>>>>
>>>>> BackgroundWorkDisplayMorph>>initialize
>>>>>     "BackgroundWorkDisplayMorph new openInWorld"
>>>>>     | workerProcess running lastRunning semaphore |
>>>>>     super initialize.
>>>>>     self initializeMorph.
>>>>>
>>>>>     workerProcess := self newWorkerProcess.
>>>>>     lastRunning := running := false.
>>>>>     semaphore := Semaphore new.
>>>>>
>>>>>     [ [    semaphore wait.
>>>>>         running
>>>>>             ifTrue: [  workerProcess resume  ]
>>>>>             ifFalse: [ workerProcess suspend ].
>>>>>     ] repeat ] fork.
>>>>>
>>>>>     self on: #mouseUp send: #value to:
>>>>>     [      (running := running not)
>>>>>             ifTrue: [  self color: Color green.  ]
>>>>>             ifFalse: [ self color: Color red. ].
>>>>>         semaphore signal.
>>>>>     ]
>>>>>     "---------"
>>>>>
>>>>>
>>>>
>>>>  And this locks the UI too. (Loaded the code 20 times, every time
>>>> after a fresh image start up. Two times I got a locked
>>>>  ui after the first two clicks).
>>>>  And I don't understand this code :)
>>>>
>>>>
>>>>
>>>>> Now I can't say how close that is to how it "should" be done.  Its the
>>>>> first time I used sempahores and just what I discovered hacking around.
>>>>> But hey! it works :)
>>>>>
>>>>> cheers -ben
>>>>>
>>>>>
>>>>>
>>>>> Nicolai Hess wrote:
>>>>>
>>>>>  I am still struggling with it.
>>>>>
>>>>>  Any ideas?
>>>>>
>>>>>
>>>>> 2014-07-09 11:19 GMT+02:00 Nicolai Hess <nicolaihess at web.de>:
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2014-07-09 2:07 GMT+02:00 Eliot Miranda <eliot.miranda at gmail.com>:
>>>>>>
>>>>>>  Hi Nicolai,
>>>>>>>
>>>>>>>
>>>>>>>  On Tue, Jul 8, 2014 at 7:19 AM, Nicolai Hess <nicolaihess at web.de>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>  I want to create a process doing some work and call #changed on a
>>>>>>>> Morph.
>>>>>>>> I want to start/suspend/resume or stop this process.
>>>>>>>> But sometimes, suspending the process locks the UI-Process,
>>>>>>>> and I don't know why. Did I miss something or do I have to care
>>>>>>>> when to call suspend?
>>>>>>>>
>>>>>>>>  Wrapping the "morph changed" call in
>>>>>>>>  UIManager default defer:[ morph changed].
>>>>>>>>  Does not change anything.
>>>>>>>>
>>>>>>>> Here is an example to reproduce it.
>>>>>>>> Create the process,
>>>>>>>> call resume, call supsend. It works, most of the time,
>>>>>>>> but sometimes, calling suspend locks the ui.
>>>>>>>>
>>>>>>>> p:=[[true] whileTrue:[ Transcript crShow: (DateAndTime now
>>>>>>>> asString). 30 milliSeconds asDelay wait]] newProcess.
>>>>>>>>
>>>>>>>  p resume.
>>>>>>>> p suspend.
>>>>>>>>
>>>>>>>
>>>>>>>   If you simply suspend this process at random form a user-priority
>>>>>>> process you'll never be able to damage the Delay machinery you're using,
>>>>>>> but chances are you'll suspend the process inside the critical section that
>>>>>>> Transcript uses to make itself thread-safe, and that'll lock up the
>>>>>>> Transcript.
>>>>>>>
>>>>>>
>>>>>>  Thank you Eliot
>>>>>>  yes I guessed it locks up the critical section, but I hoped with
>>>>>> would not happen if I the use UIManager defer call.
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>  ThreadSafeTranscript>>nextPutAll: value
>>>>>>>   accessSemaphore
>>>>>>>  critical: [stream nextPutAll: value].
>>>>>>>  ^value
>>>>>>>
>>>>>>>  So instead you need to use a semaphore.  e.g.
>>>>>>>
>>>>>>>  | p s wait |
>>>>>>> s := Semaphore new.
>>>>>>> p:=[[true] whileTrue:[wait ifTrue: [s wait]. Transcript crShow:
>>>>>>> (DateAndTime now asString). 30 milliSeconds asDelay wait]] newProcess.
>>>>>>> wait := true.
>>>>>>> 30 milliSeconds asDelay wait.
>>>>>>>  wait := false.
>>>>>>> s signal
>>>>>>>
>>>>>>>  etc...
>>>>>>>
>>>>>>
>>>>>>  Is this a common pattern I can find in pharos classes. Or I need
>>>>>> some help understanding this. The semaphore
>>>>>>  wait/signal is used instead of process resume/suspend?
>>>>>>
>>>>>> What I want is a process doing repeatly some computation,
>>>>>> calls or triggers an update on a morph, and I want to suspend and
>>>>>> resume this process.
>>>>>>
>>>>>> I would stop this discussion if someone tells me, "No your are doing
>>>>>> it wrong, go this way ..",  BUT what strikes me:
>>>>>> in this example, that reproduces my problem more closely:
>>>>>>
>>>>>> |p m s running|
>>>>>> running:=false.
>>>>>> m:=Morph new color:Color red.
>>>>>> s:= StringMorph new.
>>>>>> m addMorphBack:s.
>>>>>> p:=[[true]whileTrue:[20 milliSeconds asDelay wait. s
>>>>>> contents:(DateAndTime now asString). m changed]] newProcess.
>>>>>> m on:#mouseUp send:#value to:[
>>>>>>     running ifTrue:[p suspend. m color:Color red.]
>>>>>>     ifFalse:[p resume.m color:Color green.].
>>>>>>     running := running not].
>>>>>> m extent:(300 at 50).
>>>>>> m openInWorld
>>>>>>
>>>>>>
>>>>>>  clicking on the morph will stop or resume the process, if it locks
>>>>>> up I can still press alt+dot ->
>>>>>>  - a Debugger opens but the UI is still not responsive. I can click
>>>>>> with the mouse on the debuggers close icon.
>>>>>>  - nothing happens, as the UI is still blocked.
>>>>>>  - pressing alt+Dot again, the mouse click on the close icon is
>>>>>> processed and the first debugger window closes
>>>>>> - maybe other debuggers open.
>>>>>>
>>>>>> Repeating this steps, at some time the system is *fully* responsive
>>>>>> again!
>>>>>>  And miraculously, it works after that without further blockages.
>>>>>> What's happening here?
>>>>>>
>>>>>>
>>>>>>  Nicolai
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>  HTH
>>>>>>>
>>>>>>>   regards
>>>>>>>>  Nicolai
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  --
>>>>>>> best,
>>>>>>> Eliot
>>>>>>>
>>>>>> --
>> Aloha,
>> Eliot
>>
>>
>
>


-- 
best,
Eliot
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20140727/28034eea/attachment-0001.htm


More information about the Vm-dev mailing list