[Vm-dev] BUG? A problem with callbacks that shows up in 64bits (but is on 32bits too)

Igor Stasenko siguctua at gmail.com
Wed Mar 15 10:03:42 UTC 2017


On 15 March 2017 at 11:53, Igor Stasenko <siguctua at gmail.com> wrote:

> Here's my d)
> implement callback functions in C, or in native form => no need for
> entering the smalltalk execution => no risk of GC => nothing to worry about.
>
> I guess nobody will like it (and will be right, of course ;) , but it is
> how it was originally done. I used NativeBoost to implement those callback
> functions and they're won't cause any GC problems.
>
> That, of course, gave me solution in this concrete case, but not in
> general.. i.e. : if you have another callback that cannot be implemented
> na(t)ively, then
> you facing similar problems, mainly: how to work around the problem, that
> primitive(s) that using callbacks may capture state, that are subject of GC
> activity.
>
> In general , then, i think such primitive should be (re)written in such
> way , that it won't get puzzled by GC.. and addGCRoot(s), IMO then best
> way, from general interfacing/implementation standpoint.
> I would just add extra interface for using it especially in primitives, so
> that it
> 1) won't punish primitive writer with too much coding
> 2) automatically handle primitive/callback nesting e.g.
> primitive1 -> adds roots1 -> calls fn -> callback -> st code -> primitive2
> -> adds roots2 -> calls fn2 -> callback2 ...
>
>
> something like this:
>
> static initialized once myprimooptable = [ a,b,c].
> vm pushPrimRoots: myooptable.
> self do things primitive does.
> vm popPrimRoots
>
> or, since we have green threading, then maybe better will be in this form:
>
> rootsId := static initialized once myprimooptable = [ a,b,c].
> vm pushPrimRoots: myooptable.
> self do things primitive does.
> vm popPrimRoots: rootsId.
>
> oops, i meant:

static initialized once myprimooptable = [ a,b,c].
rootsId := vm pushPrimRoots: myooptable.
self do things primitive does.
vm popPrimRoots: rootsId.

the idea is mainly, that VM answers you some key, using which you can later
identify, what to release.

>
>
> On 14 March 2017 at 18:33, Eliot Miranda <eliot.miranda at gmail.com> wrote:
>
>>
>>
>>
>> On Tue, Mar 14, 2017 at 8:56 AM, Nicolai Hess <nicolaihess at gmail.com>
>> wrote:
>>
>>>
>>>
>>>
>>> 2017-03-14 16:46 GMT+01:00 Eliot Miranda <eliot.miranda at gmail.com>:
>>>
>>>>
>>>> Hi Esteban, Hi Igor, Hi All,
>>>>
>>>> On Fri, Mar 10, 2017 at 7:35 AM, Esteban Lorenzano <estebanlm at gmail.com
>>>> > wrote:
>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> I’m tumbling into an error in Pharo, because we use callbacks
>>>>> intensively, in Athens(cairo)-to-World conversion in particular, and people
>>>>> is sending always their crash reports… we made the whole conversion a lot
>>>>> more robust since problems started to arise, but now I hit a wall I cannot
>>>>> solve: I think problem is in something in callbacks.
>>>>>
>>>>> And problem is showing very easy on 64bits (while in 32bits it takes
>>>>> time and is more random).
>>>>>
>>>>
>>>>  I responded in the "image not opening" thread, but it's the same
>>>> problem.  I really want to hear what y'all think because I'll happily
>>>> implement a fix, but I want to know which one y'all think is a good idea.
>>>> Here's my reply (edits between [ & ] to add information):
>>>>
>>>>
>>>> On Mon, Mar 13, 2017 at 9:11 PM, Eliot Miranda <eliot.miranda at gmail.com
>>>> > wrote:
>>>>
>>>> I'm pretty confident [I know] this is to do with bugs in the Athens
>>>> surface code which assumes that callbacks can be made in the existing
>>>> copyBits and warpBits primitive.  They can't do this safely because a GC
>>>> (scavenge) can happen during a callback, which then causes chaos when the
>>>> copyBits primitive tries to access objects that have been moved under its
>>>> feet.
>>>>
>>>> I've done work to fix callbacks so that when there is a failure it is
>>>> the copyBits primitive that fails, instead of apparently the callback
>>>> return primitive.  One of the apparent effects of this fix is to stop the
>>>> screen opening up too small; another is getting the background colour
>>>> right, and yet another is eliminating bogus pixels in the VGTigerDemo
>>>> demo.  But more work is required to fix the copyBits and warpBits
>>>> primitives.  There are a few approaches one might take:
>>>>
>>>> a)  fixing the primitive so that it saves and restores oops around the
>>>> callbacks using the external oop table [InterpreterProxy>>addGCRoot: &
>>>> removeGCRoot:].  That's a pain but possible. [It's a pain because all the
>>>> derived pointers (the start of the destForm, sourceForm, halftoneForm and
>>>> colorMapTable) must be recomputed also, and of course most of the time the
>>>> objects don't move; we only scavenge about once every 2 seconds in normal
>>>> running]
>>>>
>>>> b) fixing the primitive so that it pins the objects it needs before
>>>> ever invoking a callback [this is a pain because pinning an object causes
>>>> it to be tenured to old space if it is in new space; objects can't be
>>>> pinned in new space, so instead the pin operation forwards the new space
>>>> object to an old space copy if required and answers its location in old
>>>> space, so a putative withPinnedObjectsDo: operation for the copyBits
>>>> primitive looks like
>>>> withPinnedFormsDo: aBlock
>>>> <inline: #always>
>>>> self cppIf: SPURVM & false
>>>> ifTrue:
>>>> [| bitBltOopWasPinned destWasPinned sourceWasPinned halftoneWasPinned |
>>>>  (bitBltOopWasPinned := interpreterProxy isPinned: bitBltOop) ifFalse:
>>>> [bitBltOop := interpreterProxy pinObject: bitBltOop].
>>>> (destWasPinned := interpreterProxy isPinned: destForm) ifFalse:
>>>> [destForm := interpreterProxy pinObject: destForm].
>>>> (sourceWasPinned := interpreterProxy isPinned: sourceForm) ifFalse:
>>>> [sourceForm := interpreterProxy pinObject: sourceForm].
>>>> (halftoneWasPinned := interpreterProxy isPinned: halftoneForm) ifFalse:
>>>> [halftoneForm := interpreterProxy pinObject: halftoneForm].
>>>> aBlock value.
>>>>  bitBltOopWasPinned ifFalse: [interpreterProxy unpinObject: bitBltOop].
>>>> destWasPinned ifFalse: [interpreterProxy unpinObject: destForm].
>>>> sourceWasPinned ifFalse: [interpreterProxy unpinObject: sourceForm].
>>>> halftoneWasPinned ifFalse: [interpreterProxy unpinObject: halftoneForm]]
>>>> ifFalse: [aBlock value]
>>>>    and tenuring objects to old space is not ideal because they are only
>>>> collected by a full GC, so doing this would at least tenure the bitBltOop
>>>> which is very likely to be in new space]
>>>>
>>>> c) fixing the primitive so that it uses the scavenge and fullGC
>>>> counters in the VM to detect if a GC occurred during one of the callbacks
>>>> and would fail the primitive [if it detected that a GC had occurred in any
>>>> of the surface functions].   The primitive would then simply be retried.
>>>>
>>>> d) ?
>>>>
>>>
>>> Wouldn't it be possible to just pause the GC (scavange) when entering a
>>> primitive ?
>>>
>>
>> I don't think so.  There is a callback occurring.  If the computation
>> executed by the callback requires a GC the application will abort if a GC
>> cannot be done.  Right?  This is the case here.
>>
>>
>> I like c) as it's very lightweight, but it has issues.  It is fine to use
>>>> for callbacks *before* cop[yBits and warpBits move any bits (the
>>>> lockSurface and querySurface functions).  But it's potentially erroneous
>>>> after the unlockSurface primitive.  For example, a primitive which does an
>>>> xor with the screen can't simply be retried as the first, falling pass,
>>>> would have updated the destination bits but not displayed them via
>>>> unlockSurface.  But I think it could be arranged that no objects are
>>>> accessed after unlockSurface, which should naturally be the last call in
>>>> the primitive (or do I mean showSurface?).  So the approach would be to
>>>> check for GCs occurring during querySurface and lockSurface, failing if so,
>>>> and then caching any and all state needed by unlockSurface and showSurface
>>>> in local variables.  This way no object state is accessed to make the
>>>> unlockSurface and showSurface calls, and no bits are moved before the
>>>> queryDurface and lockSurface calls.
>>>>
>>>> If we used a failure code such as #'object may move' then the
>>>> primitives could answer this when a GC during callbacks is detected and
>>>> then the primitive could be retried only when required.
>>>>
>>>>
>>>> [Come on folks, please comment.  I want to know which idea you like
>>>> best.  We could fix this quickly.  But right now it feels like I'm talking
>>>> to myself.]
>>>>
>>>>
>>>>> Here is the easiest way to reproduce it (in mac):
>>>>>
>>>>> wget files.pharo.org/get-files/60/pharo64-mac-latest.zip
>>>>> wget files.pharo.org/get-files/60/pharo64.zip
>>>>> wget files.pharo.org/get-files/60/sources.zip
>>>>> unzip pharo64-mac-latest.zip
>>>>> unzip pharo64.zip
>>>>> unzip sources.zip
>>>>> ./Pharo.app/Contents/MacOS/Pharo ./Pharo64-60438.image eval
>>>>> "VGTigerDemo runDemo"
>>>>>
>>>>> eventually (like 5-6 seconds after, if not immediately), you will have
>>>>> a stack like this:
>>>>>
>>>>> SmallInteger(Object)>>primitiveFailed:
>>>>> SmallInteger(Object)>>primitiveFailed
>>>>> SmallInteger(VMCallbackContext64)>>primSignal:andReturnAs:fromContext:
>>>>> GrafPort>>copyBits
>>>>> GrafPort>>image:at:sourceRect:rule:
>>>>> FormCanvas>>image:at:sourceRect:rule:
>>>>> FormCanvas(Canvas)>>drawImage:at:sourceRect:
>>>>> FormCanvas(Canvas)>>drawImage:at:
>>>>> VGTigerDemo>>runDemo
>>>>> VGTigerDemo class>>runDemo
>>>>> UndefinedObject>>DoIt
>>>>> OpalCompiler>>evaluate
>>>>> OpalCompiler(AbstractCompiler)>>evaluate:
>>>>> [ result := Smalltalk compiler evaluate: aStream.
>>>>> self hasSessionChanged
>>>>>         ifFalse: [ self stdout
>>>>>                         print: result;
>>>>>                         lf ] ] in EvaluateCommandLineHandler>>evaluate:
>>>>> in Block: [ result := Smalltalk compiler evaluate: aStream....
>>>>> BlockClosure>>on:do:
>>>>> EvaluateCommandLineHandler>>evaluate:
>>>>> EvaluateCommandLineHandler>>evaluateArguments
>>>>> EvaluateCommandLineHandler>>activate
>>>>> EvaluateCommandLineHandler class(CommandLineHandler
>>>>> class)>>activateWith:
>>>>> [ aCommandLinehandler activateWith: commandLine ] in
>>>>> PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand:
>>>>> in Block: [ aCommandLinehandler activateWith: commandLine ]
>>>>> BlockClosure>>on:do:
>>>>> PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand:
>>>>> PharoCommandLineHandler(BasicCommandLineHandler)>>handleSubcommand
>>>>> PharoCommandLineHandler(BasicCommandLineHandler)>>handleArgument:
>>>>> [ self
>>>>>         handleArgument:
>>>>>                 (self arguments
>>>>>                         ifEmpty: [ '' ]
>>>>>                         ifNotEmpty: [ :arguments | arguments first ])
>>>>> ] in PharoCommandLineHandler(BasicCommandLineHandler)>>activate in
>>>>> Block: [ self...
>>>>> BlockClosure>>on:do:
>>>>> PharoCommandLineHandler(BasicCommandLineHandler)>>activate
>>>>> PharoCommandLineHandler>>activate
>>>>> PharoCommandLineHandler class(CommandLineHandler class)>>activateWith:
>>>>> [ super activateWith: aCommandLine ] in PharoCommandLineHandler
>>>>> class>>activateWith: in Block: [ super activateWith: aCommandLine ]
>>>>>
>>>>> Any idea?
>>>>>
>>>>> thanks!
>>>>> Esteban
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> _,,,^..^,,,_
>>>> best, Eliot
>>>>
>>>>
>>>
>>>
>>
>>
>> --
>> _,,,^..^,,,_
>> best, Eliot
>>
>>
>
>
> --
> Best regards,
> Igor Stasenko.
>



-- 
Best regards,
Igor Stasenko.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20170315/f855bb9c/attachment-0001.html>


More information about the Vm-dev mailing list