[Vm-dev] BUG? A problem with callbacks that shows up in 64bits (but is on 32bits too)

phil at highoctane.be phil at highoctane.be
Thu Mar 16 07:57:12 UTC 2017


There is more that Cairo callbacks in userland. I am using libstrophe for
Xmpp and there are tons of callbacks and not being able to use them is
putting Pharo out of the equation b/c the main loop is basically invoking
callbacks for each evenr type.

Phil

Le 16 mars 2017 00:44, "Eliot Miranda" <eliot.miranda at gmail.com> a écrit :

>
> Hi Igor,
>
> On Wed, Mar 15, 2017 at 2:53 AM, Igor Stasenko <siguctua at gmail.com> wrote:
>
>>
>> Here's my d)
>> implement callback functions in C, or in native form => no need for
>> entering the smalltalk execution => no risk of GC => nothing to worry about.
>>
>> I guess nobody will like it (and will be right, of course ;) , but it is
>> how it was originally done. I used NativeBoost to implement those callback
>> functions and they're won't cause any GC problems.
>>
>
> yes, I like this.  I was wondering why the callbacks solution was used at
> all yesterday.  All they do is redirect to the cairo library.  What are the
> reasons?  Tedious to write and maintain the necessary simple plugin?
>
> Clément pointed out a really ugly problem with the current
> implementation.  If one calls back into Pharo from the BitBlt primitives
> and then reinvokes BitBlt, say by innocently putting a halt in those
> callbacks, then the original BitBlt's state will get overwritten by the
> BitBlt invocations in the callback's dynamic exert.  At least with my
> changes the BitBlt primitive will abort, rather than continue with the
> invalid state.
>
>
>> That, of course, gave me solution in this concrete case, but not in
>> general.. i.e. : if you have another callback that cannot be implemented
>> na(t)ively, then
>> you facing similar problems, mainly: how to work around the problem, that
>> primitive(s) that using callbacks may capture state, that are subject of GC
>> activity.
>>
>> In general , then, i think such primitive should be (re)written in such
>> way , that it won't get puzzled by GC.. and addGCRoot(s), IMO then best
>> way, from general interfacing/implementation standpoint.
>> I would just add extra interface for using it especially in primitives,
>> so that it
>> 1) won't punish primitive writer with too much coding
>> 2) automatically handle primitive/callback nesting e.g.
>> primitive1 -> adds roots1 -> calls fn -> callback -> st code ->
>> primitive2 -> adds roots2 -> calls fn2 -> callback2 ...
>>
>>
>> something like this:
>>
>> static initialized once myprimooptable = [ a,b,c].
>> vm pushPrimRoots: myooptable.
>> self do things primitive does.
>> vm popPrimRoots
>>
>> or, since we have green threading, then maybe better will be in this
>> form:
>>
>> rootsId := static initialized once myprimooptable = [ a,b,c].
>> vm pushPrimRoots: myooptable.
>> self do things primitive does.
>> vm popPrimRoots: rootsId.
>>
>
> We kind of have this with the addGCRoot: interface.  But I think it's much
> better to design the system so that the primitive fails and can be
> retried.  The problem there is having to have the primitive failure code
> check and roll back.  For example in the copyBits primitive one sees
>
> ((sourceForm isForm) and: [sourceForm unhibernate])
> ifTrue: [^ self copyBits].
> ((destForm isForm) and: [destForm unhibernate])
> ifTrue: [^ self copyBits].
> ((halftoneForm isForm) and: [halftoneForm unhibernate])
> ifTrue: [^ self copyBits].
>
> This is really scruffy because...GrafPort implements copyBits, so this
> ends up not just retrying the primitive but running a lot more besides.
> One way to write it is
>
>
> ((sourceForm isForm) and: [sourceForm unhibernate])
> ifTrue: [^ self perform: #copyBits withArguments: #() inSuperclass:
> BitBlt].
> ((destForm isForm) and: [destForm unhibernate])
> ifTrue: [^ self perform: #copyBits withArguments: #() inSuperclass:
> thisContext method methodClass].
> ((halftoneForm isForm) and: [halftoneForm unhibernate])
> ifTrue: [^ self perform: #copyBits withArguments: #() inSuperclass:
> thisContext methodClass].
>
> but that's ugly.
>
> A mechanism that was in the VM would be nice.  The state for the
> invocation is saved on the stack.  So there could be a special failure path
> for this kind of recursive invocation problem; another send-back such as
> doesNotUnderstand: attemptToReturn:through:.  Note (I'm sure you know this
> Igor)  that there are primitives such as the ThreadedFFIPlugin's call-out
> primitive that very much expect to be invoked recursively and have no
> problem with it.
>
>
>>
>>
>> On 14 March 2017 at 18:33, Eliot Miranda <eliot.miranda at gmail.com> wrote:
>>
>>>
>>>
>>>
>>> On Tue, Mar 14, 2017 at 8:56 AM, Nicolai Hess <nicolaihess at gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>>
>>>> 2017-03-14 16:46 GMT+01:00 Eliot Miranda <eliot.miranda at gmail.com>:
>>>>
>>>>>
>>>>> Hi Esteban, Hi Igor, Hi All,
>>>>>
>>>>> On Fri, Mar 10, 2017 at 7:35 AM, Esteban Lorenzano <
>>>>> estebanlm at gmail.com> wrote:
>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I’m tumbling into an error in Pharo, because we use callbacks
>>>>>> intensively, in Athens(cairo)-to-World conversion in particular, and people
>>>>>> is sending always their crash reports… we made the whole conversion a lot
>>>>>> more robust since problems started to arise, but now I hit a wall I cannot
>>>>>> solve: I think problem is in something in callbacks.
>>>>>>
>>>>>> And problem is showing very easy on 64bits (while in 32bits it takes
>>>>>> time and is more random).
>>>>>>
>>>>>
>>>>>  I responded in the "image not opening" thread, but it's the same
>>>>> problem.  I really want to hear what y'all think because I'll happily
>>>>> implement a fix, but I want to know which one y'all think is a good idea.
>>>>> Here's my reply (edits between [ & ] to add information):
>>>>>
>>>>>
>>>>> On Mon, Mar 13, 2017 at 9:11 PM, Eliot Miranda <eliot.miranda at gmail.c
>>>>> om> wrote:
>>>>>
>>>>> I'm pretty confident [I know] this is to do with bugs in the Athens
>>>>> surface code which assumes that callbacks can be made in the existing
>>>>> copyBits and warpBits primitive.  They can't do this safely because a GC
>>>>> (scavenge) can happen during a callback, which then causes chaos when the
>>>>> copyBits primitive tries to access objects that have been moved under its
>>>>> feet.
>>>>>
>>>>> I've done work to fix callbacks so that when there is a failure it is
>>>>> the copyBits primitive that fails, instead of apparently the callback
>>>>> return primitive.  One of the apparent effects of this fix is to stop the
>>>>> screen opening up too small; another is getting the background colour
>>>>> right, and yet another is eliminating bogus pixels in the VGTigerDemo
>>>>> demo.  But more work is required to fix the copyBits and warpBits
>>>>> primitives.  There are a few approaches one might take:
>>>>>
>>>>> a)  fixing the primitive so that it saves and restores oops around the
>>>>> callbacks using the external oop table [InterpreterProxy>>addGCRoot: &
>>>>> removeGCRoot:].  That's a pain but possible. [It's a pain because all the
>>>>> derived pointers (the start of the destForm, sourceForm, halftoneForm and
>>>>> colorMapTable) must be recomputed also, and of course most of the time the
>>>>> objects don't move; we only scavenge about once every 2 seconds in normal
>>>>> running]
>>>>>
>>>>> b) fixing the primitive so that it pins the objects it needs before
>>>>> ever invoking a callback [this is a pain because pinning an object causes
>>>>> it to be tenured to old space if it is in new space; objects can't be
>>>>> pinned in new space, so instead the pin operation forwards the new space
>>>>> object to an old space copy if required and answers its location in old
>>>>> space, so a putative withPinnedObjectsDo: operation for the copyBits
>>>>> primitive looks like
>>>>> withPinnedFormsDo: aBlock
>>>>> <inline: #always>
>>>>> self cppIf: SPURVM & false
>>>>> ifTrue:
>>>>> [| bitBltOopWasPinned destWasPinned sourceWasPinned halftoneWasPinned |
>>>>>  (bitBltOopWasPinned := interpreterProxy isPinned: bitBltOop) ifFalse:
>>>>> [bitBltOop := interpreterProxy pinObject: bitBltOop].
>>>>> (destWasPinned := interpreterProxy isPinned: destForm) ifFalse:
>>>>> [destForm := interpreterProxy pinObject: destForm].
>>>>> (sourceWasPinned := interpreterProxy isPinned: sourceForm) ifFalse:
>>>>> [sourceForm := interpreterProxy pinObject: sourceForm].
>>>>> (halftoneWasPinned := interpreterProxy isPinned: halftoneForm) ifFalse:
>>>>> [halftoneForm := interpreterProxy pinObject: halftoneForm].
>>>>> aBlock value.
>>>>>  bitBltOopWasPinned ifFalse: [interpreterProxy unpinObject: bitBltOop].
>>>>> destWasPinned ifFalse: [interpreterProxy unpinObject: destForm].
>>>>> sourceWasPinned ifFalse: [interpreterProxy unpinObject: sourceForm].
>>>>> halftoneWasPinned ifFalse: [interpreterProxy unpinObject:
>>>>> halftoneForm]]
>>>>> ifFalse: [aBlock value]
>>>>>    and tenuring objects to old space is not ideal because they are
>>>>> only collected by a full GC, so doing this would at least tenure the
>>>>> bitBltOop which is very likely to be in new space]
>>>>>
>>>>> c) fixing the primitive so that it uses the scavenge and fullGC
>>>>> counters in the VM to detect if a GC occurred during one of the callbacks
>>>>> and would fail the primitive [if it detected that a GC had occurred in any
>>>>> of the surface functions].   The primitive would then simply be retried.
>>>>>
>>>>> d) ?
>>>>>
>>>>
>>>> Wouldn't it be possible to just pause the GC (scavange) when entering a
>>>> primitive ?
>>>>
>>>
>>> I don't think so.  There is a callback occurring.  If the computation
>>> executed by the callback requires a GC the application will abort if a GC
>>> cannot be done.  Right?  This is the case here.
>>>
>>>
>>> I like c) as it's very lightweight, but it has issues.  It is fine to
>>>>> use for callbacks *before* cop[yBits and warpBits move any bits (the
>>>>> lockSurface and querySurface functions).  But it's potentially erroneous
>>>>> after the unlockSurface primitive.  For example, a primitive which does an
>>>>> xor with the screen can't simply be retried as the first, falling pass,
>>>>> would have updated the destination bits but not displayed them via
>>>>> unlockSurface.  But I think it could be arranged that no objects are
>>>>> accessed after unlockSurface, which should naturally be the last call in
>>>>> the primitive (or do I mean showSurface?).  So the approach would be to
>>>>> check for GCs occurring during querySurface and lockSurface, failing if so,
>>>>> and then caching any and all state needed by unlockSurface and showSurface
>>>>> in local variables.  This way no object state is accessed to make the
>>>>> unlockSurface and showSurface calls, and no bits are moved before the
>>>>> queryDurface and lockSurface calls.
>>>>>
>>>>> If we used a failure code such as #'object may move' then the
>>>>> primitives could answer this when a GC during callbacks is detected and
>>>>> then the primitive could be retried only when required.
>>>>>
>>>>>
>>>>> [Come on folks, please comment.  I want to know which idea you like
>>>>> best.  We could fix this quickly.  But right now it feels like I'm talking
>>>>> to myself.]
>>>>>
>>>>>
>>>>>> Here is the easiest way to reproduce it (in mac):
>>>>>>
>>>>>> wget files.pharo.org/get-files/60/pharo64-mac-latest.zip
>>>>>> wget files.pharo.org/get-files/60/pharo64.zip
>>>>>> wget files.pharo.org/get-files/60/sources.zip
>>>>>> unzip pharo64-mac-latest.zip
>>>>>> unzip pharo64.zip
>>>>>> unzip sources.zip
>>>>>> ./Pharo.app/Contents/MacOS/Pharo ./Pharo64-60438.image eval
>>>>>> "VGTigerDemo runDemo"
>>>>>>
>>>>>> eventually (like 5-6 seconds after, if not immediately), you will
>>>>>> have a stack like this:
>>>>>>
>>>>>> SmallInteger(Object)>>primitiveFailed:
>>>>>> SmallInteger(Object)>>primitiveFailed
>>>>>> SmallInteger(VMCallbackContext64)>>primSignal:andReturnAs:fr
>>>>>> omContext:
>>>>>> GrafPort>>copyBits
>>>>>> GrafPort>>image:at:sourceRect:rule:
>>>>>> FormCanvas>>image:at:sourceRect:rule:
>>>>>> FormCanvas(Canvas)>>drawImage:at:sourceRect:
>>>>>> FormCanvas(Canvas)>>drawImage:at:
>>>>>> VGTigerDemo>>runDemo
>>>>>> VGTigerDemo class>>runDemo
>>>>>> UndefinedObject>>DoIt
>>>>>> OpalCompiler>>evaluate
>>>>>> OpalCompiler(AbstractCompiler)>>evaluate:
>>>>>> [ result := Smalltalk compiler evaluate: aStream.
>>>>>> self hasSessionChanged
>>>>>>         ifFalse: [ self stdout
>>>>>>                         print: result;
>>>>>>                         lf ] ] in EvaluateCommandLineHandler>>evaluate:
>>>>>> in Block: [ result := Smalltalk compiler evaluate: aStream....
>>>>>> BlockClosure>>on:do:
>>>>>> EvaluateCommandLineHandler>>evaluate:
>>>>>> EvaluateCommandLineHandler>>evaluateArguments
>>>>>> EvaluateCommandLineHandler>>activate
>>>>>> EvaluateCommandLineHandler class(CommandLineHandler
>>>>>> class)>>activateWith:
>>>>>> [ aCommandLinehandler activateWith: commandLine ] in
>>>>>> PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand:
>>>>>> in Block: [ aCommandLinehandler activateWith: commandLine ]
>>>>>> BlockClosure>>on:do:
>>>>>> PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand:
>>>>>> PharoCommandLineHandler(BasicCommandLineHandler)>>handleSubcommand
>>>>>> PharoCommandLineHandler(BasicCommandLineHandler)>>handleArgument:
>>>>>> [ self
>>>>>>         handleArgument:
>>>>>>                 (self arguments
>>>>>>                         ifEmpty: [ '' ]
>>>>>>                         ifNotEmpty: [ :arguments | arguments first ])
>>>>>> ] in PharoCommandLineHandler(BasicCommandLineHandler)>>activate in
>>>>>> Block: [ self...
>>>>>> BlockClosure>>on:do:
>>>>>> PharoCommandLineHandler(BasicCommandLineHandler)>>activate
>>>>>> PharoCommandLineHandler>>activate
>>>>>> PharoCommandLineHandler class(CommandLineHandler class)>>activateWith:
>>>>>> [ super activateWith: aCommandLine ] in PharoCommandLineHandler
>>>>>> class>>activateWith: in Block: [ super activateWith: aCommandLine ]
>>>>>>
>>>>>> Any idea?
>>>>>>
>>>>>> thanks!
>>>>>> Esteban
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> _,,,^..^,,,_
>>>>> best, Eliot
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> _,,,^..^,,,_
>>> best, Eliot
>>>
>>>
>>
>>
>> --
>> Best regards,
>> Igor Stasenko.
>>
>>
>
>
> --
> _,,,^..^,,,_
> best, Eliot
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20170316/f8148f20/attachment-0001.html>


More information about the Vm-dev mailing list