[Vm-dev] BUG? A problem with callbacks that shows up in 64bits (but is on 32bits too)

Eliot Miranda eliot.miranda at gmail.com
Tue Mar 14 16:33:50 UTC 2017

On Tue, Mar 14, 2017 at 8:56 AM, Nicolai Hess <nicolaihess at gmail.com> wrote:

> 2017-03-14 16:46 GMT+01:00 Eliot Miranda <eliot.miranda at gmail.com>:
>> Hi Esteban, Hi Igor, Hi All,
>> On Fri, Mar 10, 2017 at 7:35 AM, Esteban Lorenzano <estebanlm at gmail.com>
>> wrote:
>>> Hi,
>>> I’m tumbling into an error in Pharo, because we use callbacks
>>> intensively, in Athens(cairo)-to-World conversion in particular, and people
>>> is sending always their crash reports… we made the whole conversion a lot
>>> more robust since problems started to arise, but now I hit a wall I cannot
>>> solve: I think problem is in something in callbacks.
>>> And problem is showing very easy on 64bits (while in 32bits it takes
>>> time and is more random).
>>  I responded in the "image not opening" thread, but it's the same
>> problem.  I really want to hear what y'all think because I'll happily
>> implement a fix, but I want to know which one y'all think is a good idea.
>> Here's my reply (edits between [ & ] to add information):
>> On Mon, Mar 13, 2017 at 9:11 PM, Eliot Miranda <eliot.miranda at gmail.com>
>>  wrote:
>> I'm pretty confident [I know] this is to do with bugs in the Athens
>> surface code which assumes that callbacks can be made in the existing
>> copyBits and warpBits primitive.  They can't do this safely because a GC
>> (scavenge) can happen during a callback, which then causes chaos when the
>> copyBits primitive tries to access objects that have been moved under its
>> feet.
>> I've done work to fix callbacks so that when there is a failure it is the
>> copyBits primitive that fails, instead of apparently the callback return
>> primitive.  One of the apparent effects of this fix is to stop the screen
>> opening up too small; another is getting the background colour right, and
>> yet another is eliminating bogus pixels in the VGTigerDemo demo.  But more
>> work is required to fix the copyBits and warpBits primitives.  There are a
>> few approaches one might take:
>> a)  fixing the primitive so that it saves and restores oops around the
>> callbacks using the external oop table [InterpreterProxy>>addGCRoot: &
>> removeGCRoot:].  That's a pain but possible. [It's a pain because all the
>> derived pointers (the start of the destForm, sourceForm, halftoneForm and
>> colorMapTable) must be recomputed also, and of course most of the time the
>> objects don't move; we only scavenge about once every 2 seconds in normal
>> running]
>> b) fixing the primitive so that it pins the objects it needs before ever
>> invoking a callback [this is a pain because pinning an object causes it to
>> be tenured to old space if it is in new space; objects can't be pinned in
>> new space, so instead the pin operation forwards the new space object to an
>> old space copy if required and answers its location in old space, so a
>> putative withPinnedObjectsDo: operation for the copyBits primitive looks
>> like
>> withPinnedFormsDo: aBlock
>> <inline: #always>
>> self cppIf: SPURVM & false
>> ifTrue:
>> [| bitBltOopWasPinned destWasPinned sourceWasPinned halftoneWasPinned |
>>  (bitBltOopWasPinned := interpreterProxy isPinned: bitBltOop) ifFalse:
>> [bitBltOop := interpreterProxy pinObject: bitBltOop].
>> (destWasPinned := interpreterProxy isPinned: destForm) ifFalse:
>> [destForm := interpreterProxy pinObject: destForm].
>> (sourceWasPinned := interpreterProxy isPinned: sourceForm) ifFalse:
>> [sourceForm := interpreterProxy pinObject: sourceForm].
>> (halftoneWasPinned := interpreterProxy isPinned: halftoneForm) ifFalse:
>> [halftoneForm := interpreterProxy pinObject: halftoneForm].
>> aBlock value.
>>  bitBltOopWasPinned ifFalse: [interpreterProxy unpinObject: bitBltOop].
>> destWasPinned ifFalse: [interpreterProxy unpinObject: destForm].
>> sourceWasPinned ifFalse: [interpreterProxy unpinObject: sourceForm].
>> halftoneWasPinned ifFalse: [interpreterProxy unpinObject: halftoneForm]]
>> ifFalse: [aBlock value]
>>    and tenuring objects to old space is not ideal because they are only
>> collected by a full GC, so doing this would at least tenure the bitBltOop
>> which is very likely to be in new space]
>> c) fixing the primitive so that it uses the scavenge and fullGC counters
>> in the VM to detect if a GC occurred during one of the callbacks and would
>> fail the primitive [if it detected that a GC had occurred in any of the
>> surface functions].   The primitive would then simply be retried.
>> d) ?
> Wouldn't it be possible to just pause the GC (scavange) when entering a
> primitive ?

I don't think so.  There is a callback occurring.  If the computation
executed by the callback requires a GC the application will abort if a GC
cannot be done.  Right?  This is the case here.

I like c) as it's very lightweight, but it has issues.  It is fine to use
>> for callbacks *before* cop[yBits and warpBits move any bits (the
>> lockSurface and querySurface functions).  But it's potentially erroneous
>> after the unlockSurface primitive.  For example, a primitive which does an
>> xor with the screen can't simply be retried as the first, falling pass,
>> would have updated the destination bits but not displayed them via
>> unlockSurface.  But I think it could be arranged that no objects are
>> accessed after unlockSurface, which should naturally be the last call in
>> the primitive (or do I mean showSurface?).  So the approach would be to
>> check for GCs occurring during querySurface and lockSurface, failing if so,
>> and then caching any and all state needed by unlockSurface and showSurface
>> in local variables.  This way no object state is accessed to make the
>> unlockSurface and showSurface calls, and no bits are moved before the
>> queryDurface and lockSurface calls.
>> If we used a failure code such as #'object may move' then the primitives
>> could answer this when a GC during callbacks is detected and then the
>> primitive could be retried only when required.
>> [Come on folks, please comment.  I want to know which idea you like
>> best.  We could fix this quickly.  But right now it feels like I'm talking
>> to myself.]
>>> Here is the easiest way to reproduce it (in mac):
>>> wget files.pharo.org/get-files/60/pharo64-mac-latest.zip
>>> wget files.pharo.org/get-files/60/pharo64.zip
>>> wget files.pharo.org/get-files/60/sources.zip
>>> unzip pharo64-mac-latest.zip
>>> unzip pharo64.zip
>>> unzip sources.zip
>>> ./Pharo.app/Contents/MacOS/Pharo ./Pharo64-60438.image eval
>>> "VGTigerDemo runDemo"
>>> eventually (like 5-6 seconds after, if not immediately), you will have a
>>> stack like this:
>>> SmallInteger(Object)>>primitiveFailed:
>>> SmallInteger(Object)>>primitiveFailed
>>> SmallInteger(VMCallbackContext64)>>primSignal:andReturnAs:fromContext:
>>> GrafPort>>copyBits
>>> GrafPort>>image:at:sourceRect:rule:
>>> FormCanvas>>image:at:sourceRect:rule:
>>> FormCanvas(Canvas)>>drawImage:at:sourceRect:
>>> FormCanvas(Canvas)>>drawImage:at:
>>> VGTigerDemo>>runDemo
>>> VGTigerDemo class>>runDemo
>>> UndefinedObject>>DoIt
>>> OpalCompiler>>evaluate
>>> OpalCompiler(AbstractCompiler)>>evaluate:
>>> [ result := Smalltalk compiler evaluate: aStream.
>>> self hasSessionChanged
>>>         ifFalse: [ self stdout
>>>                         print: result;
>>>                         lf ] ] in EvaluateCommandLineHandler>>evaluate:
>>> in Block: [ result := Smalltalk compiler evaluate: aStream....
>>> BlockClosure>>on:do:
>>> EvaluateCommandLineHandler>>evaluate:
>>> EvaluateCommandLineHandler>>evaluateArguments
>>> EvaluateCommandLineHandler>>activate
>>> EvaluateCommandLineHandler class(CommandLineHandler class)>>activateWith:
>>> [ aCommandLinehandler activateWith: commandLine ] in
>>> PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand:
>>> in Block: [ aCommandLinehandler activateWith: commandLine ]
>>> BlockClosure>>on:do:
>>> PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand:
>>> PharoCommandLineHandler(BasicCommandLineHandler)>>handleSubcommand
>>> PharoCommandLineHandler(BasicCommandLineHandler)>>handleArgument:
>>> [ self
>>>         handleArgument:
>>>                 (self arguments
>>>                         ifEmpty: [ '' ]
>>>                         ifNotEmpty: [ :arguments | arguments first ]) ]
>>> in PharoCommandLineHandler(BasicCommandLineHandler)>>activate in Block:
>>> [ self...
>>> BlockClosure>>on:do:
>>> PharoCommandLineHandler(BasicCommandLineHandler)>>activate
>>> PharoCommandLineHandler>>activate
>>> PharoCommandLineHandler class(CommandLineHandler class)>>activateWith:
>>> [ super activateWith: aCommandLine ] in PharoCommandLineHandler
>>> class>>activateWith: in Block: [ super activateWith: aCommandLine ]
>>> Any idea?
>>> thanks!
>>> Esteban
>> --
>> _,,,^..^,,,_
>> best, Eliot

best, Eliot
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20170314/0a4dd907/attachment.html>

More information about the Vm-dev mailing list