[Vm-dev] BUG? A problem with callbacks that shows up in 64bits (but is on 32bits too)

Clément Bera bera.clement at gmail.com
Tue Mar 14 16:20:38 UTC 2017

On Tue, Mar 14, 2017 at 8:57 AM, Nicolai Hess <nicolaihess at gmail.com> wrote:

> 2017-03-14 16:56 GMT+01:00 Nicolai Hess <nicolaihess at gmail.com>:
>> 2017-03-14 16:46 GMT+01:00 Eliot Miranda <eliot.miranda at gmail.com>:
>>> Hi Esteban, Hi Igor, Hi All,
>>> On Fri, Mar 10, 2017 at 7:35 AM, Esteban Lorenzano <estebanlm at gmail.com>
>>> wrote:
>>>> Hi,
>>>> I’m tumbling into an error in Pharo, because we use callbacks
>>>> intensively, in Athens(cairo)-to-World conversion in particular, and people
>>>> is sending always their crash reports… we made the whole conversion a lot
>>>> more robust since problems started to arise, but now I hit a wall I cannot
>>>> solve: I think problem is in something in callbacks.
>>>> And problem is showing very easy on 64bits (while in 32bits it takes
>>>> time and is more random).
>>>  I responded in the "image not opening" thread, but it's the same
>>> problem.  I really want to hear what y'all think because I'll happily
>>> implement a fix, but I want to know which one y'all think is a good idea.
>>> Here's my reply (edits between [ & ] to add information):
>>> On Mon, Mar 13, 2017 at 9:11 PM, Eliot Miranda <eliot.miranda at gmail.com>
>>>  wrote:
>>> I'm pretty confident [I know] this is to do with bugs in the Athens
>>> surface code which assumes that callbacks can be made in the existing
>>> copyBits and warpBits primitive.  They can't do this safely because a GC
>>> (scavenge) can happen during a callback, which then causes chaos when the
>>> copyBits primitive tries to access objects that have been moved under its
>>> feet.
>>> I've done work to fix callbacks so that when there is a failure it is
>>> the copyBits primitive that fails, instead of apparently the callback
>>> return primitive.  One of the apparent effects of this fix is to stop the
>>> screen opening up too small; another is getting the background colour
>>> right, and yet another is eliminating bogus pixels in the VGTigerDemo
>>> demo.  But more work is required to fix the copyBits and warpBits
>>> primitives.  There are a few approaches one might take:
>>> a)  fixing the primitive so that it saves and restores oops around the
>>> callbacks using the external oop table [InterpreterProxy>>addGCRoot: &
>>> removeGCRoot:].  That's a pain but possible. [It's a pain because all the
>>> derived pointers (the start of the destForm, sourceForm, halftoneForm and
>>> colorMapTable) must be recomputed also, and of course most of the time the
>>> objects don't move; we only scavenge about once every 2 seconds in normal
>>> running]
>>> b) fixing the primitive so that it pins the objects it needs before ever
>>> invoking a callback [this is a pain because pinning an object causes it to
>>> be tenured to old space if it is in new space; objects can't be pinned in
>>> new space, so instead the pin operation forwards the new space object to an
>>> old space copy if required and answers its location in old space, so a
>>> putative withPinnedObjectsDo: operation for the copyBits primitive looks
>>> like
>>> withPinnedFormsDo: aBlock
>>> <inline: #always>
>>> self cppIf: SPURVM & false
>>> ifTrue:
>>> [| bitBltOopWasPinned destWasPinned sourceWasPinned halftoneWasPinned |
>>>  (bitBltOopWasPinned := interpreterProxy isPinned: bitBltOop) ifFalse:
>>> [bitBltOop := interpreterProxy pinObject: bitBltOop].
>>> (destWasPinned := interpreterProxy isPinned: destForm) ifFalse:
>>> [destForm := interpreterProxy pinObject: destForm].
>>> (sourceWasPinned := interpreterProxy isPinned: sourceForm) ifFalse:
>>> [sourceForm := interpreterProxy pinObject: sourceForm].
>>> (halftoneWasPinned := interpreterProxy isPinned: halftoneForm) ifFalse:
>>> [halftoneForm := interpreterProxy pinObject: halftoneForm].
>>> aBlock value.
>>>  bitBltOopWasPinned ifFalse: [interpreterProxy unpinObject: bitBltOop].
>>> destWasPinned ifFalse: [interpreterProxy unpinObject: destForm].
>>> sourceWasPinned ifFalse: [interpreterProxy unpinObject: sourceForm].
>>> halftoneWasPinned ifFalse: [interpreterProxy unpinObject: halftoneForm]]
>>> ifFalse: [aBlock value]
>>>    and tenuring objects to old space is not ideal because they are only
>>> collected by a full GC, so doing this would at least tenure the bitBltOop
>>> which is very likely to be in new space]
>>> c) fixing the primitive so that it uses the scavenge and fullGC counters
>>> in the VM to detect if a GC occurred during one of the callbacks and would
>>> fail the primitive [if it detected that a GC had occurred in any of the
>>> surface functions].   The primitive would then simply be retried.
That's clearly the best solution unless someone figures out a better d)

> d) ?
>> Wouldn't it be possible to just pause the GC (scavange) when entering a
>> primitive ?
> Wouldn't it be possible to just *disable* the GC (scavange) when entering
> a primitive ?

If you disable only the scavenges and some of the objects are in old space
and a fullGC happens, then the same problem happens.

During the call-backs, an infinite amount of objects can be allocated.
Where do you allocate such objects if you disable the GC ?

>>> I like c) as it's very lightweight, but it has issues.  It is fine to
>>> use for callbacks *before* cop[yBits and warpBits move any bits (the
>>> lockSurface and querySurface functions).  But it's potentially erroneous
>>> after the unlockSurface primitive.  For example, a primitive which does an
>>> xor with the screen can't simply be retried as the first, falling pass,
>>> would have updated the destination bits but not displayed them via
>>> unlockSurface.  But I think it could be arranged that no objects are
>>> accessed after unlockSurface, which should naturally be the last call in
>>> the primitive (or do I mean showSurface?).  So the approach would be to
>>> check for GCs occurring during querySurface and lockSurface, failing if so,
>>> and then caching any and all state needed by unlockSurface and showSurface
>>> in local variables.  This way no object state is accessed to make the
>>> unlockSurface and showSurface calls, and no bits are moved before the
>>> queryDurface and lockSurface calls.
>>> If we used a failure code such as #'object may move' then the primitives
>>> could answer this when a GC during callbacks is detected and then the
>>> primitive could be retried only when required.
>>> [Come on folks, please comment.  I want to know which idea you like
>>> best.  We could fix this quickly.  But right now it feels like I'm talking
>>> to myself.]
>>>> Here is the easiest way to reproduce it (in mac):
>>>> wget files.pharo.org/get-files/60/pharo64-mac-latest.zip
>>>> wget files.pharo.org/get-files/60/pharo64.zip
>>>> wget files.pharo.org/get-files/60/sources.zip
>>>> unzip pharo64-mac-latest.zip
>>>> unzip pharo64.zip
>>>> unzip sources.zip
>>>> ./Pharo.app/Contents/MacOS/Pharo ./Pharo64-60438.image eval
>>>> "VGTigerDemo runDemo"
>>>> eventually (like 5-6 seconds after, if not immediately), you will have
>>>> a stack like this:
>>>> SmallInteger(Object)>>primitiveFailed:
>>>> SmallInteger(Object)>>primitiveFailed
>>>> SmallInteger(VMCallbackContext64)>>primSignal:andReturnAs:fromContext:
>>>> GrafPort>>copyBits
>>>> GrafPort>>image:at:sourceRect:rule:
>>>> FormCanvas>>image:at:sourceRect:rule:
>>>> FormCanvas(Canvas)>>drawImage:at:sourceRect:
>>>> FormCanvas(Canvas)>>drawImage:at:
>>>> VGTigerDemo>>runDemo
>>>> VGTigerDemo class>>runDemo
>>>> UndefinedObject>>DoIt
>>>> OpalCompiler>>evaluate
>>>> OpalCompiler(AbstractCompiler)>>evaluate:
>>>> [ result := Smalltalk compiler evaluate: aStream.
>>>> self hasSessionChanged
>>>>         ifFalse: [ self stdout
>>>>                         print: result;
>>>>                         lf ] ] in EvaluateCommandLineHandler>>evaluate:
>>>> in Block: [ result := Smalltalk compiler evaluate: aStream....
>>>> BlockClosure>>on:do:
>>>> EvaluateCommandLineHandler>>evaluate:
>>>> EvaluateCommandLineHandler>>evaluateArguments
>>>> EvaluateCommandLineHandler>>activate
>>>> EvaluateCommandLineHandler class(CommandLineHandler
>>>> class)>>activateWith:
>>>> [ aCommandLinehandler activateWith: commandLine ] in
>>>> PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand:
>>>> in Block: [ aCommandLinehandler activateWith: commandLine ]
>>>> BlockClosure>>on:do:
>>>> PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand:
>>>> PharoCommandLineHandler(BasicCommandLineHandler)>>handleSubcommand
>>>> PharoCommandLineHandler(BasicCommandLineHandler)>>handleArgument:
>>>> [ self
>>>>         handleArgument:
>>>>                 (self arguments
>>>>                         ifEmpty: [ '' ]
>>>>                         ifNotEmpty: [ :arguments | arguments first ]) ]
>>>> in PharoCommandLineHandler(BasicCommandLineHandler)>>activate in
>>>> Block: [ self...
>>>> BlockClosure>>on:do:
>>>> PharoCommandLineHandler(BasicCommandLineHandler)>>activate
>>>> PharoCommandLineHandler>>activate
>>>> PharoCommandLineHandler class(CommandLineHandler class)>>activateWith:
>>>> [ super activateWith: aCommandLine ] in PharoCommandLineHandler
>>>> class>>activateWith: in Block: [ super activateWith: aCommandLine ]
>>>> Any idea?
>>>> thanks!
>>>> Esteban
>>> --
>>> _,,,^..^,,,_
>>> best, Eliot
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20170314/a55626fe/attachment-0001.html>

More information about the Vm-dev mailing list