[Vm-dev] BUG? A problem with callbacks that shows up in 64bits (but is on 32bits too)

Eliot Miranda eliot.miranda at gmail.com
Sun Mar 12 19:56:50 UTC 2017


Hi Nicolas, Hi Esteban,

On Sun, Mar 12, 2017 at 6:31 AM, Nicolas Cellier <
nicolas.cellier.aka.nice at gmail.com> wrote:

>
> Ah, and the example does not work better in 32bits pharo, so no need to
> invoke a 64 bits problem.
>
> Also, at image startup, there's a gray background. And last time I
> restarted the 32bits image, here is the kind of artefacts I got at first
> menu click:
>
> [image: Images intégrées 1]
>
> IMO, all your image startup problem (like crash at startup) are related.
>

[At the end of my message is something important for the author and
maintainers of AthensCairoSurface class; if you're that person please read
the end of my message, point c) below].


I agree.  I think that callbacks are very important.  Yesterday I
reimplemented ownVM: and disownVM: for the non-threaded standard
configuration (our current VMs) to preserve the argumentCount variable
across the callback.  argumentCount is used by many primitives (via
interpreterProxy methodArgumentCount) to cut-back the arguments on
returning from the primitive (other primitives use a consent if they know
what their argument count is).  Without doing this the argumentCount
variable will be changed by the interpreter sends or interpreter primitives
executed during a callback, and on return may be completely different from
the value it had in the primitive that executed the callback(s).  Hence
when the primitive that executes a callback tries to return it may pop the
wrong number pif values off of the stack.

I fixed this so that ownVM, called when a callback is initiated, remembers
the argumentCount and answers it encoded as the result of ownVM.  diwownVM
is called when the callback returns to C and is given the result of ownVM
as an argument.  So disownVM restores argumentCount to its value before the
callback, which is presumably the right value for the primitive that is
executing the callback.  [Note that in the threaded VM the argumentCount
/and/ newMethod are remembered in per-thread data and restored when a
thread takes ownership of the VM; i.e. in the threaded VM it does the right
thing (tm) ].

Fixing this had several effects.  First the screen was a normal size, with
a black background except in the VGTigerDemo screen area, second, the
garbage pixels at the bottom of the screen disappeared, finally the crash
happens regularly on the third complete revolution of the tiger's head.
Here's the VM code (I'll commit soon):

StackInterpreter>>ownVM: threadIndexAndFlags
<api>
<inline: false>
"This is the entry-point for plugins and primitives that wish to reacquire
the VM after having
released it via disownVM or callbacks that want to acquire it without
knowing their ownership
status.  While this exists for the threaded FFI VM we use it to reset the
argumentCount after a callback.

Answer the argumentCount encoded as a SmallInteger if the current thread is
the VM thread.
Answer -1 if the current thread is unknown to the VM and fails to take
ownership."
| amInVMThread |
<var: 'amInVMThread' declareC: 'extern sqInt amInVMThread(void)'>
self cCode: [] inSmalltalk: [amInVMThread := 1. amInVMThread class].
self amInVMThread ifFalse:
[^-1].
self assert: primFailCode = 0.
^objectMemory integerObjectOf: argumentCount

StackInterpreter>>disownVM: flags
<api>
<inline: false>
"Release the VM to other threads and answer the current thread's index.

This is the entry-point for plugins and primitives that wish to release the
VM while
performing some operation that may potentially block, and for callbacks
returning
back to some blocking operation.  While this exists for the threaded FFI VM
we use
it to reset the argumentCount after a callback."
self assert: ((objectMemory isIntegerObject: flags)
and: [(objectMemory integerValueOf: flags)
between: 0
and: (self argumentCountOfMethodHeader: -1)]).
self assert: primFailCode = 0.
argumentCount := objectMemory integerValueOf: flags.
^0

and this is the code in thunkEntry (the callback entry-point) that calls
ownVM: and then disownVM:

long
thunkEntry(void *thunkp, sqIntptr_t *stackp)
{
    VMCallbackContext vmcc;
    VMCallbackContext *previousCallbackContext;
    int flags, returnType;
...
    if ((flags = interpreterProxy->ownVM(0)) < 0) {
        fprintf(stderr,"Warning; callback failed to own the VM\n");
        return -1;
    }

    if (!(returnType = setjmp(vmcc.trampoline))) {
        previousCallbackContext = getRMCC();
        setRMCC(&vmcc);
        vmcc.thunkp = thunkp;
        vmcc.stackp = stackp + 2; /* skip address of retpc & retpc (thunk)
*/
        vmcc.intregargsp = 0;
        vmcc.floatregargsp = 0;
        interpreterProxy->sendInvokeCallbackContext(&vmcc);
...
    }
    setRMCC(previousCallbackContext);
    interpreterProxy->disownVM(flags);

    switch (returnType) {

    case retword:   return vmcc.rvs.valword;
...

So while I have no fix for the bug,

a) I understand that there is a serious issue with callbacks not restoring
sufficient state on return from callback (ideally both argumentCOunt /and/
newMethod should be preserved iff a primitive invoking a callback can fail
after invoking a callback, but this is a /really/ bad idea; primitives
should only fail if they have no effects, so this isn't an important issue;
restoring argumentCOunt is essential and my fix does that).

b) the crash is now 100% repeatable and happens always on the third
rotation of the tiger's head, so I'm optimistic I can understand the bug

c) the code that creates the callbacks waste cycles installing showSurface
and unlockSurface callbacks that do nothing.  Instead
both createUnlockSurfaceFn and createShowSurfaceFn should simply answer 0.
These callbacks don't need to happen and callbacks are complex enough that
avoiding two of the three calklbacks involved in a bitblt could speed
things up.

More info as I have it.


> 2017-03-12 13:43 GMT+01:00 Nicolai Hess <nicolaihess at gmail.com>:
>
>>
>>
>>
>> 2017-03-12 13:36 GMT+01:00 Nicolas Cellier <nicolas.cellier.aka.nice at gmai
>> l.com>:
>>
>>>
>>> Couldn't it be that some Smalltalk memory has been relocated? (I'm
>>> thinking of the DisplayScreen bits)
>>>
>>
>> http://forum.world.st/Too-frequent-crashes-tp4927143p4927722.html
>>
>>
>>
>>
>>>
>>> 2017-03-12 12:53 GMT+01:00 Esteban Lorenzano <estebanlm at gmail.com>:
>>>
>>>>
>>>> Hi,
>>>>
>>>> this is lockSurfaceFn:
>>>>
>>>> createLockSurfaceFn
>>>> ^ FFICallback
>>>> signature: #(void * (void *handle, int *pitch, int x, int y, int w, int
>>>> h))
>>>> block: [ :handle :pitch :x :y :w :h |
>>>> pitch signedLongAt: 1 put: (self get_stride: handle).
>>>> self get_data: handle ]
>>>>
>>>> and
>>>>
>>>> createUnlockSurfaceFn
>>>> ^ FFICallback
>>>> signature: #(int (void *handle, int x, int y, int w, int h))
>>>> block: [ :handle :x :y :w :h | 0 "Do nothing” ]
>>>>
>>>> cheers!
>>>> Esteban
>>>>
>>>> On 12 Mar 2017, at 03:34, Eliot Miranda <eliot.miranda at gmail.com>
>>>> wrote:
>>>>
>>>> Hi Esteban,
>>>>
>>>> On Fri, Mar 10, 2017 at 7:35 AM, Esteban Lorenzano <estebanlm at gmail.com
>>>> > wrote:
>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> I’m tumbling into an error in Pharo, because we use callbacks
>>>>> intensively, in Athens(cairo)-to-World conversion in particular, and people
>>>>> is sending always their crash reports… we made the whole conversion a lot
>>>>> more robust since problems started to arise, but now I hit a wall I cannot
>>>>> solve: I think problem is in something in callbacks.
>>>>>
>>>>
>>>> My original theory is wrong.  As you suspected it is something to do
>>>> with the callback in primitiveCopyBits via lockSurfaces & unlockSurfaces.
>>>> Can you tell me what the callback is and what code installs it into the
>>>> lockSurfaceFn and unlockSurfaceFn?
>>>>
>>>>
>>>>> And problem is showing very easy on 64bits (while in 32bits it takes
>>>>> time and is more random).
>>>>>
>>>>> Here is the easiest way to reproduce it (in mac):
>>>>>
>>>>> wget files.pharo.org/get-files/60/pharo64-mac-latest.zip
>>>>> wget files.pharo.org/get-files/60/pharo64.zip
>>>>> wget files.pharo.org/get-files/60/sources.zip
>>>>> unzip pharo64-mac-latest.zip
>>>>> unzip pharo64.zip
>>>>> unzip sources.zip
>>>>> ./Pharo.app/Contents/MacOS/Pharo ./Pharo64-60438.image eval
>>>>> "VGTigerDemo runDemo"
>>>>>
>>>>> eventually (like 5-6 seconds after, if not immediately), you will have
>>>>> a stack like this:
>>>>>
>>>>> SmallInteger(Object)>>primitiveFailed:
>>>>> SmallInteger(Object)>>primitiveFailed
>>>>> SmallInteger(VMCallbackContext64)>>primSignal:andReturnAs:fromContext:
>>>>> GrafPort>>copyBits
>>>>> GrafPort>>image:at:sourceRect:rule:
>>>>> FormCanvas>>image:at:sourceRect:rule:
>>>>> FormCanvas(Canvas)>>drawImage:at:sourceRect:
>>>>> FormCanvas(Canvas)>>drawImage:at:
>>>>> VGTigerDemo>>runDemo
>>>>> VGTigerDemo class>>runDemo
>>>>> UndefinedObject>>DoIt
>>>>> OpalCompiler>>evaluate
>>>>> OpalCompiler(AbstractCompiler)>>evaluate:
>>>>> [ result := Smalltalk compiler evaluate: aStream.
>>>>> self hasSessionChanged
>>>>>         ifFalse: [ self stdout
>>>>>                         print: result;
>>>>>                         lf ] ] in EvaluateCommandLineHandler>>evaluate:
>>>>> in Block: [ result := Smalltalk compiler evaluate: aStream....
>>>>> BlockClosure>>on:do:
>>>>> EvaluateCommandLineHandler>>evaluate:
>>>>> EvaluateCommandLineHandler>>evaluateArguments
>>>>> EvaluateCommandLineHandler>>activate
>>>>> EvaluateCommandLineHandler class(CommandLineHandler
>>>>> class)>>activateWith:
>>>>> [ aCommandLinehandler activateWith: commandLine ] in
>>>>> PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand:
>>>>> in Block: [ aCommandLinehandler activateWith: commandLine ]
>>>>> BlockClosure>>on:do:
>>>>> PharoCommandLineHandler(BasicCommandLineHandler)>>activateSubCommand:
>>>>> PharoCommandLineHandler(BasicCommandLineHandler)>>handleSubcommand
>>>>> PharoCommandLineHandler(BasicCommandLineHandler)>>handleArgument:
>>>>> [ self
>>>>>         handleArgument:
>>>>>                 (self arguments
>>>>>                         ifEmpty: [ '' ]
>>>>>                         ifNotEmpty: [ :arguments | arguments first ])
>>>>> ] in PharoCommandLineHandler(BasicCommandLineHandler)>>activate in
>>>>> Block: [ self...
>>>>> BlockClosure>>on:do:
>>>>> PharoCommandLineHandler(BasicCommandLineHandler)>>activate
>>>>> PharoCommandLineHandler>>activate
>>>>> PharoCommandLineHandler class(CommandLineHandler class)>>activateWith:
>>>>> [ super activateWith: aCommandLine ] in PharoCommandLineHandler
>>>>> class>>activateWith: in Block: [ super activateWith: aCommandLine ]
>>>>>
>>>>> Any idea?
>>>>>
>>>>> thanks!
>>>>> Esteban
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> _,,,^..^,,,_
>>>> best, Eliot
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>


-- 
_,,,^..^,,,_
best, Eliot
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20170312/aa79175e/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Capture d’écran 2017-03-12 à 14.26.34.png
Type: image/png
Size: 140811 bytes
Desc: not available
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20170312/aa79175e/attachment-0001.png>


More information about the Vm-dev mailing list