[Vm-dev] latest changes (no idea from when it started) are making FFI win32 crash (and FFI Callbacks are not reliable anymore either)

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Tue Nov 29 20:22:11 UTC 2016


Thanks Ronie and Esteban.
This seems to be an alignment problem indeed.
What I see is that alignment is defined in at least 3 different places:
- platforms/Cross/vm/sqCogStackAlignment.h
- platforms/Cross/plugins/IA32ABI/ia32abicc.c
- src/plugins/SqueakFFIPrims/IA32FFIPlugin.c and friends...
That's just too many different opinions!!! We have to unify that rather
than adding a 4th opinion in a Makefile.
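
A minimal sketch of what a single shared definition could look like. The
header name, the default of 16 and the accessor c_stack_alignment() are
assumptions made for illustration, not what any of the three files above
contains today; only STACK_ALIGN_BYTES is a name taken from the existing
build flags.

```c
/* stack_alignment.h: hypothetical single source of truth for C stack alignment. */
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>

/* Assumed default; a build can still pass -DSTACK_ALIGN_BYTES=n to override it. */
#ifndef STACK_ALIGN_BYTES
# define STACK_ALIGN_BYTES 16
#endif

/* Refuse to compile if the value is not a power of two. */
static_assert(STACK_ALIGN_BYTES > 0 &&
              (STACK_ALIGN_BYTES & (STACK_ALIGN_BYTES - 1)) == 0,
              "STACK_ALIGN_BYTES must be a power of two");

#define STACK_ALIGN_MASK (STACK_ALIGN_BYTES - 1)

/* Hypothetical accessor each plugin would call instead of hard-coding a number. */
static inline size_t c_stack_alignment(void)
{
    return (size_t)STACK_ALIGN_BYTES;
}
```

Each of the three places would then include this one header rather than
carrying its own value, and a Makefile could still override the default
without becoming a separate opinion.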

However, about ALLOCA_LIES_SO_USE_GETSP, I'm not so sure that "It is NOT
the case of mingw."
Last time I used gdb, it WAS still the case, alloca was STILL lying.
See
http://lists.squeakfoundation.org/pipermail/vm-dev/2016-August/022985.html

BUT:
-----
Forcing 16-byte alignment supersedes the alloca hack, making it not
strictly necessary anymore.
See below, in the generated src/plugins/SqueakFFIPrims/IA32FFIPlugin.c:

        allocation = alloca(((stackSize + ((calloutState->structReturnSize)))) + (cStackAlignment()));
        if (allocaLiesSoUseGetsp()) {
                allocation = getsp();
        }
        if ((cStackAlignment()) != 0) {
                allocation = ((char *) ((((((usqInt)allocation)) | ((cStackAlignment()) - 1)) - ((cStackAlignment()) - 1))));
        }
        (calloutState->argVector = allocation);

but we further do:

        if ((0 + (cStackAlignment())) > 0) {
                setsp((calloutState->argVector));
        }

So even if the stack pointer is ever greater than the value alloca returned,
and we have removed the ALLOCA_LIES hack, the stack pointer is then set back
to the (aligned) value alloca returned, which avoids the stack pointer
offset problem.
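
The rounding used in the generated code can be isolated as a small sketch.
The function names align_down and is_aligned are hypothetical; the
arithmetic is the same (addr | (align - 1)) - (align - 1) expression as in
the plugin, and it assumes the alignment is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down to a multiple of align (align must be a power of two).
 * Setting the low bits and then subtracting them clears them, which is
 * equivalent to addr & ~(align - 1). */
uintptr_t align_down(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr | (align - 1)) - (align - 1);
}

/* True when addr is already a multiple of align. */
int is_aligned(uintptr_t addr, uintptr_t align)
{
    return (addr & (align - 1)) == 0;
}
```

Whatever address alloca hands back, the value stored in argVector (and later
passed to setsp) is therefore a multiple of the alignment; for example
align_down(0x1007, 16) and align_down(0x1004, 16) both give 0x1000.
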
It would be worth writing a unit test case, and asking on the gcc mailing
list why alloca lies, to be sure...
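
A possible starting point for such a test, as a sketch only. It assumes GCC
or Clang (__builtin_alloca and extended inline asm) and only knows how to
read the stack pointer on x86, x86-64 and AArch64; the function name
alloca_offset_from_sp is hypothetical. It reports how far the address
returned by alloca is from the stack pointer right after the allocation,
without presuming what any given compiler will answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns (alloca result - stack pointer) in bytes, measured right after the
 * allocation. 0 means alloca returned the stack pointer itself.
 * On architectures not handled below the result is always 0. */
ptrdiff_t alloca_offset_from_sp(size_t size)
{
    /* volatile keeps the compiler from folding the alloca into a fixed-size local */
    volatile size_t n = size;
    volatile char *p;
    uintptr_t sp;

    assert(size > 0);
    p = (volatile char *)__builtin_alloca(n);
    sp = (uintptr_t)p;                 /* fallback: report no offset */
#if defined(__x86_64__)
    __asm__ volatile("movq %%rsp, %0" : "=r"(sp) : "r"(p) : "memory");
#elif defined(__i386__)
    __asm__ volatile("movl %%esp, %0" : "=r"(sp) : "r"(p) : "memory");
#elif defined(__aarch64__)
    __asm__ volatile("mov %0, sp" : "=r"(sp) : "r"(p) : "memory");
#endif
    p[0] = 0;                          /* touch the block so it is really used */
    return (ptrdiff_t)((uintptr_t)p - sp);
}
```

Running this on a 32-bit mingw build and printing the result for a few sizes
would show whether there is an offset at all, and the numbers could be
attached to a question on the gcc list.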

cheers

2016-11-29 18:14 GMT+01:00 Esteban Lorenzano <estebanlm at gmail.com>:

>
> hah!
> you know what the sad part of this is? I wrote that message… it was for
> the future me, but I forgot to check our flags :P
> I lost 2.5 days then + 2 days now.
>
> this fixes the problem with Windows crashes (yay!) but not the problem
> with callbacks (booo!)… any idea in that area?
>
> cheers,
> Esteban
>
> On 29 Nov 2016, at 17:30, Ronie Salgado <roniesalg at gmail.com> wrote:
>
> Last week I was having this exact same crash in the
> MinimalisticHeadless branch, with both MinGW and Visual Studio. I
> managed to get the VM working with MinGW (not yet with MSVC) by using the
> following defines, which I copied from the old Pharo CMake scripts:
>
> -DSTACK_ALIGN_BYTES=16 -DALLOCA_LIES_SO_USE_GETSP=0
>
> In the pharo-vm, the CogFamilyWindowsConfig >> #commonCompilerFlags method
> starts with the following comment:
> commonCompilerFlags
>     "omit -ggdb2 to prevent generating debug info"
>     "Some flags explanation:
>
>     STACK_ALIGN_BYTES=16 is needed in mingw and FFI (and I suppose on
> other modules too).
>     DALLOCA_LIES_SO_USE_GETSP=0 Some compilers return the stack address+4
> on alloca function,
>     then FFI module needs to adjust that. It is NOT the case of mingw.
>     For more information see this thread:
>     http://forum.world.st/There-are-something-fishy-with-FFI-plugin-td4584226.html
>     "
>
>
> 2016-11-29 9:32 GMT-03:00 Esteban Lorenzano <estebanlm at gmail.com>:
>
>>
>>
>> On 29 Nov 2016, at 13:04, Clément Bera <bera.clement at gmail.com> wrote:
>>
>> Hi,
>>
>> Can you confirm this bug happens only on Windows?
>>
>>
>> yes, the crash is just in windows.
>> the callback problem is general (note that FFICallbackTests works fine,
>> but I think this is related to the fact that it never enters the 2nd
>> condition with the qsort function) .
>>
>>
>> Do you have version number (both VMMaker and git commit) of the last
>> version you have that was working ?
>>
>>
>> sadly, not… I tried to get the latest working version, but with the mess
>> I have to get the VM to build with opensmalltalk-vm, I couldn’t track it.
>> I suspect it is related to the work on 64 bits for Windows, but I have no
>> proof of that :P
>>
>> Esteban
>>
>>
>> Thanks.
>>
>>
>> On Tue, Nov 29, 2016 at 11:54 AM, Esteban Lorenzano <estebanlm at gmail.com>
>> wrote:
>>
>>>
>>> Hi,
>>>
>>> So, I’m building the PharoVM along with all its dependencies. For me,
>>> this is a major step because I can finally drop the old build process.
>>> Now, I’m having serious problems with FFI (which were not present
>>> before):
>>>
>>>
>>> 1. CRASH IN WINDOWS (32bits):
>>>
>>> In Win32, it crashes automatically when trying to access this function:
>>>
>>> getEnvSize: nameString
>>>         ^ self ffiCall: #( int GetEnvironmentVariableA ( String nameString, nil, 0 ) ) module: #Kernel32
>>>
>>>  (this works perfectly fine in older versions)
>>>
>>> 2. CALLBACKS FAILING:
>>>
>>> Callbacks have problems. The examples pass, but they are very simple…
>>> as soon as I try to do something complicated (like unqlite bindings or
>>> libgit2 bindings, which use callbacks intensively), callbacks stop working.
>>> I traced the problem up to this method:
>>>
>>> StackInterpreter>>#returnAs:ThroughCallback:Context:
>>>
>>> returnAs: returnTypeOop ThroughCallback: vmCallbackContext Context: callbackMethodContext
>>>         "callbackMethodContext is an activation of invokeCallback:[stack:registers:jmpbuf:].
>>>          Its sender is the VM's state prior to the callback. Reestablish that state (via longjmp),
>>>          and mark callbackMethodContext as dead."
>>>         <export: true>
>>>         <var: #vmCallbackContext type: #'VMCallbackContext *'>
>>>         | calloutMethodContext theFP thePage |
>>>         <var: #theFP type: #'char *'>
>>>         <var: #thePage type: #'StackPage *'>
>>>         ((self isIntegerObject: returnTypeOop)
>>>          and: [self isLiveContext: callbackMethodContext]) ifFalse:
>>>                 [^false].
>>>         calloutMethodContext := self externalInstVar: SenderIndex ofContext: callbackMethodContext.
>>>         (self isLiveContext: calloutMethodContext) ifFalse:
>>>                 [^false].
>>>         "We're about to leave this stack page; must save the current frame's instructionPointer."
>>>         self push: instructionPointer.
>>>         self externalWriteBackHeadFramePointers.
>>>         "Mark callbackMethodContext as dead; the common case is that it is the current frame.
>>>          We go the extra mile for the debugger."
>>>         (self isSingleContext: callbackMethodContext)
>>>                 ifTrue: [self markContextAsDead: callbackMethodContext]
>>>                 ifFalse:
>>>                         [theFP := self frameOfMarriedContext: callbackMethodContext.
>>>                          framePointer = theFP "common case"
>>>                                 ifTrue:
>>>                                         [(self isBaseFrame: theFP)
>>>                                                 ifTrue: [stackPages freeStackPage: stackPage]
>>>                                                 ifFalse: "calloutMethodContext is immediately below on the same page.  Make it current."
>>>                                                         [instructionPointer := (self frameCallerSavedIP: framePointer) asUnsignedInteger.
>>>                                                          stackPointer := framePointer + (self frameStackedReceiverOffset: framePointer) + objectMemory wordSize.
>>>                                                          framePointer := self frameCallerFP: framePointer.
>>>                                                          self setMethod: (self frameMethodObject: framePointer).
>>>                                                          self restoreCStackStateForCallbackContext: vmCallbackContext.
>>>                                                          "N.B. siglongjmp is defines as _longjmp on non-win32 platforms.
>>>                                                           This matches the use of _setjmp in ia32abicc.c."
>>>                                                          self siglong: vmCallbackContext trampoline jmp: (self integerValueOf: returnTypeOop).
>>>                                                          ^true]]
>>>                                 ifFalse:
>>>                                         [self externalDivorceFrame: theFP andContext: callbackMethodContext.
>>>                                          self markContextAsDead: callbackMethodContext]].
>>>         "Make the calloutMethodContext the active frame.  The case where calloutMethodContext
>>>          is immediately below callbackMethodContext on the same page is handled above."
>>>         (self isStillMarriedContext: calloutMethodContext)
>>>                 ifTrue:
>>>                         [theFP := self frameOfMarriedContext: calloutMethodContext.
>>>                          thePage := stackPages stackPageFor: theFP.
>>>                          "findSPOf:on: points to the word beneath the instructionPointer, but
>>>                           there is no instructionPointer on the top frame of the current page."
>>>                          self assert: thePage ~= stackPage.
>>>                          stackPointer := (self findSPOf: theFP on: thePage) - objectMemory wordSize.
>>>                          framePointer := theFP]
>>>                 ifFalse:
>>>                         [thePage := self makeBaseFrameFor: calloutMethodContext.
>>>                          framePointer := thePage headFP.
>>>                          stackPointer := thePage headSP].
>>>         instructionPointer := self popStack.
>>>         self setMethod: (objectMemory fetchPointer: MethodIndex ofObject: calloutMethodContext).
>>>         self setStackPageAndLimit: thePage.
>>>         self restoreCStackStateForCallbackContext: vmCallbackContext.
>>>          "N.B. siglongjmp is defines as _longjmp on non-win32 platforms.
>>>           This matches the use of _setjmp in ia32abicc.c."
>>>         self siglong: vmCallbackContext trampoline jmp: (self integerValueOf: returnTypeOop).
>>>         "NOTREACHED"
>>>         ^true
>>>
>>> with the first siglongjmp callbacks are passing fine.
>>> with the last (it would be if  framePointer = theFP AND !(isBaseFrame:
>>> theFP) ) it doesn’t.
>>>
>>> So… from here I’m a bit lost… I need some help :)
>>>
>>> thanks,
>>> Esteban
>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>