[Vm-dev] latest changes (no idea from when it started) are making FFI win32 crash (and FFI Callbacks are not reliable anymore either)

Andres Valloud avalloud at smalltalk.comcastbiz.net
Wed Nov 30 02:07:55 UTC 2016


To prove alloca() *lies*, one needs to show e.g. a 5-10 line C program,
independent of anything else, that exemplifies a clear specification
violation.  Otherwise, how do you know the LIARS_LIARS_PANTS_ON_FIRE
macros are not compensating for undefined behavior elsewhere?
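A minimal sketch of such a standalone check, assuming GCC or Clang on i386, x86-64 or AArch64. The inline assembly and the helper name `alloca_gap` are illustrative assumptions, not taken from the VM sources. It only reports how far the block returned by alloca() sits above the stack pointer read immediately afterwards; a non-zero gap is the situation ALLOCA_LIES_SO_USE_GETSP is meant to compensate for.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef _WIN32
#include <malloc.h>   /* mingw declares alloca() here */
#else
#include <alloca.h>
#endif

/* Hypothetical helper: stores (alloca result - stack pointer) in *gap and
   returns 1, or returns 0 when the stack pointer cannot be read with this
   compiler/architecture combination (GCC/Clang inline asm assumed). */
#if defined(__GNUC__)
__attribute__((noinline))
#endif
int alloca_gap(size_t n, ptrdiff_t *gap)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__) || defined(__aarch64__))
    volatile char *block = alloca(n);
    uintptr_t sp;
    block[0] = 0;   /* touch the block so the allocation is really used */
#if defined(__x86_64__)
    __asm__ volatile ("movq %%rsp, %0" : "=r"(sp) : : "memory");
#elif defined(__i386__)
    __asm__ volatile ("movl %%esp, %0" : "=r"(sp) : : "memory");
#else
    __asm__ volatile ("mov %0, sp" : "=r"(sp) : : "memory");
#endif
    /* Distance between what alloca() returned and the actual stack top. */
    *gap = (ptrdiff_t)((uintptr_t)block - sp);
    return 1;
#else
    (void)n;
    (void)gap;
    return 0;   /* no way to read the stack pointer here */
#endif
}
```

Calling it from a small main() with, say, n = 64 and printing the gap for each compiler and flag combination would give the kind of self-contained evidence asked for above.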

On 11/29/16 16:23, Nicolas Cellier wrote:
> However, it's necessary to define ALLOCA_LIES_SO_USE_GETSP as zero to
> make FFI work with gcc.
> That does not mean that alloca does not lie, just that there is another
> problem with stack management...
>
> 2016-11-29 21:22 GMT+01:00 Nicolas Cellier
> <nicolas.cellier.aka.nice at gmail.com>:
>
>     Thanks Ronie and Esteban.
>     This seems to be an alignment problem indeed.
>     What I see is that alignment is defined in at least 3 different places:
>     - platforms/Cross/vm/sqCogStackAlignment.h
>     - platforms/Cross/plugins/IA32ABI/ia32abicc.c
>     - src/plugins/SqueakFFIPrims/IA32FFIPlugin.c and friends...
>     That's just too many different opinions!!! We have to unify that
>     rather than adding a 4th opinion in a Makefile.
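One way to get down to a single opinion is one header that every consumer includes. A minimal sketch, assuming the `STACK_ALIGN_BYTES` macro from the build flags quoted further down is the one knob worth keeping; the header contents, the 16-byte default and the `cstack_alignment()` helper name are hypothetical and are not the current contents of sqCogStackAlignment.h.

```c
/* Hypothetical single source of truth for C stack alignment. */
#ifndef STACK_ALIGN_BYTES
#define STACK_ALIGN_BYTES 16   /* assumed default; override with -DSTACK_ALIGN_BYTES=... */
#endif

/* Reject, at compile time, values that are neither 0 (no alignment)
   nor a power of two. */
_Static_assert(STACK_ALIGN_BYTES >= 0 &&
               (STACK_ALIGN_BYTES & (STACK_ALIGN_BYTES - 1)) == 0,
               "STACK_ALIGN_BYTES must be 0 or a power of two");

/* Accessor so hand-written C and generated plugin code query the same place. */
static inline int cstack_alignment(void)
{
    return STACK_ALIGN_BYTES;
}
```

Under this sketch, ia32abicc.c and the generated plugins would read the value through such an accessor (in the generated code that role is played by `cStackAlignment()`), and a Makefile would only ever override the single macro.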
>
>     However, about ALLOCA_LIES_SO_USE_GETSP, I'm not so sure that "It is
>     NOT the case of mingw."
>     Last time I used gdb, it WAS still the case, alloca was STILL lying.
>     See
>     http://lists.squeakfoundation.org/pipermail/vm-dev/2016-August/022985.html
>
>     BUT:
>     -----
>     forcing 16-byte alignment supersedes the alloca hack, making it no
>     longer strictly necessary;
>     see below in the generated src/plugins/IA32FFIPlugin.c:
>
>             allocation = alloca(((stackSize + ((calloutState->structReturnSize)))) + (cStackAlignment()));
>             if (allocaLiesSoUseGetsp()) {
>                     allocation = getsp();
>             }
>             if ((cStackAlignment()) != 0) {
>                     allocation = ((char *) ((((((usqInt)allocation)) | ((cStackAlignment()) - 1)) - ((cStackAlignment()) - 1))));
>             }
>             (calloutState->argVector = allocation);
>
>     but we further do:
>
>             if ((0 + (cStackAlignment())) > 0) {
>                     setsp((calloutState->argVector));
>             }
>
>     So if the stack pointer is ever greater than the value alloca returned,
>     and we have removed the ALLOCA_LIES hack,
>     the stack pointer is then set back to the (aligned) alloca result,
>     avoiding the stack pointer offset problem.
>     It would be worth writing a unit test case, and asking on the gcc
>     mailing list why it lies, to be sure...
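The alignment step in the generated excerpt above can be checked in isolation. In `(x | (a - 1)) - (a - 1)` the OR sets the low bits and the subtraction clears them again, so for a power-of-two `a` the expression equals `x & ~(a - 1)`: it rounds the address down. The sketch below models the whole sequence on plain integers; `model_arg_vector` and its parameter names are hypothetical, and no real stack is touched.

```c
#include <stdint.h>

/* Same expression as the generated IA32FFIPlugin.c code:
   rounds x down to a multiple of a (a must be a power of two). */
uintptr_t align_down_like_plugin(uintptr_t x, uintptr_t a)
{
    return (x | (a - 1)) - (a - 1);
}

/* Hypothetical model of how argVector is chosen:
   alloca_result - what alloca() returned
   sp            - what getsp() would return
   lies          - value of allocaLiesSoUseGetsp()
   align         - value of cStackAlignment() (0 means no alignment) */
uintptr_t model_arg_vector(uintptr_t alloca_result, uintptr_t sp,
                           int lies, uintptr_t align)
{
    uintptr_t allocation = alloca_result;
    if (lies)
        allocation = sp;   /* use the real stack pointer instead */
    if (align != 0)
        allocation = align_down_like_plugin(allocation, align);
    /* When alignment is enabled the plugin then calls setsp(argVector),
       so the stack pointer ends up at this value in both cases. */
    return allocation;
}
```

For example, with alloca returning 0x1018, getsp() returning 0x1004 and 16-byte alignment, the argument vector (and hence the stack pointer after setsp) is 0x1010 without the hack and 0x1000 with it.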
>
>     cheers
>
>     2016-11-29 18:14 GMT+01:00 Esteban Lorenzano <estebanlm at gmail.com>:
>
>
>         hah!
>         you know what the sad part of this is? I wrote that message… it
>         was for future me, but I forgot to check our flags :P
>         I lost 2.5 days then + 2 days now.
>
>         this fixes the problem with Windows crashes (yay!) but not the
>         problem with callbacks (booo!)… any idea in that area?
>
>         cheers,
>         Esteban
>
>>         On 29 Nov 2016, at 17:30, Ronie Salgado <roniesalg at gmail.com> wrote:
>>
>>         Last week I was having exactly the same crash in the
>>         MinimalisticHeadless branch, with both MinGW and Visual
>>         Studio. I managed to get the VM working with MinGW (not yet
>>         with MSVC) by using the following defines, which I copied from
>>         the old Pharo CMake scripts:
>>
>>         -DSTACK_ALIGN_BYTES=16 -DALLOCA_LIES_SO_USE_GETSP=0
>>
>>         In the pharo-vm, the CogFamilyWindowsConfig >>
>>         #commonCompilerFlags method starts with the following comment:
>>         commonCompilerFlags
>>             "omit -ggdb2 to prevent generating debug info"
>>             "Some flags explanation:
>>
>>             STACK_ALIGN_BYTES=16 is needed in mingw and FFI (and I suppose on other modules too).
>>             DALLOCA_LIES_SO_USE_GETSP=0 Some compilers return the stack address+4 on alloca function,
>>             then FFI module needs to adjust that. It is NOT the case of mingw.
>>             For more information see this thread:
>>             http://forum.world.st/There-are-something-fishy-with-FFI-plugin-td4584226.html
>>             "
>>
>>
>>         2016-11-29 9:32 GMT-03:00 Esteban Lorenzano
>>         <estebanlm at gmail.com>:
>>
>>
>>
>>>             On 29 Nov 2016, at 13:04, Clément Bera <bera.clement at gmail.com> wrote:
>>>
>>>             Hi,
>>>
>>>             Can you confirm this bug happens only on Windows?
>>
>>             yes, the crash happens only on Windows.
>>             the callback problem is general (note that
>>             FFICallbackTests passes, but I think that is because,
>>             with the qsort function, it never enters the 2nd
>>             condition).
>>
>>>
>>>             Do you have the version numbers (both VMMaker and git commit)
>>>             of the last version you had that was working?
>>
>>             sadly, no… I tried to find the last working version, but
>>             with the mess involved in getting the VM to build with
>>             opensmalltalk-vm, I couldn’t track it down.
>>             I suspect it is related to the work on 64 bits for Windows,
>>             but I have no proof of that :P
>>
>>             Esteban
>>
>>>
>>>             Thanks.
>>>
>>>
>>>             On Tue, Nov 29, 2016 at 11:54 AM, Esteban Lorenzano
>>>             <estebanlm at gmail.com> wrote:
>>>
>>>
>>>                 Hi,
>>>
>>>                 So, I’m building the PharoVM along with all its
>>>                 dependencies. For me, this is a major step because I
>>>                 can finally drop the old build process.
>>>                 Now I’m having serious problems with FFI that were
>>>                 not present before:
>>>
>>>
>>>                 1. CRASH IN WINDOWS (32bits):
>>>
>>>                 On Win32, it crashes right away when trying to
>>>                 call this function:
>>>
>>>                 getEnvSize: nameString
>>>                         ^ self ffiCall: #( int GetEnvironmentVariableA ( String nameString, nil, 0 ) ) module: #Kernel32
>>>
>>>                  (this works perfectly fine in older versions)
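For comparison, a minimal C sketch of what that UFFI call asks Kernel32 for, which could help confirm that the Win32 call itself behaves when made directly from C rather than through the FFI plugin. The `env_size` name and the POSIX fallback (getenv plus strlen, included only so the sketch also builds off Windows) are assumptions for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: bytes needed to hold the variable's value plus its
   terminating NUL, or 0 if the variable is not set. */
size_t env_size(const char *name)
{
#ifdef _WIN32
    /* With a NULL buffer and nSize 0 the buffer is too small, so
       GetEnvironmentVariableA returns the required size including the
       terminating NUL, or 0 when the variable does not exist. */
    return (size_t)GetEnvironmentVariableA(name, NULL, 0);
#else
    /* Non-Windows fallback with the same contract. */
    const char *value = getenv(name);
    return value ? strlen(value) + 1 : 0;
#endif
}
```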
>>>
>>>                 2. CALLBACKS FAILING:
>>>
>>>                 Callbacks have problems. The examples pass, but they
>>>                 are very simple… as soon as I try to do something
>>>                 more complicated (like the unqlite or libgit2
>>>                 bindings, which use callbacks intensively), callbacks
>>>                 stop working.
>>>                 I traced the problem to this method:
>>>
>>>                 StackInterpreter>>#returnAs:ThroughCallback:Context:
>>>
>>>                 returnAs: returnTypeOop ThroughCallback: vmCallbackContext Context: callbackMethodContext
>>>                     "callbackMethodContext is an activation of invokeCallback:[stack:registers:jmpbuf:].
>>>                      Its sender is the VM's state prior to the callback.  Reestablish that state (via longjmp),
>>>                      and mark callbackMethodContext as dead."
>>>                     <export: true>
>>>                     <var: #vmCallbackContext type: #'VMCallbackContext *'>
>>>                     | calloutMethodContext theFP thePage |
>>>                     <var: #theFP type: #'char *'>
>>>                     <var: #thePage type: #'StackPage *'>
>>>                     ((self isIntegerObject: returnTypeOop)
>>>                      and: [self isLiveContext: callbackMethodContext]) ifFalse:
>>>                         [^false].
>>>                     calloutMethodContext := self externalInstVar: SenderIndex ofContext: callbackMethodContext.
>>>                     (self isLiveContext: calloutMethodContext) ifFalse:
>>>                         [^false].
>>>                     "We're about to leave this stack page; must save the current frame's instructionPointer."
>>>                     self push: instructionPointer.
>>>                     self externalWriteBackHeadFramePointers.
>>>                     "Mark callbackMethodContext as dead; the common case is that it is the current frame.
>>>                      We go the extra mile for the debugger."
>>>                     (self isSingleContext: callbackMethodContext)
>>>                         ifTrue: [self markContextAsDead: callbackMethodContext]
>>>                         ifFalse:
>>>                             [theFP := self frameOfMarriedContext: callbackMethodContext.
>>>                              framePointer = theFP "common case"
>>>                                 ifTrue:
>>>                                     [(self isBaseFrame: theFP)
>>>                                         ifTrue: [stackPages freeStackPage: stackPage]
>>>                                         ifFalse: "calloutMethodContext is immediately below on the same page.  Make it current."
>>>                                             [instructionPointer := (self frameCallerSavedIP: framePointer) asUnsignedInteger.
>>>                                              stackPointer := framePointer + (self frameStackedReceiverOffset: framePointer) + objectMemory wordSize.
>>>                                              framePointer := self frameCallerFP: framePointer.
>>>                                              self setMethod: (self frameMethodObject: framePointer).
>>>                                              self restoreCStackStateForCallbackContext: vmCallbackContext.
>>>                                              "N.B. siglongjmp is defines as _longjmp on non-win32 platforms.
>>>                                               This matches the use of _setjmp in ia32abicc.c."
>>>                                              self siglong: vmCallbackContext trampoline jmp: (self integerValueOf: returnTypeOop).
>>>                                              ^true]]
>>>                                 ifFalse:
>>>                                     [self externalDivorceFrame: theFP andContext: callbackMethodContext.
>>>                                      self markContextAsDead: callbackMethodContext]].
>>>                     "Make the calloutMethodContext the active frame.  The case where calloutMethodContext
>>>                      is immediately below callbackMethodContext on the same page is handled above."
>>>                     (self isStillMarriedContext: calloutMethodContext)
>>>                         ifTrue:
>>>                             [theFP := self frameOfMarriedContext: calloutMethodContext.
>>>                              thePage := stackPages stackPageFor: theFP.
>>>                              "findSPOf:on: points to the word beneath the instructionPointer, but
>>>                               there is no instructionPointer on the top frame of the current page."
>>>                              self assert: thePage ~= stackPage.
>>>                              stackPointer := (self findSPOf: theFP on: thePage) - objectMemory wordSize.
>>>                              framePointer := theFP]
>>>                         ifFalse:
>>>                             [thePage := self makeBaseFrameFor: calloutMethodContext.
>>>                              framePointer := thePage headFP.
>>>                              stackPointer := thePage headSP].
>>>                     instructionPointer := self popStack.
>>>                     self setMethod: (objectMemory fetchPointer: MethodIndex ofObject: calloutMethodContext).
>>>                     self setStackPageAndLimit: thePage.
>>>                     self restoreCStackStateForCallbackContext: vmCallbackContext.
>>>                     "N.B. siglongjmp is defines as _longjmp on non-win32 platforms.
>>>                      This matches the use of _setjmp in ia32abicc.c."
>>>                     self siglong: vmCallbackContext trampoline jmp: (self integerValueOf: returnTypeOop).
>>>                     "NOTREACHED"
>>>                     ^true
>>>
>>>                 With the first siglongjmp, callbacks pass fine.
>>>                 With the last one (it would be if framePointer = theFP
>>>                 AND !(isBaseFrame: theFP)), they don't.
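Both exits of the method above end in `siglong:jmp:`, i.e. a long jump back to the trampoline that entered the callback, carrying the return-type code taken from `returnTypeOop`. A minimal sketch of that pattern in portable C, using plain `setjmp`/`longjmp` rather than the `_setjmp`/`_longjmp` pair the VM uses; `CallbackContext`, `RET_WORD`, `run_callback_body` and `invoke_callback` are hypothetical names. Per the C standard, a `jmp_buf` may only be jumped to while the function that called `setjmp` is still active.

```c
#include <setjmp.h>
#include <stdint.h>

/* Hypothetical return-type code, standing in for returnTypeOop. */
enum { RET_WORD = 1 };

typedef struct {
    jmp_buf trampoline;        /* where the callback's "return" lands */
    volatile intptr_t result;  /* volatile: written between setjmp and longjmp */
} CallbackContext;

/* Stands in for the interpreter running the callback and then leaving
   through returnAs:ThroughCallback:Context:. */
static void run_callback_body(CallbackContext *ctx, intptr_t arg)
{
    ctx->result = arg * 2;               /* the callback's answer */
    longjmp(ctx->trampoline, RET_WORD);  /* does not return */
}

/* The C-side trampoline, i.e. what the foreign library actually calls. */
intptr_t invoke_callback(intptr_t arg)
{
    CallbackContext ctx;
    switch (setjmp(ctx.trampoline)) {
    case 0:                              /* direct call: run the callback */
        run_callback_body(&ctx, arg);
        break;                           /* not reached */
    case RET_WORD:                       /* resumed by longjmp */
        return ctx.result;
    }
    return 0;                            /* unknown return-type code */
}
```

If the jump instead lands on a trampoline whose frame has already been left, or with the C stack state not restored to match it, the behavior is undefined, which is the kind of failure that tends to show up only under heavier callback use.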
>>>
>>>                 So… from here I’m a bit lost… I need some help :)
>>>
>>>                 thanks,
>>>                 Esteban