Thanks Ronie and Esteban.
This seems to be an alignment problem indeed.
What I see is that alignment is defined in at least 3 different places:
- platforms/Cross/vm/sqCogStackAlignment.h
- platforms/Cross/plugins/IA32ABI/ia32abicc.c
- src/plugins/SqueakFFIPrims/IA32FFIPlugin.c and friends...
That's just too many different opinions!!! We have to unify that rather than adding a 4th opinion in a Makefile.
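
To make the "unify" idea concrete, here is a minimal sketch of what a single shared definition could look like. It is not the content of any of the three files above: the accessor names are hypothetical, and the default values are only placeholders taken from the -D flags quoted further down this thread, not a recommendation.

```c
#include <assert.h>   /* static_assert (C11) */

/* Hypothetical single header. Both values can still be overridden with -D flags. */
#ifndef STACK_ALIGN_BYTES
# define STACK_ALIGN_BYTES 16          /* placeholder default, from the flags quoted in this thread */
#endif
#ifndef ALLOCA_LIES_SO_USE_GETSP
# define ALLOCA_LIES_SO_USE_GETSP 0    /* placeholder default, same source */
#endif

/* The masking arithmetic in the plugin only works for powers of two. */
static_assert((STACK_ALIGN_BYTES & (STACK_ALIGN_BYTES - 1)) == 0,
              "STACK_ALIGN_BYTES must be a power of two");

/* Hypothetical accessors: every plugin would ask here instead of keeping its own copy. */
static inline int sketch_c_stack_alignment(void)
{
    return STACK_ALIGN_BYTES;
}

static inline int sketch_alloca_lies_so_use_getsp(void)
{
    return ALLOCA_LIES_SO_USE_GETSP;
}
```

With something like this, a Makefile could still pass -DSTACK_ALIGN_BYTES=... for a given platform, but there would be only one place where the fallback is decided.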

However, about ALLOCA_LIES_SO_USE_GETSP, I'm not so sure that "It is NOT the case of mingw."
Last time I used gdb, it WAS still the case, alloca was STILL lying.
See http://lists.squeakfoundation.org/pipermail/vm-dev/2016-August/022985.html

BUT:
-----
Forcing 16-byte alignment supersedes the alloca hack, making it not strictly necessary anymore;
see below in the generated src/plugins/IA32FFIPlugin.c:

        allocation = alloca(((stackSize + ((calloutState->structReturnSize)))) + (cStackAlignment()));
        if (allocaLiesSoUseGetsp()) {
                allocation = getsp();
        }
        if ((cStackAlignment()) != 0) {
                allocation = ((char *) ((((((usqInt)allocation)) | ((cStackAlignment()) - 1)) - ((cStackAlignment()) - 1))));
        }
        (calloutState->argVector = allocation);

but we further do:

        if ((0 + (cStackAlignment())) > 0) {
                setsp((calloutState->argVector));
        }

So even if the stack pointer differs from the value alloca returned and the ALLOCA_LIES hack has been removed,
setsp then resets the stack pointer to the aligned address derived from alloca's return value, which avoids the stack pointer offset problem.
It would be worth writing a unit test case, and asking on the gcc mailing list why alloca lies, to be sure...
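
For reference, the alignment expression in the generated code can be isolated as below. This is only a sketch of the arithmetic; sketch_align_down is a hypothetical name, not a function in the plugin.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down to a multiple of alignment (which must be a power of two),
   using the same OR-then-subtract form as the generated plugin code. */
uintptr_t sketch_align_down(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr | (alignment - 1)) - (alignment - 1);
}
```

For example, sketch_align_down(0x1004, 16) and sketch_align_down(0x1000, 16) both give 0x1000, so a 4-byte discrepancy disappears when both addresses fall in the same 16-byte block. That is not guaranteed in general: 0x100C stays in the 0x1000 block while 0x1010 does not, which is one more reason a unit test would be useful.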

cheers

2016-11-29 18:14 GMT+01:00 Esteban Lorenzano <estebanlm@gmail.com>:
 
hah! 
you know what is the sad part of this? I wrote that message… it was for the future me, but I forgot to check our flags :P
I lost 2.5 days then + 2 days now. 

this fixes the problem with Windows crashes (yay!) but not the problem with callbacks (booo!)… any idea in that area?

cheers, 
Esteban

On 29 Nov 2016, at 17:30, Ronie Salgado <roniesalg@gmail.com> wrote:

Last week I was having this exact same crash in the MinimalisticHeadless branch, with both MinGW and Visual Studio. I managed to get the VM working with MinGW (not yet with MSVC) by using the following defines, which I copied from the old Pharo CMake scripts:

-DSTACK_ALIGN_BYTES=16 -DALLOCA_LIES_SO_USE_GETSP=0

In the pharo-vm, the CogFamilyWindowsConfig >> #commonCompilerFlags method starts with the following comment:
commonCompilerFlags
    "omit -ggdb2 to prevent generating debug info"
    "Some flags explanation:
   
    STACK_ALIGN_BYTES=16 is needed in mingw and FFI (and I suppose on other modules too).
    DALLOCA_LIES_SO_USE_GETSP=0 Some compilers return the stack address+4 on alloca function,
    then FFI module needs to adjust that. It is NOT the case of mingw.
    For more information see this thread: http://forum.world.st/There-are-something-fishy-with-FFI-plugin-td4584226.html
    "


2016-11-29 9:32 GMT-03:00 Esteban Lorenzano <estebanlm@gmail.com>:
 

On 29 Nov 2016, at 13:04, Clément Bera <bera.clement@gmail.com> wrote:

Hi,

Can you confirm this bug happens only on Windows?

yes, the crash is just in windows.
the callback problem is general (note that FFICallbackTests works fine, but I think this is related to the fact that it never enters the 2nd condition with the qsort function) .


Do you have version number (both VMMaker and git commit) of the last version you have that was working ?

sadly, not… I tried to get the latest working version, but with the mess I have to get the VM to build with opensmalltalk-vm, I couldn’t track it. 
I suspect it is related to the work on 64 bits for Windows, but I have no proof of that :P

Esteban


Thanks.


On Tue, Nov 29, 2016 at 11:54 AM, Esteban Lorenzano <estebanlm@gmail.com> wrote:

Hi,

So, I’m building the PharoVM along with all its dependencies. For me, this is a major step because I can finally drop the old build process.
Now, I’m having serious problems with FFI (which were not present before):


1. CRASH IN WINDOWS (32bits):

In Win32, it crashes as soon as it tries to access this function:

getEnvSize: nameString
        ^ self ffiCall: #( int GetEnvironmentVariableA ( String nameString, nil, 0 ) ) module: #Kernel32

 (this works perfectly fine in older versions)

2. CALLBACKS FAILING:

Callbacks have problems. The examples pass, but they are very simple… as soon as I try to do something complicated (like the unqlite bindings or the libgit2 bindings, which use callbacks intensively), callbacks stop working.
I traced the problem up to this method:

StackInterpreter>>#returnAs:ThroughCallback:Context:

returnAs: returnTypeOop ThroughCallback: vmCallbackContext Context: callbackMethodContext
        "callbackMethodContext is an activation of invokeCallback:[stack:registers:jmpbuf:].
         Its sender is the VM's state prior to the callback.  Reestablish that state (via longjmp),
         and mark callbackMethodContext as dead."
        <export: true>
        <var: #vmCallbackContext type: #'VMCallbackContext *'>
        | calloutMethodContext theFP thePage |
        <var: #theFP type: #'char *'>
        <var: #thePage type: #'StackPage *'>
        ((self isIntegerObject: returnTypeOop)
         and: [self isLiveContext: callbackMethodContext]) ifFalse:
                [^false].
        calloutMethodContext := self externalInstVar: SenderIndex ofContext: callbackMethodContext.
        (self isLiveContext: calloutMethodContext) ifFalse:
                [^false].
        "We're about to leave this stack page; must save the current frame's instructionPointer."
        self push: instructionPointer.
        self externalWriteBackHeadFramePointers.
        "Mark callbackMethodContext as dead; the common case is that it is the current frame.
         We go the extra mile for the debugger."
        (self isSingleContext: callbackMethodContext)
                ifTrue: [self markContextAsDead: callbackMethodContext]
                ifFalse:
                        [theFP := self frameOfMarriedContext: callbackMethodContext.
                         framePointer = theFP "common case"
                                ifTrue:
                                        [(self isBaseFrame: theFP)
                                                ifTrue: [stackPages freeStackPage: stackPage]
                                                ifFalse: "calloutMethodContext is immediately below on the same page.  Make it current."
                                                        [instructionPointer := (self frameCallerSavedIP: framePointer) asUnsignedInteger.
                                                         stackPointer := framePointer + (self frameStackedReceiverOffset: framePointer) + objectMemory wordSize.
                                                         framePointer := self frameCallerFP: framePointer.
                                                         self setMethod: (self frameMethodObject: framePointer).
                                                         self restoreCStackStateForCallbackContext: vmCallbackContext.
                                                         "N.B. siglongjmp is defines as _longjmp on non-win32 platforms.
                                                          This matches the use of _setjmp in ia32abicc.c."
                                                         self siglong: vmCallbackContext trampoline jmp: (self integerValueOf: returnTypeOop).
                                                         ^true]]
                                ifFalse:
                                        [self externalDivorceFrame: theFP andContext: callbackMethodContext.
                                         self markContextAsDead: callbackMethodContext]].
        "Make the calloutMethodContext the active frame.  The case where calloutMethodContext
         is immediately below callbackMethodContext on the same page is handled above."
        (self isStillMarriedContext: calloutMethodContext)
                ifTrue:
                        [theFP := self frameOfMarriedContext: calloutMethodContext.
                         thePage := stackPages stackPageFor: theFP.
                         "findSPOf:on: points to the word beneath the instructionPointer, but
                          there is no instructionPointer on the top frame of the current page."
                         self assert: thePage ~= stackPage.
                         stackPointer := (self findSPOf: theFP on: thePage) - objectMemory wordSize.
                         framePointer := theFP]
                ifFalse:
                        [thePage := self makeBaseFrameFor: calloutMethodContext.
                         framePointer := thePage headFP.
                         stackPointer := thePage headSP].
        instructionPointer := self popStack.
        self setMethod: (objectMemory fetchPointer: MethodIndex ofObject: calloutMethodContext).
        self setStackPageAndLimit: thePage.
        self restoreCStackStateForCallbackContext: vmCallbackContext.
         "N.B. siglongjmp is defines as _longjmp on non-win32 platforms.
          This matches the use of _setjmp in ia32abicc.c."
        self siglong: vmCallbackContext trampoline jmp: (self integerValueOf: returnTypeOop).
        "NOTREACHED"
        ^true

With the first siglongjmp, callbacks pass fine.
With the last one (it would be if framePointer = theFP AND !(isBaseFrame: theFP)), they don’t.
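
For readers less familiar with the mechanism both exits rely on, here is a generic C sketch of returning to a saved C state through a jump buffer and carrying a small integer with it. It is not the VM's code: the struct, the function names and the SKETCH_RET_* values are all hypothetical, and plain setjmp/longjmp is used where the VM uses its own siglongjmp mapping.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical stand-in for a callback context; only the jump buffer matters here. */
typedef struct {
    jmp_buf trampoline;
} SketchCallbackContext;

/* Hypothetical return-type codes carried through longjmp. */
enum { SKETCH_RET_WORD = 1, SKETCH_RET_DOUBLE = 2 };

/* Stands in for the side that finishes the callback: it never returns normally. */
static void sketch_return_through_callback(SketchCallbackContext *ctx, int returnType)
{
    assert(returnType != 0);   /* longjmp would turn 0 into 1 */
    longjmp(ctx->trampoline, returnType);
}

/* Stands in for the C side that entered the callback.
   Returns the code delivered by longjmp, or 0 for an unknown code. */
int sketch_invoke_callback(int returnType)
{
    SketchCallbackContext ctx;

    switch (setjmp(ctx.trampoline)) {
    case 0:                    /* direct return from setjmp: run the "callback" */
        sketch_return_through_callback(&ctx, returnType);
        return -1;             /* not reached */
    case SKETCH_RET_WORD:
        return SKETCH_RET_WORD;
    case SKETCH_RET_DOUBLE:
        return SKETCH_RET_DOUBLE;
    default:
        return 0;
    }
}
```

In this sketch the jump only works because the frame that called setjmp is still live when longjmp runs, so whatever stack state the buffer captured has to remain valid on every path that reaches the jump.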

So… from here I’m a bit lost… I need some help :)

thanks,
Esteban