[Vm-dev] Reproducible VM crash on Win32 with callbacks

Eliot Miranda eliot.miranda at gmail.com
Tue Jan 15 17:03:55 UTC 2019


Hi Guille, Hi Pablo,

    I misspoke...

On Tue, Jan 15, 2019 at 9:00 AM Eliot Miranda <eliot.miranda at gmail.com>
wrote:

> Hi Guille,
>
> On Tue, Jan 15, 2019 at 8:45 AM Guillermo Polito <
> guillermopolito at gmail.com> wrote:
>
>>
>> Hi all,
>>
>> With Pablo we have been tracking a bug on win32 that produces a
>> segmentation fault on callback return. We can reproduce it 100% of the
>> time when running the Alien qsort example in both the latest Pharo and
>> Squeak versions.
>>
>> After some debugging, it would seem that the thunkEntry function is
>> over-optimized in 32 bits, corrupting the (C) stack. This was particularly
>> tedious to track down because compiling the VM in debug mode made the bug
>> go away :-). We have cornered the bug and checked that callbacks work OK
>> if we disable optimizations just for the thunkEntry function, like this:
>>
>> long
>> __attribute__((optimize("O0"))) thunkEntry(void *thunkp, sqIntptr_t *stackp)
>>
>
>> The thing is that the latest mingw, which we use for compiling the Windows
>> VM even in travis, now comes with gcc 7.4.0, which has a lot more
>> optimizations than before. Just having -O1 also produces the same error.
>>
>> We have tried disabling some particular optimizations like
>> -fno-combine-stack-adjustments but with no result so far.
>>
>> The strange thing is that other callbacks like the ones coming from
>> libgit work ok.
>>
>> Has somebody taken a look into this too?
>> How would you suggest that we move on with this?
>>
>
> Before adding the pragma to the source also look at whether using the
> volatile keyword on variables in thunkProcess fixes the issue; for example
>
>     volatile VMCallbackContext vmcc;
>     volatile VMCallbackContext *previousCallbackContext;
>     volatile int flags, returnType;
>
> .  The other thing to do is to generate the machine code for thunkProcess
> with gcc 7.x and an older version that does not crash and compare to try
> and find out what specific optimization is causing the crash.
>

I meant generate the assembly with -S, e.g. -O1 -S.  You can also compare
-O0 -S with -O1 -S for gcc 7.x, and generate -O1 -S with and without the
volatile keyword and see what difference that makes.
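
If it helps to try the comparison on something small before touching the VM
sources, here is a minimal sketch. Everything in it is hypothetical: the file
name demo.c, the function frame_demo and the USE_VOLATILE macro are not VM
code, they only give gcc some locals to optimize. The gcc flags in the comment
(-m32, -O0, -O1, -S, -D, -o) are the standard ones.

```c
/* demo.c: hypothetical stand-in for thunkEntry, not VM code.
 *
 *   gcc -m32 -O0 -S demo.c -o demo-O0.s
 *   gcc -m32 -O1 -S demo.c -o demo-O1.s
 *   gcc -m32 -O1 -DUSE_VOLATILE -S demo.c -o demo-O1-volatile.s
 *   diff demo-O1.s demo-O1-volatile.s
 */
#ifdef USE_VOLATILE
# define MAYBE_VOLATILE volatile
#else
# define MAYBE_VOLATILE
#endif

long
frame_demo(long a, long b)
{
    /* Locals named after the ones suggested for volatile in thunkEntry. */
    MAYBE_VOLATILE long flags = a;
    MAYBE_VOLATILE long returnType = b;

    flags += 1;
    returnType *= 2;
    return flags + returnType;
}
```

With USE_VOLATILE defined the -O1 output should keep flags and returnType in
stack slots rather than folding them into registers, which is the kind of
difference to look for in the real function.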

> Finally, if you do find you have to use the pragma, please write the fix as
>
> long __attribute__((optimize("O0")))
> thunkEntry(void *thunkp, sqIntptr_t *stackp)
>
> to keep the definition starting on a new line, which helps when using
> command-line tools to look for definitions outside of an ide.
>
>
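
One way to keep that layout and still limit the attribute to 32-bit Windows
builds with gcc is a small macro. This is only a sketch under assumptions:
THUNK_NO_OPT and thunkEntryDemo are hypothetical names, the body is a
placeholder, and long stands in for sqIntptr_t so the fragment compiles on its
own. clang is excluded because it does not implement the optimize attribute.

```c
/* Hypothetical guard: expands to the attribute only for gcc on 32-bit Windows. */
#if defined(__GNUC__) && !defined(__clang__) && defined(_WIN32) && !defined(_WIN64)
# define THUNK_NO_OPT __attribute__((optimize("O0")))
#else
# define THUNK_NO_OPT
#endif

/* The function name starts its own line, so grepping for ^thunkEntry works. */
long THUNK_NO_OPT
thunkEntryDemo(void *thunkp, long *stackp)
{
    (void)thunkp;                    /* unused in this placeholder body */
    return stackp ? stackp[0] : 0;   /* placeholder, not the real logic */
}
```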
>> From our side, we think that using a pragma to disable optimizations for
>> thunkEntry in the case of win32 looks okeyish at least to make the bug go
>> away.
>>
>
> Yes, but I expect it is actually that the volatile keyword has not been
> used (a mistake of mine).  Here's a relevant stack overflow answer:
>
> https://stackoverflow.com/questions/7996825/why-volatile-works-for-setjmp-longjmp
>
> And if volatile does fix the issue, please apply it to the other
> thunkEntry implementations.
>
> Cheers,
>> Guille & Pablo
>>
>
> Cheers!
>
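
For anyone who has not run into the volatile rule before, here is a minimal
sketch of what the linked answer is about. It assumes the callback return path
goes through setjmp/longjmp, as the topic of that answer suggests; the names
env, return_via_longjmp and volatile_survives_longjmp are hypothetical and not
taken from the VM. In C, an automatic variable modified between setjmp and
longjmp has an indeterminate value afterwards unless it is declared volatile.

```c
#include <setjmp.h>

static jmp_buf env;                  /* hypothetical jump target */

static void
return_via_longjmp(void)
{
    longjmp(env, 1);                 /* resumes at the setjmp below */
}

int
volatile_survives_longjmp(void)
{
    volatile int flags = 0;

    if (setjmp(env) == 0) {
        flags = 42;                  /* modified after setjmp, before longjmp */
        return_via_longjmp();        /* does not return here */
    }
    /* Reached via longjmp. flags is volatile, so its value is well defined.
     * Without volatile the compiler may keep it in a register and the value
     * read here would be indeterminate. */
    return flags;
}
```

Dropping the volatile and compiling with -O1 is a cheap way to see whether a
given gcc takes advantage of that freedom.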

_,,,^..^,,,_
best, Eliot