[Vm-dev] Reproducible VM crash on Win32 with callbacks

Eliot Miranda eliot.miranda at gmail.com
Tue Jan 15 17:00:14 UTC 2019


Hi Guille,

On Tue, Jan 15, 2019 at 8:45 AM Guillermo Polito <guillermopolito at gmail.com>
wrote:

>
> Hi all,
>
> With Pablo we have been tracking a bug on win32 that produces a
> segmentation fault on callback return. We can reproduce it 100% of the
> time when running the Alien qsort example in both the latest Pharo and
> Squeak versions.
>
> After some debugging, it would seem that the thunkEntry function is
> over-optimized in 32 bits, corrupting the (C) stack. This was particularly
> boring because compiling the VM in debug mode was taking the bug away
> :-). We have cornered the bug and checked that callbacks do work ok if we
> disable optimizations just for the thunkEntry function like this:
>
> long
> __attribute__((optimize("O0"))) thunkEntry(void *thunkp, sqIntptr_t *stackp)
>
> The thing is that the latest mingw, which we use for compiling the Windows
> VM even on Travis, now comes with gcc 7.4.0, which has a lot more
> optimizations than before. Just using -O1 also produces the same error.
>
> We have tried disabling some particular optimizations, such as with
> -fno-combine-stack-adjustments, but with no result so far.
>
> The strange thing is that other callbacks like the ones coming from libgit
> work ok.
>
> Has somebody taken a look into this too?
> How would you suggest that we move on with this?
>

Before adding the pragma to the source, also check whether using the
volatile keyword on the variables in thunkEntry fixes the issue; for example:

    volatile VMCallbackContext vmcc;
    volatile VMCallbackContext *previousCallbackContext;
    volatile int flags, returnType;

The other thing to do is to generate the machine code for thunkEntry
with gcc 7.x and with an older version that does not crash, and compare
the two to find out which specific optimization is causing the crash.

Finally, if you do find you have to use the pragma, please write the fix as

long __attribute__((optimize("O0")))
thunkEntry(void *thunkp, sqIntptr_t *stackp)

to keep the function name at the start of a new line, which helps when
using command-line tools to look for definitions outside of an IDE.
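
As a minimal sketch of that layout (the function sum_to and the macro
NO_OPTIMIZE are hypothetical names used only for illustration, and the
attribute is guarded because optimize("O0") is GCC-specific):

```c
/* Hypothetical macro: expands to the GCC attribute, and to nothing elsewhere. */
#if defined(__GNUC__) && !defined(__clang__)
# define NO_OPTIMIZE __attribute__((optimize("O0")))
#else
# define NO_OPTIMIZE
#endif

/*
 * Return type and attribute go on one line; the function name starts the
 * next line at column 0. A search such as `grep -n '^sum_to' *.c` then
 * matches the definition but not call sites, which are normally indented.
 */
long NO_OPTIMIZE
sum_to(long n)
{
    long total = 0;

    for (long i = 1; i <= n; i++)
        total += i;
    return total;
}
```

Calling sum_to(4) adds 1 through 4; the attribute only changes how GCC
compiles this one function, not what it computes.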


> From our side, we think that using a pragma to disable optimizations for
> thunkEntry in the case of win32 looks okay-ish, at least to make the bug
> go away.
>

Yes, but I expect the real problem is that the volatile keyword has not
been used (a mistake of mine).  Here's a relevant Stack Overflow answer:
https://stackoverflow.com/questions/7996825/why-volatile-works-for-setjmp-longjmp
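
To illustrate the rule behind that answer, here is a minimal sketch,
assuming the callback return path goes through setjmp/longjmp as the link
suggests. All names in it (return_point, finish_callback, run_callback) are
hypothetical and are not taken from the VM sources. In C, a non-volatile
local that is modified between setjmp and longjmp has an indeterminate
value after the jump, so an optimizer is free to keep it in a register that
the longjmp clobbers:

```c
#include <setjmp.h>

/* Hypothetical jump buffer standing in for the one a callback returns through. */
static jmp_buf return_point;

/* Hypothetical helper: unwinds back to the setjmp call in run_callback. */
static void
finish_callback(void)
{
    longjmp(return_point, 1);
}

/*
 * Modifies a local after setjmp and reads it after longjmp. Without the
 * volatile qualifier the value read after the jump would be indeterminate;
 * with it, the compiler must store and reload the variable from memory.
 */
int
run_callback(int input)
{
    volatile int result = 0;

    if (setjmp(return_point) == 0) {
        result = input * 2;     /* changed after setjmp was called */
        finish_callback();      /* control resumes at setjmp, returning 1 */
    }
    return result;              /* well defined only because of volatile */
}
```

An unoptimized build often hides the problem because locals tend to live
on the stack there, which would be consistent with the bug disappearing in
debug mode.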

And if volatile does fix the issue, please apply it to the other thunkEntry
implementations.

> Cheers,
> Guille & Pablo
>

Cheers!
_,,,^..^,,,_
best, Eliot