[Vm-dev] Reproducible VM crash on Win32 with callbacks

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Wed Jan 16 09:18:03 UTC 2019


And if it fails with new versions of gcc, it would also be intersting to
test with clang.
For Win64 I had to use clang...

Le mer. 16 janv. 2019 à 09:59, Guillermo Polito <guillermopolito at gmail.com>
a écrit :

>
> Hi Eliot!
>
> Thanks for the quick answer :)
>
> On Tue, Jan 15, 2019 at 6:04 PM Eliot Miranda <eliot.miranda at gmail.com>
> wrote:
>
>>
>> Hi Guille, Hi Pablo,
>>
>>     I misspoke...
>>
>> On Tue, Jan 15, 2019 at 9:00 AM Eliot Miranda <eliot.miranda at gmail.com>
>> wrote:
>>
>>> Hi Guille,
>>>
>>> On Tue, Jan 15, 2019 at 8:45 AM Guillermo Polito <
>>> guillermopolito at gmail.com> wrote:
>>>
>>>>
>>>> Hi all,
>>>>
>>>> With Pablo we have been tracking a bug on win32 that produces a
>>>> segmentation fault on callback return. We can reproduce it 100% certainly
>>>> when running the Alien qsort example both in latest pharo and squeak
>>>> versions.
>>>>
>>>> After some debugging, it would seem that the thunkEntry function is
>>>> over-optimized in 32 bits, corrupting the (C) stack. This was particularly
>>>> boring because compiling the VM in debug mode was taking the bug away
>>>> :-). We have cornered the bug and checked that callbacks do work ok if we
>>>> disable optimizations just for the thunkEntry function like this:
>>>>
>>>> long
>>>> __attribute__((optimize("O0"))) thunkEntry(void *thunkp, sqIntptr_t
>>>> *stackp)
>>>>
>>>
>>>> The thing is that latest mingw which we use for compiling the windows
>>>> VM even in travis, now comes with gcc 7.4.0 which has a lot more of
>>>> optimizations than before. Just having O1 also produces the same error.
>>>>
>>>> We have tried disabling some particular optimizations like
>>>> fno-combine-stack-adjustments but with no result so far.
>>>>
>>>> The strange thing is that other callbacks like the ones coming from
>>>> libgit work ok.
>>>>
>>>> Has somebody taken a look into this too?
>>>> How would you suggest that we move on with this?
>>>>
>>>
>>> Before adding the pragma to the source also look at whether using the
>>> volatile keyword on variables in thunkProcess fixes the issue; for example
>>>
>>>     volatile VMCallbackContext vmcc;
>>>     volatile VMCallbackContext *previousCallbackContext;
>>>     volatile int flags, returnType;
>>>
>>
> Ok I'm trying this now. At least I can launch compilation and do something
> else [https://xkcd.com/303/] while it compiles :)
>
>
>>
>>> .  The other thing to do is to generate the machine code for
>>> thinkProcess with gcc 7.x and an older version that does not crash and
>>> compare to try and find out what specific optimization is causing the crash.
>>>
>>
>> I meant generate the assembly with -S, e.g. -O1 -S.
>>
>
> Sure, this is what we understood ^^.
>
>
>> You can also compare _O0 -S with -O1 -S for gcc 7.x, and generate -O1 -S
>> with and without the volatile keyword and see what differences that makes.
>>
>
> I'll do this this afternoon If I have some time...
>
>
>> Finally, if you do find you have to use the pragma, please write the fix
>>> as
>>>
>>> long __attribute__((optimize("O0")))
>>> thunkEntry(void *thunkp, sqIntptr_t *stackp)
>>>
>>> to keep the definition starting on a new line, which helps when using
>>> command-line tools to look for definitions outside of an ide.
>>>
>>
> Sure! no problem!
>
> Tx Again!
>
>
>>
>>>
>>>> From our side, we think that using a pragma to disable optimizations
>>>> for thunkEntry in the case of win32 looks okeyish at least to make the bug
>>>> go away.
>>>>
>>>
>>> Yes, but I expect it is actually that the volatile keyword has not been
>>> used (a mistake of mine).  Here's a relevant stack overflow answer:
>>>
>>> https://stackoverflow.com/questions/7996825/why-volatile-works-for-setjmp-longjmp
>>> <https://stackoverflow.com/questions/7996825/why-volatile-works-for-setjmp-longjmp>
>>>
>>> And if volatile does fix the issue, please apply it to the other
>>> thinkEntry implementations.
>>>
>>> Cheers,
>>>> Guille & Pablo
>>>>
>>>
>>> Cheers!
>>>
>>
>> _,,,^..^,,,_
>> best, Eliot
>>
>
>
> --
>
>
>
> Guille Polito
>
> Research Engineer
>
> Centre de Recherche en Informatique, Signal et Automatique de Lille
>
> CRIStAL - UMR 9189
>
> French National Center for Scientific Research - *http://www.cnrs.fr
> <http://www.cnrs.fr>*
>
>
> *Web:* *http://guillep.github.io* <http://guillep.github.io>
>
> *Phone: *+33 06 52 70 66 13
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20190116/00101fde/attachment.html>


More information about the Vm-dev mailing list