[Vm-dev] Reproducible VM crash on Win32 with callbacks

Guillermo Polito guillermopolito at gmail.com
Wed Jan 16 11:21:38 UTC 2019


Hi all,

I've tested using the volatile keyword in all variables in the function and
the problem is still there.
So while I get some time to look at the asm outputs I've issued a PR with
the discussed solution in here:
https://github.com/OpenSmalltalk/opensmalltalk-vm/pull/353

@Nicolas, I was able to compile and debug win64 almost out of the box.
However, I could not make the visual studio code debugging extension work
for the 32bit version.
Where you compiling on clang with just the makefiles or you have used
visual studio for that?
Also, could you share your visual studio project files? I'd like to try
debugging with it :)

Tx all,
Guille


On Wed, Jan 16, 2019 at 10:18 AM Nicolas Cellier <
nicolas.cellier.aka.nice at gmail.com> wrote:

>
> And if it fails with new versions of gcc, it would also be intersting to
> test with clang.
> For Win64 I had to use clang...
>
> Le mer. 16 janv. 2019 à 09:59, Guillermo Polito <guillermopolito at gmail.com>
> a écrit :
>
>>
>> Hi Eliot!
>>
>> Thanks for the quick answer :)
>>
>> On Tue, Jan 15, 2019 at 6:04 PM Eliot Miranda <eliot.miranda at gmail.com>
>> wrote:
>>
>>>
>>> Hi Guille, Hi Pablo,
>>>
>>>     I misspoke...
>>>
>>> On Tue, Jan 15, 2019 at 9:00 AM Eliot Miranda <eliot.miranda at gmail.com>
>>> wrote:
>>>
>>>> Hi Guille,
>>>>
>>>> On Tue, Jan 15, 2019 at 8:45 AM Guillermo Polito <
>>>> guillermopolito at gmail.com> wrote:
>>>>
>>>>>
>>>>> Hi all,
>>>>>
>>>>> With Pablo we have been tracking a bug on win32 that produces a
>>>>> segmentation fault on callback return. We can reproduce it 100% certainly
>>>>> when running the Alien qsort example both in latest pharo and squeak
>>>>> versions.
>>>>>
>>>>> After some debugging, it would seem that the thunkEntry function is
>>>>> over-optimized in 32 bits, corrupting the (C) stack. This was particularly
>>>>> boring because compiling the VM in debug mode was taking the bug away
>>>>> :-). We have cornered the bug and checked that callbacks do work ok if we
>>>>> disable optimizations just for the thunkEntry function like this:
>>>>>
>>>>> long
>>>>> __attribute__((optimize("O0"))) thunkEntry(void *thunkp, sqIntptr_t
>>>>> *stackp)
>>>>>
>>>>
>>>>> The thing is that latest mingw which we use for compiling the windows
>>>>> VM even in travis, now comes with gcc 7.4.0 which has a lot more of
>>>>> optimizations than before. Just having O1 also produces the same error.
>>>>>
>>>>> We have tried disabling some particular optimizations like
>>>>> fno-combine-stack-adjustments but with no result so far.
>>>>>
>>>>> The strange thing is that other callbacks like the ones coming from
>>>>> libgit work ok.
>>>>>
>>>>> Has somebody taken a look into this too?
>>>>> How would you suggest that we move on with this?
>>>>>
>>>>
>>>> Before adding the pragma to the source also look at whether using the
>>>> volatile keyword on variables in thunkProcess fixes the issue; for example
>>>>
>>>>     volatile VMCallbackContext vmcc;
>>>>     volatile VMCallbackContext *previousCallbackContext;
>>>>     volatile int flags, returnType;
>>>>
>>>
>> Ok I'm trying this now. At least I can launch compilation and do
>> something else [https://xkcd.com/303/] while it compiles :)
>>
>>
>>>
>>>> .  The other thing to do is to generate the machine code for
>>>> thinkProcess with gcc 7.x and an older version that does not crash and
>>>> compare to try and find out what specific optimization is causing the crash.
>>>>
>>>
>>> I meant generate the assembly with -S, e.g. -O1 -S.
>>>
>>
>> Sure, this is what we understood ^^.
>>
>>
>>> You can also compare _O0 -S with -O1 -S for gcc 7.x, and generate -O1 -S
>>> with and without the volatile keyword and see what differences that makes.
>>>
>>
>> I'll do this this afternoon If I have some time...
>>
>>
>>> Finally, if you do find you have to use the pragma, please write the fix
>>>> as
>>>>
>>>> long __attribute__((optimize("O0")))
>>>> thunkEntry(void *thunkp, sqIntptr_t *stackp)
>>>>
>>>> to keep the definition starting on a new line, which helps when using
>>>> command-line tools to look for definitions outside of an ide.
>>>>
>>>
>> Sure! no problem!
>>
>> Tx Again!
>>
>>
>>>
>>>>
>>>>> From our side, we think that using a pragma to disable optimizations
>>>>> for thunkEntry in the case of win32 looks okeyish at least to make the bug
>>>>> go away.
>>>>>
>>>>
>>>> Yes, but I expect it is actually that the volatile keyword has not been
>>>> used (a mistake of mine).  Here's a relevant stack overflow answer:
>>>>
>>>> https://stackoverflow.com/questions/7996825/why-volatile-works-for-setjmp-longjmp
>>>> <https://stackoverflow.com/questions/7996825/why-volatile-works-for-setjmp-longjmp>
>>>>
>>>> And if volatile does fix the issue, please apply it to the other
>>>> thinkEntry implementations.
>>>>
>>>> Cheers,
>>>>> Guille & Pablo
>>>>>
>>>>
>>>> Cheers!
>>>>
>>>
>>> _,,,^..^,,,_
>>> best, Eliot
>>>
>>
>>
>> --
>>
>>
>>
>> Guille Polito
>>
>> Research Engineer
>>
>> Centre de Recherche en Informatique, Signal et Automatique de Lille
>>
>> CRIStAL - UMR 9189
>>
>> French National Center for Scientific Research - *http://www.cnrs.fr
>> <http://www.cnrs.fr>*
>>
>>
>> *Web:* *http://guillep.github.io* <http://guillep.github.io>
>>
>> *Phone: *+33 06 52 70 66 13
>>
>

-- 



Guille Polito

Research Engineer

Centre de Recherche en Informatique, Signal et Automatique de Lille

CRIStAL - UMR 9189

French National Center for Scientific Research - *http://www.cnrs.fr
<http://www.cnrs.fr>*


*Web:* *http://guillep.github.io* <http://guillep.github.io>

*Phone: *+33 06 52 70 66 13
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20190116/e234dc68/attachment-0001.html>


More information about the Vm-dev mailing list