[Vm-dev] Reproducible VM crash on Win32 with callbacks

Guillermo Polito guillermopolito at gmail.com
Fri Jan 18 13:57:25 UTC 2019


Hi Eliot,

On Wed, Jan 16, 2019 at 4:53 PM Eliot Miranda <eliot.miranda at gmail.com>
wrote:

>
> Hi Guille,
>
>
> On Jan 16, 2019, at 3:21 AM, Guillermo Polito <guillermopolito at gmail.com>
> wrote:
>
> Hi all,
>
> I've tested using the volatile keyword in all variables in the function
> and the problem is still there.
>
>
> Then there is a compiler bug with GCC 7.4, and we should report it.  I’ll
> try and take a look.  How do you produce a build configuration that
> includes GCC 7.4?  I’ll need to do so to take a look.
>

Hi Eliot,

Well, I've just setup a normal environment on windows 10 with cygwin,
installed mingw-gcc through cygwin, exactly the same as the CI does.
Once I've setup my environment, I just go to the build*/pharo.cog.spur
directory and ./mvm -f
I've not modified any makefile whatsoever, I'm 100% sure I'm using the
default values.

Also, I'm running that from a (oh-my-)zsh bash installed on top of cygwin,
but I don't think that makes a difference, it's just for my convenience
:)...


>
> So while I get some time to look at the asm outputs I've issued a PR with
> the discussed solution in here:
> https://github.com/OpenSmalltalk/opensmalltalk-vm/pull/353
>
> @Nicolas, I was able to compile and debug win64 almost out of the box.
> However, I could not make the visual studio code debugging extension work
> for the 32bit version.
> Where you compiling on clang with just the makefiles or you have used
> visual studio for that?
> Also, could you share your visual studio project files? I'd like to try
> debugging with it :)
>
> Tx all,
> Guille
>
>
> On Wed, Jan 16, 2019 at 10:18 AM Nicolas Cellier <
> nicolas.cellier.aka.nice at gmail.com> wrote:
>
>>
>> And if it fails with new versions of gcc, it would also be intersting to
>> test with clang.
>> For Win64 I had to use clang...
>>
>> Le mer. 16 janv. 2019 à 09:59, Guillermo Polito <
>> guillermopolito at gmail.com> a écrit :
>>
>>>
>>> Hi Eliot!
>>>
>>> Thanks for the quick answer :)
>>>
>>> On Tue, Jan 15, 2019 at 6:04 PM Eliot Miranda <eliot.miranda at gmail.com>
>>> wrote:
>>>
>>>>
>>>> Hi Guille, Hi Pablo,
>>>>
>>>>     I misspoke...
>>>>
>>>> On Tue, Jan 15, 2019 at 9:00 AM Eliot Miranda <eliot.miranda at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Guille,
>>>>>
>>>>> On Tue, Jan 15, 2019 at 8:45 AM Guillermo Polito <
>>>>> guillermopolito at gmail.com> wrote:
>>>>>
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> With Pablo we have been tracking a bug on win32 that produces a
>>>>>> segmentation fault on callback return. We can reproduce it 100% certainly
>>>>>> when running the Alien qsort example both in latest pharo and squeak
>>>>>> versions.
>>>>>>
>>>>>> After some debugging, it would seem that the thunkEntry function is
>>>>>> over-optimized in 32 bits, corrupting the (C) stack. This was particularly
>>>>>> boring because compiling the VM in debug mode was taking the bug away
>>>>>> :-). We have cornered the bug and checked that callbacks do work ok if we
>>>>>> disable optimizations just for the thunkEntry function like this:
>>>>>>
>>>>>> long
>>>>>> __attribute__((optimize("O0"))) thunkEntry(void *thunkp, sqIntptr_t
>>>>>> *stackp)
>>>>>>
>>>>>
>>>>>> The thing is that latest mingw which we use for compiling the windows
>>>>>> VM even in travis, now comes with gcc 7.4.0 which has a lot more of
>>>>>> optimizations than before. Just having O1 also produces the same error.
>>>>>>
>>>>>> We have tried disabling some particular optimizations like
>>>>>> fno-combine-stack-adjustments but with no result so far.
>>>>>>
>>>>>> The strange thing is that other callbacks like the ones coming from
>>>>>> libgit work ok.
>>>>>>
>>>>>> Has somebody taken a look into this too?
>>>>>> How would you suggest that we move on with this?
>>>>>>
>>>>>
>>>>> Before adding the pragma to the source also look at whether using the
>>>>> volatile keyword on variables in thunkProcess fixes the issue; for example
>>>>>
>>>>>     volatile VMCallbackContext vmcc;
>>>>>     volatile VMCallbackContext *previousCallbackContext;
>>>>>     volatile int flags, returnType;
>>>>>
>>>>
>>> Ok I'm trying this now. At least I can launch compilation and do
>>> something else [https://xkcd.com/303/] while it compiles :)
>>>
>>>
>>>>
>>>>> .  The other thing to do is to generate the machine code for
>>>>> thinkProcess with gcc 7.x and an older version that does not crash and
>>>>> compare to try and find out what specific optimization is causing the crash.
>>>>>
>>>>
>>>> I meant generate the assembly with -S, e.g. -O1 -S.
>>>>
>>>
>>> Sure, this is what we understood ^^.
>>>
>>>
>>>> You can also compare _O0 -S with -O1 -S for gcc 7.x, and generate -O1
>>>> -S with and without the volatile keyword and see what differences that
>>>> makes.
>>>>
>>>
>>> I'll do this this afternoon If I have some time...
>>>
>>>
>>>> Finally, if you do find you have to use the pragma, please write the
>>>>> fix as
>>>>>
>>>>> long __attribute__((optimize("O0")))
>>>>> thunkEntry(void *thunkp, sqIntptr_t *stackp)
>>>>>
>>>>> to keep the definition starting on a new line, which helps when using
>>>>> command-line tools to look for definitions outside of an ide.
>>>>>
>>>>
>>> Sure! no problem!
>>>
>>> Tx Again!
>>>
>>>
>>>>
>>>>>
>>>>>> From our side, we think that using a pragma to disable optimizations
>>>>>> for thunkEntry in the case of win32 looks okeyish at least to make the bug
>>>>>> go away.
>>>>>>
>>>>>
>>>>> Yes, but I expect it is actually that the volatile keyword has not
>>>>> been used (a mistake of mine).  Here's a relevant stack overflow answer:
>>>>>
>>>>> https://stackoverflow.com/questions/7996825/why-volatile-works-for-setjmp-longjmp
>>>>> <https://stackoverflow.com/questions/7996825/why-volatile-works-for-setjmp-longjmp>
>>>>>
>>>>> And if volatile does fix the issue, please apply it to the other
>>>>> thinkEntry implementations.
>>>>>
>>>>> Cheers,
>>>>>> Guille & Pablo
>>>>>>
>>>>>
>>>>> Cheers!
>>>>>
>>>>
>>>> _,,,^..^,,,_
>>>> best, Eliot
>>>>
>>>
>>>
>>> --
>>>
>>>
>>>
>>> Guille Polito
>>>
>>> Research Engineer
>>>
>>> Centre de Recherche en Informatique, Signal et Automatique de Lille
>>>
>>> CRIStAL - UMR 9189
>>>
>>> French National Center for Scientific Research - *http://www.cnrs.fr
>>> <http://www.cnrs.fr>*
>>>
>>>
>>> *Web:* *http://guillep.github.io* <http://guillep.github.io>
>>>
>>> *Phone: *+33 06 52 70 66 13
>>>
>>
>
> --
>
>
>
> Guille Polito
>
> Research Engineer
>
> Centre de Recherche en Informatique, Signal et Automatique de Lille
>
> CRIStAL - UMR 9189
>
> French National Center for Scientific Research - *http://www.cnrs.fr
> <http://www.cnrs.fr>*
>
>
> *Web:* *http://guillep.github.io* <http://guillep.github.io>
>
> *Phone: *+33 06 52 70 66 13
>
>

-- 



Guille Polito

Research Engineer

Centre de Recherche en Informatique, Signal et Automatique de Lille

CRIStAL - UMR 9189

French National Center for Scientific Research - *http://www.cnrs.fr
<http://www.cnrs.fr>*


*Web:* *http://guillep.github.io* <http://guillep.github.io>

*Phone: *+33 06 52 70 66 13
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20190118/4681612d/attachment-0001.html>


More information about the Vm-dev mailing list