[Vm-dev] OSVM | Strange segfaults in macos64ARMv8 -f builds ...

Marcel Taeumel marcel.taeumel at hpi.de
Tue Apr 5 13:17:57 UTC 2022

Okay. A full re-compilation of macos64ARMv8 seems to do the trick. Maybe my experiments did the partial re-compilation wrong?

Here is what I did after changing some code:

rm -r Squeak.app
./mvm -f

But after this, it seems to work without segfaults:

make cleanall
./mvm -f

Is this expected? Anyway, I have now been working with that VM for over an hour and there have been no segfaults. Yay!


On 05.04.2022 09:02:12, Marcel Taeumel <marcel.taeumel at hpi.de> wrote:
Hi Eliot --

> No; I see no such issues in my use of the M1 vm.

This raises the question ... did you try to use PR 620 for a few minutes?

I will take another look, but I am almost certain that PR 620 does not cause this issue; it merely triggers one that is already present in the ARMv8 JIT. Hmm...

On 05.04.2022 00:26:18, Eliot Miranda <eliot.miranda at gmail.com> wrote:

On Apr 4, 2022, at 12:05 AM, Marcel Taeumel <marcel.taeumel at hpi.de> wrote:

Hi Eliot --

Thanks for the report! :-) I assume that you were able to reproduce the segfaults.

No; I see no such issues in my use of the M1 vm.

> Why x8 is getting changed I don't know.  More investigation is needed :-)

Indeed. I am now (kind of) certain that those segfaults do not occur in the macos64x64 build. Thus, I think that PR 620 is safe to merge.
https://github.com/OpenSmalltalk/opensmalltalk-vm/pull/620

Let's assume that those changes, which enqueue display events on a regular basis, are somehow related to the more frequent segfaults. Using the -metal backend, the place where those events are likely to be scheduled is here:

platforms/iOS/vm/OSX/sqSqueakOSXMetalView.m [https://github.com/OpenSmalltalk/opensmalltalk-vm/pull/620/files#diff-3508089726a0912d666567b311fcb68d8d83be6ee5f94395830c2557b2e27cb6]

249: [self setNeedsDisplayInRect: [self frame]];

Then, the place where that event gets processed frequently is here:

platforms/iOS/vm/Common/Classes/sqSqueakEventsAPI.m [https://github.com/OpenSmalltalk/opensmalltalk-vm/pull/620/files#diff-49a3a3e74d7f53b7b188157183daf83ca041c5c3cbec2ef2a4a21746440bb962]

81: [gDelegateApp.squeakApplication pumpRunLoopEventSendAndSignal:NO];

platforms/iOS/vm/OSX/sqSqueakOSXApplication+events.m [https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/54519ffe59d4b0e5dfccd5684f8a4af505325dbf/platforms/iOS/vm/OSX/sqSqueakOSXApplication%2Bevents.m#L147]

147: [NSApp sendEvent: event];

It may be useful to monitor the integrity of x8 while display events are being handled.


On 02.04.2022 04:34:35, Eliot Miranda <eliot.miranda at gmail.com> wrote:
Hi Marcel,

On Fri, Apr 1, 2022 at 2:37 AM Marcel Taeumel <marcel.taeumel at hpi.de> wrote:

Hi Eliot --

The segfaults might be related to #checkForEventsMayContextSwitch:. Currently, it directly pumps all events, which now include some display events. But before that, it takes a code-compaction path:

checkForEventsMayContextSwitch: bool

   self checkCogCompiledCodeCompactionCalledFor.

   self ioProcessEvents.

Now, #ioProcessEvents may end up reading the displayBits. Yes, those are pinned ... and "commenceCogCompiledCodeCompaction" always seems to show up in the C stack backtrace ...

The last thing in the crash dump is **CompactCode**, which shows that the VM is in code compaction.  It just so happens that code compaction is almost always invoked from checkForEventsMayContextSwitch:, so that isn't interesting.  Why the VM is crashing in commenceCogCompiledCodeCompaction is the question. Looking at the function in lldb, it seems that the crash is on return from compactCogCompiledCode.  The stack trace points us to commenceCogCompiledCodeCompaction + 204:

2   squeak                              0x0000000100a0cd1c sigsegv + 240
3   libsystem_platform.dylib            0x0000000195d82c44 _sigtramp + 56
4   squeak                              0x00000001009bfb04 commenceCogCompiledCodeCompaction + 204
5   squeak                              0x000000010098a7e8 checkForEventsMayContextSwitch + 104
6   squeak                              0x000000010098ed24 ceStackOverflow + 136
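As a reading aid for traces like this one, here is a minimal Python sketch (not part of the VM; the names FRAME_RE and parse_frame are made up for illustration) that splits a backtrace line into its fields and subtracts the offset from the address to get the start of the function in the crashed process. It assumes the frame layout shown above.

```python
import re

# Assumed layout, taken from the trace above:
#   <index> <image> <address> <symbol> + <decimal offset>
FRAME_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\S+)\s+\+\s+(\d+)\s*$"
)

def parse_frame(line):
    """Return the fields of one backtrace line, or None if it does not match."""
    m = FRAME_RE.match(line)
    if m is None:
        return None
    index, image, address, symbol, offset = m.groups()
    address = int(address, 16)
    offset = int(offset)
    return {
        "index": int(index),
        "image": image,
        "address": address,
        "symbol": symbol,
        "offset": offset,
        # Address of the first instruction of the function in this process.
        "function_start": address - offset,
    }

frame = parse_frame(
    "4   squeak                              0x00000001009bfb04 "
    "commenceCogCompiledCodeCompaction + 204"
)
print(frame["symbol"], frame["offset"], hex(frame["function_start"]))
```

The offset (204) is what to look for in a disassembly of the same function, where lldb prints it as <+204>. The absolute addresses need not match between the crash report and a separate lldb session.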

Disassembling in lldb, we see:

(lldb) dis -n commenceCogCompiledCodeCompaction

Squeak[0x10004caa0] <+196>: str    x9, [x8, #0x8]
Squeak[0x10004caa4] <+200>: bl     0x100056e7c               ; compactCogCompiledCode at cogitARMv8.c:10187
Squeak[0x10004caa8] <+204>: adrp   x8, 251
Squeak[0x10004caac] <+208>: str    xzr, [x8, #0xb00]
Squeak[0x10004cab0] <+212>: nop
Squeak[0x10004cab4] <+216>: str    xzr, [x8, #0xaf8]
Squeak[0x10004cab8] <+220>: ldr    x8, [x20, #0xdc8]


So it looks like the crash is actually the store
Squeak[0x10004caac] <+208>: str    xzr, [x8, #0xb00]

and perhaps x8 has been corrupted.  It has the value 6, which isn't going to work for the store.  Why x8 is getting changed I don't know.  More investigation is needed :-)
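To make the address arithmetic concrete, here is a small Python sketch (an illustration only; the helper names are made up). It assumes that lldb prints the adrp operand as a count of 4 KiB pages relative to the page containing the instruction, and it uses the addresses from the disassembly above, not those from the crash report.

```python
PAGE_SIZE = 4096  # adrp works in units of 4 KiB pages

def adrp_target(pc, page_delta):
    """Value written by `adrp reg, page_delta` when executed at address pc."""
    return (pc & ~(PAGE_SIZE - 1)) + page_delta * PAGE_SIZE

def store_address(base, imm):
    """Effective address of `str xzr, [base, #imm]`."""
    return base + imm

# What x8 would hold after the adrp at <+204> in the listing above.
expected_x8 = adrp_target(0x10004caa8, 251)
print(hex(expected_x8))                        # 0x100147000
print(hex(store_address(expected_x8, 0xb00)))  # 0x100147b00

# What the store at <+208> would target with the observed x8 value of 6.
print(hex(store_address(6, 0xb00)))            # 0xb06
```

With x8 = 6, the store would go to 0xb06, which is far below anything a 64-bit macOS process normally maps, so a segfault at that instruction would be the expected outcome.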

On 31.03.2022 18:23:26, Marcel Taeumel <marcel.taeumel at hpi.de> wrote:
Hi Eliot --

I am trying to understand some sporadic segfaults while fixing some things around the -core-graphics and -metal backends: https://github.com/marceltaeumel/opensmalltalk-vm/tree/marceltaeumel/high-resolution-fix

The current changes for the -core-graphics and -metal backends in this branch put more "pressure" on the application's event loop. In that area, I could tweak vmIOProcessEvents a little bit to get fewer segfaults. I think it was some old Carbon-related code. However, the sporadic segfaults remain. At one point, I could just try to open a new MVC or Morphic project and the VM would segfault. :-/ Sometimes it takes a little bit longer.

Please find attached an exemplary crash report.

I have not yet managed to reproduce that segfault in an x86_64 build (i.e., macos64x64) ... Am I causing this, or is there some bug in the ARM JIT? :-/

If you try out this branch, make sure to have a recent Squeak Trunk and do-it:
"WorldState disableDeferredUpdates: true"

Otherwise you will see flickering and not see dragging within Morphic.




best, Eliot