It seems that after ba7bb9557 a segfault was introduced to the 64-bit Linux VM. With no image file the VM happily produces output. When providing an image file it immediately segfaults, regardless of whether a display is provided or whether it runs headless.
Hi Patrick,
On Oct 17, 2019, at 3:00 AM, Patrick R notifications@github.com wrote:
> It seems that after ba7bb95 a segfault was introduced to the 64-bit Linux VM. With no image file the VM happily produces output. When providing an image file it immediately segfaults, regardless of whether a display is provided or whether it runs headless.
Can you provide some more information? The crash.dmp file perhaps (please generate a fresh one so it contains only one or two crashes). Can you also possibly disassemble the trampoline table to see what code is generated for a stack switch? Any of the send trampolines will do. To do this, in gdb put a breakpoint in interpret, run and proceed once, then call printTrampolineTable(), and then disassemble a send routine using a pair of addresses from the output.
AdvThanksance
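The gdb steps requested above could be scripted roughly as follows. This is only a sketch: the VM path, image path and start address are hypothetical placeholders, and the address has to be taken from the output of `printTrampolineTable()` in an earlier run. It relies only on gdb's standard `-batch`, `-ex` and `--args` options.

```python
import shlex


def gdb_batch_command(vm_path, image_path, start, count=11):
    """Build a gdb command line following the steps described in the thread."""
    gdb_steps = [
        "break interpret",              # stop when the interpreter is entered
        "run",                          # run until the breakpoint is hit
        "continue",                     # proceed once, as suggested
        "call printTrampolineTable()",  # VM helper named in the thread
        f"x/{count}i {start:#x}",       # disassemble `count` instructions at `start`
    ]
    argv = ["gdb", "-batch"]
    for step in gdb_steps:
        argv += ["-ex", step]
    # Everything after --args is the program under test and its arguments.
    argv += ["--args", vm_path, image_path]
    return argv


if __name__ == "__main__":
    # Hypothetical paths and address, for illustration only.
    print(shlex.join(gdb_batch_command("./squeak", "my.image", 0x900028)))
```

If the VM crashes before `interpret` is reached, gdb stops at the fault instead, and the same `call` and `x/…i` commands can be issued from there.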
I'm trying to isolate the problem too. It might have something to do with my recent change to the dependencies, although I doubt it. I'll get back to you when I know more.
@eliotmiranda The commit that introduced the segfault is 30220afbafb33fd0af386c63c9a6b53b7e6bdac7.
Can confirm @theseion's finding after a bisect. The crash does not occur on debug builds for me, unfortunately. This is the stack I get in a production build:
```
(gdb) bt
#0  0x0000555555a0003d in ?? ()
#1  0x0000555555b5e000 in ?? ()
#2  0x000055555560a1b1 in generateStackPointerCapture () at /home/tom/Code/squeak/opensmalltalk-vm/spur64src/vm/cogitX64SysV.c:7255
#3  initializeCodeZoneFromupTo (startAddress=<optimized out>, endAddress=93824996104024) at /home/tom/Code/squeak/opensmalltalk-vm/spur64src/vm/cogitX64SysV.c:7903
#4  0x00005555555b8559 in readImageFromFileHeapSizeStartingAt (f=<optimized out>, desiredHeapSize=<optimized out>, imageOffset=<optimized out>) at /home/tom/Code/squeak/opensmalltalk-vm/spur64src/vm/gcc3x-cointerp.c:20588
#5  0x00005555555894c0 in imgInit () at /home/tom/Code/squeak/opensmalltalk-vm/platforms/unix/vm/sqUnixMain.c:1971
#6  0x0000555555585c62 in main (argc=<optimized out>, argv=0x7fffffffd4a8, envp=<optimized out>) at /home/tom/Code/squeak/opensmalltalk-vm/platforms/unix/vm/sqUnixMain.c:2095
(gdb)
```
Same here:
Program received signal SIGSEGV, Segmentation fault.
0x000000000090003a in ?? ()
(gdb) bt
#0  0x000000000090003a in ?? ()
#1  0x0000000000a5e000 in ?? ()
#2  0x000000000049ce86 in generateStackPointerCapture () at /media/psf/Home/Smalltalk/OpenSmalltalk/opensmalltalk-vm/spur64src/vm/cogitX64SysV.c:7255
#3  initializeCodeZoneFromupTo (startAddress=<optimized out>, endAddress=7980952) at /media/psf/Home/Smalltalk/OpenSmalltalk/opensmalltalk-vm/spur64src/vm/cogitX64SysV.c:7903
#4  0x000000000044c69d in readImageFromFileHeapSizeStartingAt (f=f@entry=0x85df20, desiredHeapSize=desiredHeapSize@entry=0, imageOffset=imageOffset@entry=0) at /media/psf/Home/Smalltalk/OpenSmalltalk/opensmalltalk-vm/spur64src/vm/gcc3x-cointerp.c:20588
#5  0x000000000041e142 in imgInit () at /media/psf/Home/Smalltalk/OpenSmalltalk/opensmalltalk-vm/platforms/unix/vm/sqUnixMain.c:1971
#6  0x000000000041aee1 in main (argc=<optimized out>, argv=0x7fffffffdd88, envp=<optimized out>) at /media/psf/Home/Smalltalk/OpenSmalltalk/opensmalltalk-vm/platforms/unix/vm/sqUnixMain.c:2095

(gdb) call printTrampolineTable()
0x900000: ceCheckLZCNTFunction
0x900018: ceGetFP
0x900020: ceGetSP
0x900028: ceCaptureCStackPointers

(gdb) x/4i 0x900018
   0x900018: mov %rbp,%rax
   0x90001b: retq
   0x90001c: int3
   0x90001d: int3
(gdb) x/4i 0x900020
   0x900020: mov %rsp,%rax
   0x900023: add $0x8,%rax
   0x900027: retq
   0x900028: push %rbx
(gdb) x/11i 0x900028
   0x900028: push %rbx
   0x900029: mov $0x79c798,%rbx
   0x900030: movabs %rax,0x79c420
=> 0x90003a: add %al,(%rax)
   0x90003c: add %cl,-0x77(%rax)
   0x90003f: loopne 0x900089
   0x900041: add $0x10,%eax
   0x900044: movabs %rax,0x79c418
   0x90004e: pop %rbx
   0x90004f: retq
   0x900050: int3
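To make the listing for `ceCaptureCStackPointers` easier to inspect, here is a small sketch that derives each instruction's byte length from the address gaps in the `x/11i` output above. The helper name `instruction_lengths` is made up for illustration. It only measures how gdb split the bytes; it does not by itself explain why the routine looks corrupted.

```python
import re

# The x/11i output from the session above, one instruction per line.
DISASSEMBLY = """\
0x900028: push %rbx
0x900029: mov $0x79c798,%rbx
0x900030: movabs %rax,0x79c420
=> 0x90003a: add %al,(%rax)
0x90003c: add %cl,-0x77(%rax)
0x90003f: loopne 0x900089
0x900041: add $0x10,%eax
0x900044: movabs %rax,0x79c418
0x90004e: pop %rbx
0x90004f: retq
0x900050: int3
"""

# Optional "=>" marker (current pc), then address, colon, instruction text.
LINE = re.compile(r"^(?:=>\s*)?(0x[0-9a-f]+):\s*(.+)$")


def instruction_lengths(text):
    """Return (address, length_in_bytes, instruction) for all but the last line."""
    rows = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            rows.append((int(match.group(1), 16), match.group(2).strip()))
    # The length of an instruction is the distance to the next address.
    return [(addr, nxt - addr, insn)
            for (addr, insn), (nxt, _) in zip(rows, rows[1:])]


if __name__ == "__main__":
    for addr, length, insn in instruction_lengths(DISASSEMBLY):
        print(f"{addr:#x}  {length:2d} bytes  {insn}")
```

The faulting address 0x90003a sits directly after the 10-byte `movabs` at 0x900030, and the three short instructions that follow it are the ones that look out of place in a routine that otherwise only saves and restores `%rbx` and stores pointers.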
It looks like the generated method got corrupted again...
However, if I use `build.lix64x64/squeak.cog.spur/build_clang/mvm`, I then get a working VM. I suggest using clang until we find the problem...
May I close this one? I got literally NO feedback...
Sorry about that, I had interpreted your comment as a suggestion. Did you change the build process to use Clang instead of GCC? In that case I'll have to check again. I'll do that tonight.
Yes, see the merged PR above. It's only a workaround, not a root-cause analysis and fix. But since we already have a similar workaround for Win64, we can live with it for a while... In the mid term, we HAVE TO gain a deeper understanding, because changing the compiler is hardly a sustainable solution (and it's a one-shot)... So we might want to keep this issue open, or create another one with different milestones (if we start using milestones...)
I have to postpone this until at least tomorrow evening. Sorry.
Yes, using Clang fixes the segfault. Thanks Nicolas!
If you already have such a workaround for Windows, I guess it would be a good idea to have a separate issue for reverting back to GCC (or whatever the plan might be then). You can go ahead and close this issue.
So I'm closing the issue, since the workaround works.
Closed #433.
vm-dev@lists.squeakfoundation.org